Optimizing Power @ Design Time Circuits Jan M. Rabaey Dejan Marković Borivoje Nikolić Low Power Design Essentials ©2008 Chapter 4 Chapter Outline Optimization framework for energy-delay trade-off Dynamic power optimization – Multiple supply voltages – Transistor sizing – Technology mapping Static power optimization – Multiple thresholds – Transistor stacking Low Power Design Essentials ©2008 4.2 Energy/Power Optimization Strategy For given function and activity, an optimal operation point can be derived in the energy-performance space Time of optimization depends upon activity profile Different optimizations apply to active and static power Fixed Activity Variable Activity No Activity - Standby Design time Run time Sleep Active Static Low Power Design Essentials ©2008 4.3 Energy-Delay Optimization and Trade-off Energy/op Trade-off space Unoptimized design Emax Emin Dmin Dmax Delay Maximize throughput for given energy or Minimize energy for given throughput Other important metrics: Area, Reliability, Reusability Low Power Design Essentials ©2008 4.4 The Design Abstraction Stack A very rich set of design parameters to consider! It helps to consider options in relation to their abstraction layer System/Application This Chapter Software Choice of algorithm Amount of concurrency (Micro-)Architecture Parallel versus pipelined, general purpose versus application specific Logic/RT logic family, standard cell versus custom Circuit sizing, supply, thresholds Device Bulk versus SOI Low Power Design Essentials ©2008 4.5 Optimization Can/Must Span Multiple Levels Architecture Micro-Architecture Circuit (Logic & FFs) Design optimization combines top-down and bottom-up: “meet-in-the-middle” Low Power Design Essentials ©2008 4.6 topology A topology B Delay Energy/op Energy/op Energy-Delay Optimization topology A topology B Delay Globally optimal energy-delay curve for a given function Low Power Design Essentials ©2008 4.7 Some Optimization Observations Energy ∂E / ∂A SA= ∂D / ∂A SA A=A0 (A0,B0) SB f (A,B0) f (A0,B) D0 Delay Energy-Delay Sensitivities Low Power Design Essentials ©2008 [Ref: V. Stojanovic, ESSCIRC’02] 4.8 Finding the Optimal Energy-Delay Curve Pareto-optimal: the best that can be achieved without disadvantaging at least one metric. f (A1,B) Energy ∆E = SA∙(∆D) + SB∙∆D (A0,B0) f (A,B0) ∆D D0 f (A0,B) Delay On the optimal curve, all sensitivities must be equal Low Power Design Essentials ©2008 4.9 Reducing Active Energy @ Design Time Eactive ~ a CL Vswing VDD Pactive ~ a CL Vswing VDD f Reducing voltages – Lowering the supply voltage (VDD) at the expense of clock speed – Lowering the logic swing (Vswing) Reducing transistor sizes (CL) – Slows down logic Reducing activity (a) – Reducing switching activity through transformations – Reducing glitching by balancing logic Low Power Design Essentials ©2008 4.10 Observation Downsizing and/or lowering the supply on the critical path lowers the operating frequency Downsizing non-critical paths reduces energy for free, but target delay tp (path) Low Power Design Essentials ©2008 # of paths # of paths – Narrows down the path delay distribution – Increases impact of variations, impacts robustness target delay tp (path) 4.11 Circuit Optimization Framework Energy (VDD, VTH, W) Delay (VDD, VTH, W) ≤ Dcon Constraints VDDmin < VDD < VDDmax VTHmin < VTH < VTHmax Wmin < W Reference case Energy/op minimize subject to topology A topology B – Dmin sizing @ VDDmax, VTHref Low Power Design Essentials ©2008 [Ref: V. Stojanovic, ESSCIRC’02] Delay 4.12 Optimization Framework: Generic Network Ci VDD,i VDD,i+1 i i+1 gCi Cw Ci+1 Gate in stage i loaded by fanout (stage i+1) Low Power Design Essentials ©2008 4.13 Alpha-power based Delay Model K dVDD gC i Cw Ci 1 1 Ci1 t p ( ) nom (1 ) ad gCi g Ci (VDD Von ) Fit parameters: Von, ad, Kd, g 4 60 simulation model simulation model 50 3 Von = 0.37 V a d = 1.53 2.5 2 1.5 Delay (ps) FO4 delay (norm.) 3.5 nom = 6 ps g = 1.35 40 30 20 1 0.5 0 tp 10 (90nm technology) 0.5 0.6 0.7 0.8 0.9 ref VDD / VDD Low Power Design Essentials ©2008 1 0 0 2 4 6 8 10 Fanout (Ci+1/Ci) VDDref = 1.2V, technology 90 nm 4.14 Combined with Logical Effort Formulation For Complex Gates t p nom ( pi fi gi g ) Parasitic delay pi – depends upon gate topology Electrical effort fi ≈ Si+1/Si Logical effort gi – depends upon gate topology Effective fanout hi = figi Low Power Design Essentials ©2008 [Ref: I. Sutherland, Morgan-Kaufman’99] 4.15 Dynamic Energy Edyn (gCi Cw Ci 1 ) VDD ,i Ci (g f i) VDD ,i 2 f i (Cw Ci 1 ) / Ci Si1 / S i Ci K e Si Ci 2 VDD,i VDD,i+1 i i+1 gCi Cw Ei Ke Si (V 2 DD ,i 1 Ci+1 gVDD ,i ) 2 = energy consumed by logic gate i Low Power Design Essentials ©2008 4.16 Optimizating Return on Investment (ROI) Depends on Sensitivity (E/D) Gate Sizing E D Si Si Ei nom (hi hi 1 ) for equal h (Dmin) Supply Voltage E D VDD VDD Von 2 (1 ) E VDD D a 1 Von d VDD Low Power Design Essentials ©2008 max at VDD(max) (Dmin) 4.17 Example: Inverter Chain Properties of inverter chain – Single path topology – Energy increases geometrically from input to output 1 S1 = 1 S2 S3 … SN CL Goal – Find optimal sizing S = [S1, S2, …, SN], supply voltage, and buffering strategy to achieve the best energy-delay tradeoff Low Power Design Essentials ©2008 4.18 Inverter Chain: Gate Sizing effective fanout, h 25 nom opt 20 d inc = 50% 30% 15 10% 10 1% 5 0% 0 1 2 3 4 5 stage 6 7 S i 1 S i 1 Si 1 S i 1 2 2 K e VDD nom FS Ei FS hi hi 1 2 [Ref: Ma, JSSC’94] Variable taper achieves minimum energy Reduce number of stages at large dinc Low Power Design Essentials ©2008 4.19 Inverter Chain: VDD Optimization 0% V DD / V DD nom 1.0 1% 0.8 10% 0.6 30% 0.4 d = 50% nom opt 0.2 0 inc 1 2 3 4 5 stage 6 7 VDD reduces energy of the final load first Variable taper achieved by voltage scaling Low Power Design Essentials ©2008 4.20 Inverter Chain: Optimization Results 100 0.8 energy reduction (%) Sensitivity (norm) 1.0 S gVDD 2VDD cVDD 0.6 0.4 0.2 0 0 10 20 30 dinc (%) 40 50 80 60 40 20 0 0 10 20 30 dinc (%) 40 50 Parameter with the largest sensitivity has the largest potential for energy reduction Two discrete supplies mimic per-stage VDD Low Power Design Essentials ©2008 4.21 Example: Kogge-Stone Tree Adder (A15, B15) S15 Tree adder – Long wires – Re-convergent paths – Multiple active outputs (A0, B0) S0 Cin Low Power Design Essentials ©2008 [Ref: P. Kogge, Trans. Comp’73] 4.22 Tree Adder: Sizing vs. Dual-VDD Optimization Reference design: all paths are critical reference D=Dmin sizing: E (-54%) dinc=10% 2Vdd: E (-27%) dinc=10% Internal energy S more effective than VDD – S: E(-54%), 2Vdd: E(-27%) at dinc = 10% Low Power Design Essentials ©2008 4.23 Tree Adder: Multi-dimensional Search 1 Reference VDD, VTH 0.8 S, VDD S, VTH 0.6 S, VDD, VTH 0.4 0.2 0 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 Delay / Dmin Can get pretty close to optimum with only 2 variables Getting the minimum speed or delay is very expensive Low Power Design Essentials ©2008 4.24 Multiple Supply Voltages Block-level supply assignment – Higher throughput/lower latency functions are implemented in higher VDD – Slower functions are implemented with lower VDD – This leads to so-called “voltage islands” with separate supply grids – Level conversion performed at block boundaries Multiple supplies inside a block – Non-critical paths moved to lower supply voltage – Level conversion within the block – Physical design challenging Low Power Design Essentials ©2008 4.25 Using Three VDD’s © IEEE 2002 1 1.3 1.21.2 0.8 1.1 0.7 11 0.6 0.5 V2 (V) 0.9 V3 (V) Power Reduction Ratio 1.41.4 0.9 0.80.8 0.7 0.4 1.5 + 0.60.6 0.5 1 0.5 0 0 0.5 1 1.5 0.40.4 0.4 0.4 0.5 0.6 0.6 0.7 0.8 0.8 0.9 1 1 V1 (V) 1.1 1.2 1.2 1.3 1.4 1.4 V2 (V) V1 = 1.5V, VTH = 0.3V Low Power Design Essentials ©2008 [Ref: T. Kuroda, ICCAD’02] 4.26 Optimum Number of VDD’s { V1, V2, V3 } { V1, V2 } VDD Ratio 1.0 { V1, V2, V3, V4 } V2/V1 V2/V1 V2/V1 V3/V1 V3/V1 0.5 V4/V1 1.0 P Ratio P2/P1 P3/P1 P4/P1 0.4 © IEEE 2001 0.5 1.0 V1 1.5 (V) 0.5 1.5 0.5 1.0 V1 (V) 1.0 V1 1.5 (V) The more VDD’s the less power, but the effect saturates Power reduction effect decreases with scaling of VDD Optimum V2/V1 is around 0.7 Low Power Design Essentials ©2008 [Ref: M. Hamada, CICC’01] 4.27 Lessons: Multiple Supply Voltages Two supply voltages per block are optimal Optimal ratio between the supply voltages is 0.7 Level conversion is performed on the voltage boundary, using a level-converting flip-flop (LCFF) An option is to use an asynchronous level converter – More sensitive to coupling and supply noise Low Power Design Essentials ©2008 4.28 Distributing Multiple Supply Voltages Conventional VDDH i1 Shared N-well VDDH VDDL VDDL o1 i1 i2 o2 VSS VDDH circuit Low Power Design Essentials ©2008 o1 i2 o2 VSS VDDL circuit VDDH circuit VDDL circuit 4.29 Conventional VDDL Row N-well isolation VDDH VDDL VDDH Row VDDL Row VDDH Row (a) Dedicated row VSS VDDH circuit VDDL circuit VDDH Region VDDL Region (b) Dedicated region Low Power Design Essentials ©2008 4.30 Shared N-Well Shared N-well VDDL circuit VDDH circuit VDDH VDDL VSS VDDH circuit VDDL circuit [Shimazaki et al, ISSCC’03] Low Power Design Essentials ©2008 (a) Floor plan image 4.31 Example: Multiple Supplies in a Block Conventional Design CVS Structure FF Level-Shifting F/F FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF Critical Path FF © IEEE 1998 FF Critical Path Lower VDD portion is shared “Clustered voltage scaling” Low Power Design Essentials ©2008 [Ref: M. Takahashi, ISSCC’98] 4.32 Level Converting Flip-Flops (LCFFs) level conversion ck ckb level conversion sf mo db so q d ck sf so d q (inv.) ck mf MN1 MN2 ckb ck ck clk clk Master-Slave Pulsed Half-Latch © IEEE 2003 Pulsed Half-Latch versus Master-Slave LCFFs Smaller # of MOSFETs / clock loading Faster level conversion using half-latch structure Shorter D-Q path from pulsed circuit Low Power Design Essentials ©2008 [Ref: F. Ishihara, ISLPED’03] 4.33 Dynamic Realization of Pulsed LCFF VDDH xb Pulsed precharge LCFF (PPR) – Fast level conversion by precharge mechanism – Suppressed charge/discharge toggle by conditional capture – Short D-Q path clk MN1 ckd1 MN2 VDDH VDDH IV1 x MP1 q (inv.) qb clk ckd1 d level conversion db ck qb Pulsed Precharge Latch © IEEE 2003 Low Power Design Essentials ©2008 [Ref: F. Ishihara, ISLPED’03] 4.34 Case Study: ALU for 64-bit Processor clock gen. clk ain0 ain 9:1 MUX 5:1 MUX 9:1 MUX 2:1 MUX carry gp gen. INV2 bin : VDDH circuit : VDDL circuit carry gen. partial sum sum sum sel. INV1 s0/s1 0.5pF logical unit sumb (long loop-back bus) © IEEE 2003 Low Power Design Essentials ©2008 [Ref: Y. Shimazaki, ISSCC’03] 4.35 Low-Swing Bus and Level Converter VDDH pc VDDL VDDL sumb sum INV1 keeper sel (VDDH) VDDH ain0 INV2 domino level converter (9:1 MUX) © IEEE 2003 INV2 is placed near 9:1 MUX to increase noise immunity Level conversion is done by a domino 9:1 MUX Low Power Design Essentials ©2008 [Ref: Y. Shimazaki, ISSCC’03] 4.36 Measured Results: Energy and Delay Energy [pJ] 800 Room temperature © IEEE 2003 700 600 500 400 300 200 0.6 1.16GHz VDDL=1.4V Energy:-25.3% Delay :+2.8% VDDL=1.2V Energy:-33.3% Delay :+8.3% 0.8 Low Power Design Essentials ©2008 1.0 1.2 TCYCLE [ns] 1.4 Single-supply Shared well (VDDH=1.8V) 1.6 [Ref: Y. Shimazaki, ISSCC’03] 4.37 Practical Transistor Sizing Continuous sizing of transistors only an option in custom design In ASIC design flows, options set by available library Discrete sizing options made possible in standard-cell design methodology by providing multiple options for the same cell – Leads to larger libraries (> 800 cells) – Easily integrated into technology mapping Low Power Design Essentials ©2008 4.38 Technology Mapping a b f c d slack=1 Larger gates reduce capacitance, but are slower Low Power Design Essentials ©2008 4.39 Technology Mapping Example: 4-input AND (a) Implemented using 4 input NAND + INV (b) Implemented using 2 input NAND + 2-input NOR Gate type Library 1: High-Speed Library 2: Low-Power Area (cell unit) Input cap. (fF) Average delay (ps) Average delay (ps) INV 3 1.8 7.0 + 3.8 CL 12.0 + 6.0 CL NAND2 4 2.0 10.3 + 5.3 CL 16.3 + 8.8 CL NAND4 5 2.0 13.6 + 5.8 CL 22.7 + 10.2 CL NOR2 3 2.2 10.7 + 5.4 CL 16.7 + 8.9 CL (delay formula: CL in fF) Low Power Design Essentials ©2008 (numbers calibrated for 90 nm) 4.40 Technology Mapping – Example 4-input AND (a) NAND4 + INV (b) NAND2 + NOR2 Area 8 11 HS: Delay (ps) 31.0 + 3.8 CL 32.7 + 5.4 CL LP: Delay (ps) 53.1 + 6.0 CL 52.4 + 8.9 CL Sw Energy (fF) 0.1 + 0.06 CL 0.83 + 0.06 CL Area – 4-input more compact than 2-input (2 gates vs. 3 gates) Timing – both implementations are 2-stage realizations – 2nd stage INV (a) is better driver than NOR2 (b) – For more complex blocks, simpler gates will show better performance Energy – Internal switching increases energy in the 2-input case – Low-power library has worse delay, but lower leakage (see later) Low Power Design Essentials ©2008 4.41 Gate-Level Tradeoffs for Power Technology mapping Gate selection Sizing Pin assignment Logical Optimizations Factoring Restructuring Buffer insertion/deletion Don’t care optimization Low Power Design Essentials ©2008 4.42 Logic Restructuring 1 1 1 0 0 1 0 1 1 Logic restructuring to minimize spurious transitions 1 1 1 1 2 1 1 1 1 1 3 Buffer insertion for path balancing Low Power Design Essentials ©2008 4.43 Algebraic Transformations Idea: Modify network to reduce capacitance p1=0.05 a b a c p3=0.075 f p5=0.075 a f b c p2=0.05 p4=0.75 pa = 0.1; pb = 0.5; pc = 0.5 Caveat: This may increase activity! Low Power Design Essentials ©2008 4.44 Lessons from Circuit Optimization Joint optimization over multiple design parameters possible using sensitivity-based optimization framework – Equal marginal costs ⇔ Energy-efficient design Peak performance is VERY power inefficient – About 70% energy reduction for 20% delay penalty – Additional variables for higher energy-efficiency Two supply voltages in general sufficient; 3 or more supply voltages only offer small advantage Choice between sizing and supply voltage parameters depends upon circuit topology But … leakage not considered so far Low Power Design Essentials ©2008 4.45 Considering Leakage @ Design Time Considering leakage as well as dynamic power is essential in sub-100 nm technologies Leakage is not essentially a bad thing – Increased leakage leads to improved performance, allowing for lower supply voltages – Again a trade-off issue … Low Power Design Essentials ©2008 4.46 Leakage – Not Necessarily a Bad Thing 1 Version 1 Vref -180mV th 0.8 ELk max E norm 0.81VDD ESw opt 0.6 Version 2 0.4 Topology Ld ln a avg K Inv Add Dec (ELk/ESw)opt 0.8 Vref -140mV th 0.2 2 0.5 0.2 max 0.52VDD © IEEE 2004 0 -2 10 -1 0 10 10 Estatic /Edynamic 1 10 Optimal designs have high leakage (ELk/ESw ≈ 0.5) Must adapt to process and activity variations Low Power Design Essentials ©2008 [Ref: D. Markovic, JSSC’04] 4.47 Refining the Optimization Model Switching energy Edyn a 01Ke S (g f )VDD 2 Leakage energy Estat SI 0 (Y )e VTH d VDD kT / q VDD Tcycle with: I0(Y): normalized leakage current with inputs in state Y Low Power Design Essentials ©2008 4.48 Reducing Leakage @ Design Time Using longer transistors – Limited benefit – Increase in active current Using higher thresholds – Channel doping – Stacked devices – Body biasing Reducing the voltage!! Low Power Design Essentials ©2008 4.49 Longer Channels 1.0 10 90 nm CMOS 0.8 9 8 Leakage power 0.7 7 0.6 6 0.5 5 0.4 4 Switching energy 0.3 3 0.2 2 0.1 100 110 120 130 140 150 160 170 180 190 Normalized switching energy Normalized leakage power 0.9 10% longer gates reduce leakage by 50% Increases switching power by 18% with W/L = const. 1 200 Transistor length (nm) Doubling L reduces leakage by 5x Impacts performance – Attractive when don’t have to increase W (e.g. memory) Low Power Design Essentials ©2008 4.50 Using Multiple Thresholds There is no need for level conversion Dual thresholds can be added to standard design flows – High-VTh and Low-VTh libraries are a standard in sub-0.18m processes – For example: can synthesize using only high-VTh and then only in-place swap in low-VTh cells to improve timing. – Second VTh insertion can be combined with resizing Only two thresholds are needed per block – Using more than two yields small improvements Low Power Design Essentials ©2008 4.51 Three VTH’s 1.41.4 1.3 1 1.21.2 1.1 0.6 11 0.4 0.2 Vth1 (V) 0.8 VTH.2 (V) Leakage Reduction Ratio © IEEE 2002 0.9 0.80.8 0.7 0 1.5 0.60.6 0.5 1 1 0.5 0 0 1.5 0.5 0.40.4 + 0.4 0.4 0.5 0.6 0.6 0.7 0.8 0.8 0.9 1 1 1.1 Vth2 (V) 1.2 1.2 1.3 1.4 1.4 VTH.3 (V) VDD = 1.5V, VTH.1 = 0.3V Impact of third threshold very limited Low Power Design Essentials ©2008 [Ref: T. Kuroda, ICCAD’02] 4.52 Using Multiple Thresholds Cell-by-cell VTH assignment (not at block level) Achieves all-low-VTH performance with substantial leakage reduction in leakage FF FF FF FF FF High VTH Low Power Design Essentials ©2008 Low VTH [Ref: S. Date, SLPE’94] 4.53 Dual-VT Domino Low-threshold transistors used only in critical paths Inv3 Inv2 Clkn+1 Clkn P1 Dn+1 Dn … Inv1 Shaded transistors are low threshold Low Power Design Essentials ©2008 4.54 Multiple Thresholds and Design Methodology Easily introduced in standard cell design methodology by extending cell libraries with cells with different thresholds – Selection of cells during technology mapping – No impact on dynamic power – No interface issues (as was the case with multiple VDD’s) Impact: Can reduce leakage power substantially Low Power Design Essentials ©2008 4.55 Dual-VTH Design for High-Performance Design High-VTH Only Low-VTH Only Dual VTH Total Slack -53 psec 0 psec 0 psec Dynamic Power 3.2 mW 3.3 mW 3.2 mW Static Power 914 nW 3873 nW 1519 nW All designs synthesized automatically using Synopsys Flows Low Power Design Essentials ©2008 [Courtesy: Synopsys, Toshiba, 2004] 4.56 Example: High- vs. Low-Threshold Libraries Leakage Power (nW) 8000 Selected combinational tests 130 nm CMOS 7000 6000 5000 LVth LVth+HVth HVth HVth+LVth 4000 3000 2000 1000 0 i10 Low Power Design Essentials ©2008 des C7552 seq pair AVER [Courtesy: Synopsys 2004] 4.57 Complex Gates Increase Ion/Ioff Ratio 140 3 (90nm technology) (90nm technology) 120 2.5 100 Ioff (nA) Ion (A) No stack 2 1.5 80 60 No stack 1 40 Stack 0.5 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 VDD (V) Stack 20 1 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 VDD (V) Ion and Ioff of single NMOS versus stack of 10 NMOS transistors Transistors in stack are sized up to give similar drive Low Power Design Essentials ©2008 4.58 Complex Gates Increase Ion/Ioff Ratio 3.5 x 105 (90nm technology) 3 Ion/Ioff ratio 2.5 Stack 2 Factor 10! 1.5 1 No stack 0.5 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 VDD (V) Stacking transistors suppresses submicron effects Reduced velocity saturation Reduced DIBL effect Allows for operation at lower thresholds Low Power Design Essentials ©2008 4.59 Complex Gates Increase Ion/Ioff Ratio Example: 4-input NAND versus Fan-in (4) Fan-in (2) With transistors sized for similar performance: Leakage of Fan-in(2) = Leakage of Fan-in(4) x 3 (Averaged over all possible input patterns) Leakage Current (nA) 14 12 10 8 Fan-in (2) 6 4 2 0 Fan-in (4) 2 4 6 8 10 12 14 16 Input pattern Low Power Design Essentials ©2008 4.60 Example: 32 bit Kogge-Stone Adder factor 18 % of input vectors © Springer 2001 Standby leakage current (A) Reducing the threshold by 150 mV increases leakage of single NMOS transistor by factor 60 Low Power Design Essentials ©2008 [Ref: S.Narendra, ISLPED’01] 4.61 Summary Circuit optimization can lead to substantial energy reduction at limited performance loss Energy-delay plots the perfect mechanisms for analyzing energy-delay trade-off’s. Well-defined optimization problem over W, VDD and VTH parameters Increasingly better support by today’s CAD flows Observe: leakage is not necessarily bad – if appropriately managed. Low Power Design Essentials ©2008 4.62 References Books: A. Bellaouar, M.I Elmasry, Low-Power Digital VLSI Design Circuits and Systems, Kluwer Academic Publishers, 1st Ed, 1995. D. Chinnery, K. Keutzer, Closing the Gap Between ASIC and Custom, Springer, 2002. D. Chinnery, K. Keutzer, Closing the Power Gap Between ASIC and Custom, Springer, 2007. J. Rabaey, A. Chandrakasan, B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed, Prentice Hall 2003. I. Sutherland, B. Sproul, D. Harris, Logical Effort: Designing Fast CMOS Circuits, MorganKaufmann, 1st Ed, 1999. Articles: R.W. Brodersen, M.A. Horowitz, D. Markovic, B. Nikolic, V. Stojanovic, “Methods for True Power Minimization,” Int. Conf. on Computer-Aided Design (ICCAD), pp. 35-42, Nov. 2002. S. Date, N. Shibata, S.Mutoh, and J. Yamada, "IV 30MHz Memory-Macrocell-Circuit Technology with a 0.5urn Multi-Threshold CMOS," Proceedings of the 1994 Symposium on Low Power Electronics, San Diego, CA, pp. 90-91, Oct. 1994. M. Hamada, Y. Ootaguro, T. Kuroda, “Utilizing Surplus Timing for Power Reduction,” IEEE Custom Integrated Circuits Conf., (CICC), pp. 89-92, Sept. 2001. F. Ishihara, F. Sheikh, B. Nikolic, “Level conversion for dual-supply systems,” Int. Conf. Low Power Electronics and Design, (ISLPED), pp. 164-167, Aug. 2003. P.M. Kogge and H.S. Stone, “A Parallel Algorithm for the Efficient Solution of General Class of Recurrence Equations,” IEEE Trans. Comput., vol. C-22, no. 8, pp. 786-793, Aug 1973. T. Kuroda, “Optimization and control of VDD and VTH for low-power, high-speed CMOS design,” Proceedings ICCAD 2002, pp. , San Jose, Nov. 2002. Low Power Design Essentials ©2008 4.63 References Articles (cont.): H.C. Lin and L.W. Linholm, “An Optimized Output Stage for MOS Integrated Circuits,” IEEE J. Solid-State Circuits, vol. SC-10, no. 2, pp. 106-109, Apr. 1975. S. Ma and P. Franzon, “Energy Control and Accurate Delay Estimation in the Design of CMOS Buffers,” IEEE J. Solid-State Circuits, vol. 29, no. 9, pp. 1150-1153, Sept. 1994. D. Markovic, V. Stojanovic, B. Nikolic, M.A. Horowitz, R.W. Brodersen, “Methods for True EnergyPerformance Optimization,” IEEE Journal of Solid-State Circuits, vol. 39, no. 8, pp. 1282-1293, Aug. 2004. MathWorks, http://www.mathworks.com S. Narendra, S. Borkar, V. De, D. Antoniadis, A. Chandrakasan, “Scaling of stack effect and its applications for leakage reduction,” Int. Conf. Low Power Electronics and Design, (ISLPED), pp. 195-200, Aug. 2001. T. Sakurai and R. Newton, “Alpha-Power Law MOSFET Model and its Applications to CMOS Inverter Delay and Other Formulas,” IEEE J. Solid-State Circuits, vol. 25, no. 2, pp. 584-594, Apr. 1990. Y. Shimazaki, R. Zlatanovici, B. Nikolic, “A shared-well dual-supply-voltage 64-bit ALU,” Int. Conf. Solid-State Circuits, (ISSCC), pp. 104-105, Feb. 2003. V. Stojanovic, D. Markovic, B. Nikolic, M.A. Horowitz, R.W. Brodersen, “Energy-Delay Tradeoffs in Combinational Logic using Gate Sizing and Supply Voltage Optimization,” European SolidState Circuits Conf., (ESSCIRC), pp. 211-214, Sept. 2002. M. Takahashi et al., “A 60mW MPEG video codec using clustered voltage scaling with variable supply-voltage scheme,” IEEE Int. Solid-State Circuits Conf., (ISSCC), pp. 36-37, Feb. 1998. Low Power Design Essentials ©2008 4.64