ECE260B – CSE241A Winter 2005
Power Distribution
Website: http://vlsicad.ucsd.edu/courses/ece260b-w05
ECE 260B – CSE 241A Power Distribution, http://vlsicad.ucsd.edu

Motivation
- Power supply noise is a serious issue in deep-submicron (DSM) design
- Noise is getting worse as technology scales
- Noise margin decreases as supply voltage scales
- Power supply noise may slow down circuit performance
- Power supply noise may cause logic failures

Power = …
- Routing resources
  - 20-40% of all metal tracks are used by Vcc and Vss
  - Increased power → denser power grid
- Pins
  - Each Vcc or Vss pin carries 0.5-1W of power
  - Pentium 4 uses 423 pins; 223 are Vcc or Vss
  - More pins → more expensive package (plus package development, motherboard redesign, …)
- Battery cost
  - A 1kg NiCad battery powers a Pentium 4 alone for less than 1 hour
- Performance
  - High chip temperatures degrade circuit performance
  - Large across-chip temperature variations induce clock skew
  - High chip power limits use of high-performance circuits
  - Power transients determine minimum power supply voltage

Power = Package
- Pentium 4 die is about 1.5g and less than 1cm^3
- Pentium 4 in package with interposer, heat sink, and fan can be 500g and 150cm^3
- Modern processor packaging is complex and adds significantly to product cost
[Figure: package stack labeling fan, heat sink, integrated heat spreader, processor, decoupling capacitors, OLGA, interposer, processor pins, and package pins]
http://www.intel.com/support/processors/procid/ptype.htm
Courtesy M.
McDermott UT-Austin

Planning for Power
- Early simulation of major power dissipation components
- Early quantification of chip power
  - Total chip power
  - Maximum power density
  - Total chip power fluctuations: inherent fluctuations and added fluctuations due to clock gating
- Early power distribution analysis (dc, ac, and multi-cycle), i.e., average, maximum, and multi-cycle fluctuations
- Early allocation and coordination of chip resources
  - Wiring tracks for power grid
  - Low-Vt devices
  - Dynamic circuits
  - Clock gating
  - Placement and quantity of added decoupling capacitors

Power and Ground Routing
- Floorplanning includes planning how power, ground, and clock should be routed
- Power supply distribution
  - Tree: trunk must supply current to all branches
  - Resistance must be very small, since when a gate switches its current flows through the supply lines
    - If the resistance of the supply lines is too large, the voltage supplied to gates will drop, which can cause the gates to malfunction
    - Usually, want at most 5-10% IR drop due to supply resistance
  - Usually on the top layers of metal, then distributed to lower wiring layers

Planar Power Distribution
- Topology of VDD/VSS networks: inter-digitated
- Design each macrocell such that all VDD and VSS terminals are on opposite sides
- If the floorplan places all macrocells with VDD on the same side, then there is no crossing between VDD and VSS
[Figure: inter-digitated VDD/VSS routing around macrocells A, B, and C, marking cut lines, a "no cut line" case, and a "no connection" point]
Courtesy K.
Yang, UCLA

Gridded Power Distribution
- With more metal layers, power is striped
- Connection between the stripes allows a power grid
  - Minimizes series resistance
- Connection of lower-layer layout/cells to the grid is through vias
  - Note that planar supply routing is often still needed for a strong lower-layer connection
  - There may not be sufficient area to make a strong connection in the middle of a design (connect better at periphery of die)
Courtesy K. Yang, UCLA

Power Supply Drop/Noise
- Supply noise = variations in power supply voltage that act as a noise source for logic gates
- Power supply wiring resistance → voltage variations with current surges
- Current surges depend on dynamic behavior of circuit
- Solution approach
  - Measure maximum current required by each block
  - Redesign power/ground network to reduce resistance
  - Worst case: move activity to another clock cycle to reduce peak current → scheduling problem
- Example: drive a 32-bit bus, wire load = 2pF per bit line (the load for which R = 0.25kΩ gives RC = 0.5ns), with delay 0.5ns
  - R for each transistor needs to be < 0.25kΩ to meet RC = 0.5ns
  - Effective R of the 32 bits switching together is 250/32 ≈ 7.8Ω
  - For < 10% drop, power distribution R must be about a tenth of that, i.e., < 1Ω
Courtesy K.
Yang, UCLA

Electromigration
- Physical migration of metal atoms due to "electron wind" can eventually create a break in a wire
- MTTF (mean time to failure) ∝ 1/J^2, where J = current density
- Current density must not exceed specification: for each wire, I_i/w_i < J_spec
  - Specified as mA per µm of wire width (e.g., 1mA/µm) or mA per via cut
- EM occurs both in signal wires (AC = bidirectional) and power wires (DC = unidirectional)
  - Much worse for DC than AC; DC occurs inside cells and in power buses
- May need more contacts on transistor sources and drains to meet EM limits
- Width of power buses must support both IR and EM requirements
- Issues in IR and EM constraint generation
  - Topology is most likely not a tree
  - How do we determine current patterns?
  - Effects of R, L

What Happens?
- Example of an AlCu line seen under microscope
- Accelerated by higher temperature and high currents
- Voids form on grain boundaries
- Metal atoms move with current away from voids and collect at boundaries
- Catastrophic failure
Courtesy K. Yang, UCLA

[Figures: electromigration examples]
- Taken from http://www.nd.edu/~micro/fig20.html
- Taken from Sverre Sjøthun, "Electromigration In-Depth," from www.dpwg.com
Courtesy S. Sapatnekar, UMinn

Power Supply Rules of Thumb
- Rules depend on technology
  - Tech file has rules for resistance and electromigration
- Examples:
  - Must have a contact for each 16λ of transistor width (more is better)
  - Wire must have less than 1mA/µm of width
  - Power/Gnd width = Length of wire × Sum (all transistors connected to wire) / (3×10^6 λ) (very approximate)
- For small designs, power supply design is a non-issue
Courtesy K.
Yang, UCLA

Basic Methodology Concepts
- Reliability (slotting, splitting)
- Alignment of hierarchical rings, stripes
- Isolation of analog power
- Styles of power distribution
  - Rings and trunks
  - Uniform grid
  - Bottom-up grid generation
- Depends on:
  - Package: flip-chip vs. wire-bond; I/O count (fewer pads → denser grid)
  - Power budget
  - IR drop limits
  - Floorplan constraints (hard macros, etc.)

Metal Slotting vs. Splitting
- Slotting
  - Required by metal layout rules for uniform CMP (planarization)
  - Difficult to connect: where should vias go?
- Split power wires
  - Easy connections through standard via arrays
  - Less data than traditional slotting
  - More accurate R/C analysis of power mesh
  - Not supported by all tools
[Figure: a slotted M1 GND wire compared with GND split into parallel M1 wires]
Courtesy Cadence Design Systems, Inc.

Trunks and Rings Methodology
- Each block has its own ring
  - Rings may be inside the blocks or part of the top level
  - Rings may be shared with abutted blocks
- Each block has trunks connecting top level to block
  - Individual trunks connecting blocks to top level
[Figure: floorplan of blocks 1-5 with V (power) and G (ground) rings and trunks]
Courtesy Cadence Design Systems, Inc.

Trunks and Rings
Advantages
- Power tailored to the demands of each block (flexible)
- More area efficient since the demands of each block are uniquely met
- Simple implementation supported by many tools
- Rings can be shared between abutted blocks
Disadvantages
- Limited redundancy; power grid built to match needs
- Non-regular structure requires more detailed IR drop/EM analysis
- Missing vias/connections are fatal
- Rings will require slotting/splitting due to wide widths
  - Increase in data volume
- Assumptions in design may change or be invalid
Courtesy Cadence Design Systems, Inc.
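
Since each trunk is sized to its own block's demand, a first-cut width can be estimated from the two limits stated earlier in these slides: the IR drop budget (at most 5-10% of supply) and the electromigration rule of thumb (1mA/µm). The sketch below is only illustrative: the function name, the sheet resistance, the 1.2V supply, and the block currents are assumptions, not values from the course, and a real trunk would still need the detailed IR drop/EM analysis noted above.

```python
def min_trunk_width_um(current_ma, length_um, sheet_res_ohm_sq, max_drop_v,
                       j_spec_ma_per_um=1.0):
    """Return (em_width, ir_width, required_width) in microns for one trunk.

    Assumes a single straight wire carrying a constant current, with
    resistance R = sheet_res * length / width (hypothetical simplification).
    """
    # EM limit: I / w < J_spec  ->  w > I / J_spec
    em_width = current_ma / j_spec_ma_per_um
    # IR limit: I * R_sheet * (L / w) < V_max  ->  w > I * R_sheet * L / V_max
    ir_width = (current_ma * 1e-3) * sheet_res_ohm_sq * length_um / max_drop_v
    # The trunk must satisfy both limits, so take the larger width.
    return em_width, ir_width, max(em_width, ir_width)


if __name__ == "__main__":
    # Assumed example: 50 mA block, 0.05 ohm/sq metal, 5% of a 1.2 V supply.
    for length in (1000.0, 3000.0):
        em, ir, req = min_trunk_width_um(50.0, length, 0.05, 0.05 * 1.2)
        limit = "EM" if em >= ir else "IR"
        print(f"L={length:.0f}um: EM {em:.1f}um, IR {ir:.1f}um -> {req:.1f}um ({limit}-limited)")
```

With these assumed numbers the short trunk is EM-limited and the long one is IR-limited, which is why the slides ask that bus widths support both requirements.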

Uniform Chip Grid Methodology
- Robust and redundant power network
  - Mainly in microprocessors and high-end large ASICs
- Primary distribution through upper metal layers
  - Global grid on higher layers
  - Fine or custom grid, or no grid, on lower layers
- Implementation
  - Typically pushed into blocks
  - Blocks typically abut
    - Global buffer insertion
    - Requires block grids to align
  - Rows/followpins should align with block pins
  - Lower layers in blocks connect to top through via stacks
[Figure: global V (power) / G (ground) grid overlaid on abutted blocks]
Courtesy Cadence Design Systems, Inc.

Uniform Chip Grid
Advantages
- Easily implemented
- Path redundancy allows less sensitivity to changes in current pattern
- Lends itself to straightforward hand calculations
- Mesh of power/ground provides shielding (for capacitance) and current returns (for inductance)
- Top-down propagation easy to use on this style
Disadvantages
- Takes up significant routing resources (20%-40% of all routing tracks if not already reserved for power/ground)
- Fine grids may slow down P&R tools
- Imposes grid structure into each block, which may be unnecessary
- Top and blocks are coupled closely if top-level routing is pushed into blocks
  - Changes to block/top must be reflected in other
Courtesy Cadence Design Systems, Inc.

Bottom-Up Grid Generation Methodology
Design and optimize power grid for block, merge at top
Advantages
• Able to tailor grid for routing resource efficiency in each block
• Flexibility to choose the best grid for the block (i.e.
ring and stripe, power plane, grid)
Disadvantages
• Designing grid in context of the "big picture" is more difficult
• Block grid may present challenging connections to top level
• Assumptions for block grid's connection to top level must be analyzed and validated
Courtesy Cadence Design Systems, Inc.

Power Routing in Area-Based P&R
- Power routing approaches
  (1) Pre-route parts of power grid during floorplanning
  (2) Build grid (except connections to standard cells) before P&R
  (3) Build entire grid before P&R
- N.B.: Area-based P&R tools respect pre-routes absolutely
- Cadence tools: power routing done inside SE; all other tasks (clock, place, route, scan, …) done by point tools
- Lab 5 tomorrow has a tiny bit of power routing (rings, stripes)
- Miscellany
  - ECOs: What happens to rings and trunks if blocks change size?
  - Layer choices: What is the cost of skipping layers (to get from thick top-layer metal down to finer layers)?
  - How wide should power wires be?
  - Post-processing strategies
Courtesy Cadence Design Systems, Inc.

Power Routing Wire Width Considerations
- Slotting rules: choose maximum width below slotting width
- Choose power routing widths carefully to avoid blocking extra tracks (and use the space if blocking the track!)
- Halation (width-dependent spacing) rules: do as much of the power routing as possible below the wide-wire width to save routing space
- What is a better power width here?
[Figure: power wire laid over signal routing tracks, with the blocked tracks marked]
Courtesy Cadence Design Systems, Inc.
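
The advice on blocked tracks can be made concrete with a small count. The sketch below assumes a simplified rule set that is not from the slides: the power wire is centered on a routing track, signal wires sit on a uniform pitch, and a single edge-to-edge spacing applies (no halation rule). The function name and the dimensions are hypothetical.

```python
import math


def blocked_tracks(power_width, pitch, signal_width, spacing):
    """Count routing tracks made unusable by a power wire centered on track 0.

    Hypothetical model: a signal wire on track k has its center at k * pitch
    and is legal only if its near edge clears the power wire's edge by at
    least `spacing`.
    """
    # Center-to-center distance below which a signal wire would violate spacing.
    keep_out = power_width / 2 + spacing + signal_width / 2
    # Tracks with |k| * pitch < keep_out are blocked; find the largest such k.
    k_max = math.ceil(keep_out / pitch) - 1
    # Track 0 itself plus k_max tracks on each side.
    return 2 * k_max + 1


if __name__ == "__main__":
    # Assumed rules (microns): 0.5 pitch, 0.25 signal width, 0.25 spacing.
    for width in (0.25, 1.25, 1.5, 2.25):
        n = blocked_tracks(width, pitch=0.5, signal_width=0.25, spacing=0.25)
        print(f"power width {width}: {n} tracks blocked")
```

Under these assumed rules a 1.5 wide wire already blocks as many tracks as a 2.25 wide one, so the wider wire costs no extra routing resource; that is the sense of "use the space if blocking the track".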

Power Routing Tool Usage
4-layer power grid example (HVHV)
- Turn on via stacking
- Route metal2 vertically
- Route metal4 vertically (use same coordinates)
- Route metal3 horizontally (make coincident with every N metal1 routes)
- Turn off via stacking
- Route metal1 horizontally
[Figure: resulting grid, with metal2/metal4 coincident, metal1 inside cells, and metal3 every n microns]
Courtesy Cadence Design Systems, Inc.

Post-Processing Flows (DEF or Layout Editing)
[Figure: power routing during PnR and after post-processing]
Courtesy Cadence Design Systems, Inc.

(Tree) Supply Network Design
- Tree topology assumption is not very useful in practice, but illustrates some basic ideas
- Assume R dominates, L and C negligible
  - A marginally permissible assumption
- Current is drawn at various points in the tree (time-varying waveform)
- Current causes a V = IR drop
  - "Ground" is not at 0V
  - "Vdd" is not at intended level
[Figure: supply tree from the supply node to the sinks]
Courtesy S. Sapatnekar, UMinn

IR Drop Constraints
Chowdhury and Breuer, TCAD 7/88
- Can write the V drop to each sink as $\sum_{i \in \text{path(supply, sink)}} R_i I_i < V_{spec}$, for all sink current patterns made available
- Tree structure: can compute $I_i$ easily
- $R_i \propto l_i / w_i$
  - Change $w_i$ to reduce IR drop
- Objective: minimize $\sum_i a_i w_i$
- Current density must never exceed a specification
  - For each wire, $I_i / w_i < J_{spec}$
Courtesy S.
Sapatnekar, UMinn

P/G Mesh Optimization (R only)
Dutta and Marek-Sadowska, DAC 89
- Cost function: $\sum_i a_i l_i w_i = \sum_i a_i \rho c_i l_i^2$ // = total wire area, since $c_i$ = conductance = $w_i / (\rho l_i)$
- Constraints
  - EM: $|I_i| \le \sigma w_i$ // current density I/w less than upper bound $\sigma$
    - Substitute $I_i = v_i w_i / (\rho l_i)$ // I = V/R, with $v_i = v_p - v_q$ the voltage across wire i
    - $|v_p - v_q| \le \rho \sigma l_i$ // divide by $w_i$, multiply by $\rho l_i$
  - Wire width constraints: $W_{min} \le w_i \le W_{max}$ (translate to $c_i$)
  - Voltage drop constraints: $v_a - v_b \le V_{spec1}$ and/or $v_i \ge V_{spec2}$
  - Circuit equations that determine the v's
- Variables: $c_i$'s (the $v_i$'s depend on the $c_i$'s)
Courtesy S. Sapatnekar, UMinn

Solution Technique
- Method of feasible directions
  - Find an initial feasible solution (satisfies all constraints)
  - Choose a direction that maintains feasibility
  - Make a move in that direction to reduce cost function
- Given a set of $c_i$'s, must find corresponding $v_i$'s
  - Feasible direction method: move from point c* to c+
  - c* and c+ must be close to each other (i.e., if you have the solution at c*, the solution at c+ corresponds to a minor change in conductances)
  - Solving for $v_i$'s: solving a system of linear equations
    - Solution at c* is a good guess for the solution at c+
    - Converges in a few iterations
Courtesy S. Sapatnekar, UMinn

Modeling Gate Currents
- Currents in supply grid are caused by charging/discharging of capacitances by logic gates
- All analyses require generation of a "worst-case switching" scenario
  - Enumeration is infeasible
- Two basic approaches
  - Simulation-based methods: designer supplies "hot" vectors, or we try to generate these hot vectors automatically
  - "Pattern-independent" methods: try to estimate the worst case (can be expensive, very inaccurate)
- Once current patterns are available, apply them to supply network to find out if constraints are satisfied
Courtesy S.
Sapatnekar, UMinn

Complexity of Hot Vector Generation
Devadas et al., TCAD 3/92
- Assume zero gate delays for simplicity
- Find the maximum current drawn by a block of gates
  - Using a current model for each gate
  - Find a set of input patterns so that the total current is maximized
  - Boolean assignment problem: equivalent to Weighted Max-Satisfiability
    - Satisfiability: given a Boolean formula in conjunctive normal form (product of sums), is there an assignment of truth values to the variables such that the formula evaluates to True?
    - Checking for Satisfiability (for k-SAT, k > 2) is NP-complete
- Difficult even under zero gate delay assumption
Courtesy S. Sapatnekar, UMinn

Pattern-Independent Methods
iMAX approach: Kriplani et al., TCAD 8/95
- Current model for a single gate
[Figure: current waveform for a single gate, labeled with Ipeak and Delay]
- Gates switch at different times
- Total current drawn from Vdd (ignoring supply network C) is the sum of these time-shifted waveforms
- Objective: find the worst-case waveform
Courtesy S. Sapatnekar, UMinn

Example (Not to scale!)
- Maximum current is not just a sum of individual maximum currents
  - Temporal dependencies
- [Using deliberate clock skews can reduce the peak current, as we saw in the Useful-Skew discussion]
Courtesy S. Sapatnekar, UMinn

Maximum Envelope Current (MEC)
- Find the time interval during which a gate may switch
  - Manufacturing process variations can cause changes
  - Actual switching event can cause changes
- Example (unit gate delays): switching at the second gate can occur at t=1 or at t=2
- In general, a large number of paths can go through a gate; assume (conservatively) that switching occurs in $t \in [1,2]$
- Assume that all gate inputs can switch independently; this provides an upper bound on the switching current
Courtesy S.
Sapatnekar, UMinn

(Large) Errors in MEC Approach
- Correlation problem
  - Switching at G0, G1, G2 and G3 is not independent
  - G0 = 0 implies that G1, G2, G3 switch; G0 = 1 means that other inputs will determine gate activity
  - If the other inputs cannot make the gate switch in the same time window, then iMAX estimates are pessimistic
- Reconvergent fanout problem
  - Signals that diverge at G0 reconverge at Gk → inputs to Gk are not independent
  - Assumption of independent switching is not valid
- Many heuristic refinements proposed, but guardbanding (error) of power estimation is still a huge issue
[Figures: G0 driving G1, G2, and G3; paths from G0 through G1, G2, and G3 reconverging at Gk]
Courtesy S. Sapatnekar, UMinn

Outline
- Motivation
- Power Supply Noise Estimation
- Decoupling Capacitance (decap) Budget
- Allocation of Decoupling Capacitance
- Experiment Results
- Conclusion

Why Decoupling Capacitance
- Frequency point of view
  - Decaps form low-pass filters
  - They cancel anti- effects
- Physical point of view
  - Decaps serve as charge reservoirs
  - They shortcut supply current paths and reduce voltage drop
  - No effect on DC supply currents

Power Supply Network: RLC Mesh
[Figure: RLC mesh with VDD pins (each with Rp and Lp) and current sources; legend marks current sources and VDD pins]
Slide courtesy of S Zhao, K Roy & C.-K. Kok

Current Distribution in Power Supply Mesh: Illustration
[Figure: modules A, B, and C in the mesh with VDD pins; legend marks connection points, current contributions, and numbered current flowing paths]
Slide courtesy of S Zhao, K Roy & C.-K.
Kok

Current Distribution in Power Supply Network
- Distribute switching current for each module in the power supply mesh
- Observation: currents tend to flow along the least-impedance paths
- Approximation: consider only those paths with minimal impedance (shortest, second shortest, …)
- With $Y_j = 1/Z_j$ the admittance of path j:
$$I_1 + I_2 + \cdots + I_n = I$$
$$Z_1 I_1 = Z_2 I_2 = \cdots = Z_n I_n$$
$$I_j = \frac{Y_j}{\sum_{i=1}^{n} Y_i}\, I, \quad j = 1, 2, \ldots, n$$
Slide courtesy of S Zhao, K Roy & C.-K. Kok

Current Flowing Paths and Power Supply Noise Calculation
- Power supply noise at a target module is the voltage difference between the VDD pin and the module
- Apply KVL:
$$V_{noise}^{(k)} = \sum_{P_j \in T^{(k)}} \left( i_j R_{P_{jk}} + L_{P_{jk}} \frac{di_j}{dt} \right)$$
[Figure: VDD feeding module k through R1, L1 and R2, L2, with capacitors C1, C2 and currents i1(t), i2(t), i3(t)]
Slide courtesy of S Zhao, K Roy & C.-K. Kok

Why Decoupling Capacitance?
- P/G network wire sizing won't change the voltage drop frequency spectrum
- To reduce Vdrop by k times, wires along the supply current path need to be sized up by k times
- Decoupling caps act as a low-pass filter
- Efficient at removing high-frequency components of Vdrop
[Figure: same supply path circuit as on the previous slide]

Decoupling Capacitance Budget
- Decap budget for each module can be determined based on its noise level
- Initial budget can be estimated as follows:
  - Charge: $Q^{(k)} = \int_0^{\tau} I^{(k)}(t)\,dt$, with $\tau$ the duration of the module's switching current
  - Noise ratio: $\theta = \max\left(1, \frac{V_{noise}^{(k)}}{V_{noise}^{(lim)}}\right)$
  - Decap: $C^{(k)} = \left(1 - \frac{1}{\theta}\right) Q^{(k)} / V_{noise}^{(lim)}, \quad k = 1, 2, \ldots, M$
- Iterations are performed if necessary until noise at each module in the floorplan is kept under a certain limit
Slide courtesy of S Zhao, K Roy & C.-K.
Kok

Allocation of Decoupling Capacitance
- Decap needs to be placed in the vicinity of each target module
- Decap requires white space (WS) to manufacture on
  - Use MOS capacitors
- Decap allocation is reduced to WS allocation
- Two-phase approach:
  - Allocate the existing WS in the floorplan
  - Insert additional WS into the floorplan if required
Slide courtesy of S Zhao, K Roy & C.-K. Kok

Allocation of Existing White Space
[Figure: floorplan of modules A-E with white space regions w1, w2, and w3]
Slide courtesy of S Zhao, K Roy & C.-K. Kok

Allocation of Existing WS: Linear Programming (LP) Approach
- Objective: maximize the utilization of available WS
- Existing WS can be allocated to neighboring modules using LP
- Notation:
  - $S$: sum of allocated WS
  - $S_k$: area of WS k
  - $S^{(j)}$: decap budget of mod j
  - $x_k^{(j)}$: WS allocated to mod j from WS k
  - $N_k$: neighbors set of WS k
- LP approach:
$$\text{maximize } S = \sum_{k=1}^{H} \sum_{j \in N_k} x_k^{(j)}$$
$$\text{s.t. } \sum_{j \in N_k} x_k^{(j)} \le S_k, \quad k = 1, 2, \ldots, H$$
$$\sum_{k=1}^{H} x_k^{(j)} \le S^{(j)}, \quad j = 1, 2, \ldots, M$$
$$x_k^{(j)} \ge 0, \quad \forall j, k$$
Slide courtesy of S Zhao, K Roy & C.-K. Kok

Insert Additional WS into Floorplan If Necessary
- Update decap budget for each module after existing WS has been allocated
- If additional WS is required, insert WS into the floorplan by extending it horizontally and vertically
- Two-phase procedure:
  - Insert WS band between rows based on the decap budgets of the modules in the row
  - Insert WS band between columns based on the decap budgets of the modules in the column
Slide courtesy of S Zhao, K Roy & C.-K. Kok

Moving Modules to Insert WS
[Figure: (a) original floorplan of modules A-G; (b) modules moved in the y+ direction by ExtY to open a WS band]
Slide courtesy of S Zhao, K Roy & C.-K.
Kok

Experimental Results
Comparison of Decap Budgets (Ours vs "Greedy Solution")

Circuit    Decap budget (nF)   Decap budget (nF)     Percentage (%)
           (our method)        ("greedy solution")
apte       27.73               32.64                 85.04
xerox       8.00               13.50                 59.30
hp          3.45                6.18                 55.80
ami33       0                   0.80                  0.00
ami49      10.28               24.80                 41.50
playout    42.91               61.67                 69.6

Experimental Results for MCNC Benchmark Circuits

Circuit   Modules   Existing WS      Decap budget   Inacc. WS       Added WS          Est. peak noise (V)
                    (µm²) (%)        (nF)           (µm²) (%)       (µm²) (%)         before    after
apte      9         751652 (1.6)     27.73          0 (0)           4794329 (10.3)    1.95      0.24
xerox     10        1071740 (5.5)     8.00          0 (0)           528892 (2.7)      0.94      0.20
hp        11        695016 (7.8)      3.45          306076 (3.5)    300824 (3.4)      1.09      0.23
ami33     33        244728 (21.3)     0             N/A             0                 0.16      0.16
ami49     49        2484496 (7.0)    10.28          891672 (2.5)    463615 (1.3)      1.45      0.25
playout   62        5837072 (6.6)    42.91          792110 (0.9)    3537392 (4.0)     1.23      0.24

Floorplan of playout Before/After WS Insertion
[Figure: playout floorplan before and after WS insertion]

Conclusion
- A methodology for decoupling capacitance allocation at floorplan level is proposed
- Linear programming technique is used to allocate existing WS to maximize its utilization
- A heuristic is proposed for additional WS insertion
- Compared with the "Greedy" solution, our method produces significantly smaller decap budgets
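
The initial decap budget from the "Decoupling Capacitance Budget" slide (charge, noise ratio, then decap) can be sketched in a few lines. This is a minimal illustration under assumptions, not the authors' implementation: the function names, the trapezoidal integration of a sampled current waveform, and all numeric values are hypothetical.

```python
def switched_charge(current_samples_a, dt_s):
    """Trapezoidal estimate of Q = integral of I(t) dt over the sampled window."""
    total = 0.0
    for i0, i1 in zip(current_samples_a, current_samples_a[1:]):
        total += 0.5 * (i0 + i1) * dt_s
    return total


def decap_budget(q_c, v_noise, v_lim):
    """Initial decap for one module: C = (1 - 1/theta) * Q / V_lim."""
    # Noise ratio; clamped at 1 so modules already under the limit get no decap.
    theta = max(1.0, v_noise / v_lim)
    return (1.0 - 1.0 / theta) * q_c / v_lim


if __name__ == "__main__":
    # Assumed triangular current pulse peaking at 2 A, sampled every 1 ns.
    q = switched_charge([0.0, 2.0, 0.0], 1e-9)
    # Assumed noise estimates (V) against an assumed 0.25 V limit.
    for name, v_noise in (("noisy module", 0.5), ("quiet module", 0.2)):
        c = decap_budget(q, v_noise, 0.25)
        print(f"{name}: Q = {q:.2e} C, decap = {c:.2e} F")
```

Clamping the ratio at 1 is what lets a circuit whose estimated noise is already within the limit end up with a zero budget, as ami33 does in the tables above.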