Lecture 21: Packaging, Power, & Clock

Outline
Packaging
Power Distribution
Clock Distribution

21: Package, Power, and Clock CMOS VLSI Design 4th Ed.

Packages
Package functions
– Electrical connection of signals and power from chip to board
– Little delay or distortion
– Mechanical connection of chip to board
– Removes heat produced on chip
– Protects chip from mechanical damage
– Compatible with thermal expansion
– Inexpensive to manufacture and test

Package Types
Through-hole vs. surface mount

Chip-to-Package Bonding
Traditionally, chip is surrounded by pad frame
– Metal pads on 100 – 200 µm pitch
– Gold bond wires attach pads to package
– Lead frame distributes signals in package
– Metal heat spreader helps with cooling

Advanced Packages
Bond wires contribute parasitic inductance
Fancy packages have many signal, power layers
– Like tiny printed circuit boards
Flip-chip places connections across surface of die rather than around periphery
– Top level metal pads covered with solder balls
– Chip flips upside down
– Carefully aligned to package (done blind!)
– Heated to melt balls
– Also called C4 (Controlled Collapse Chip Connection)

LGA Package
1366 gold-plated pads

Package Parasitics
Use many VDD, GND in parallel
– Inductance, IDD
[Figure: package parasitics model; labels: Signal Pins, Signal Pads, Package, Bond Wire, Lead Frame, Package Capacitor, Chip, Chip VDD, Chip GND, Board VDD, Board GND]
Heat Dissipation
60 W light bulb has surface area of 120 cm²
Itanium 2 die dissipates 130 W over 4 cm²
– Chips have enormous power densities
– Cooling is a serious challenge
Package spreads heat to larger surface area
– Heat sinks may increase surface area further
– Fans increase airflow rate over surface area
– Liquid cooling used in extreme cases ($$$)

Thermal Resistance
ΔT = θja P
– ΔT: temperature rise on chip
– θja: thermal resistance of chip junction to ambient
– P: power dissipation on chip
Thermal resistances combine like resistors
– Series and parallel
θja = θjp + θpa
– Series combination

Example
Your chip has a heat sink with a thermal resistance to the package of 4.0 °C/W. The resistance from chip to package is 1 °C/W. The system box ambient temperature may reach 55 °C. The chip temperature must not exceed 100 °C. What is the maximum chip power dissipation?
(100 °C – 55 °C) / (4 °C/W + 1 °C/W) = 9 W

Temperature Sensor
Monitor die temperature and throttle performance if it gets too hot
Use a pair of pnp bipolar transistors
– Vertical pnp available in CMOS
$$I_c = I_s e^{qV_{BE}/kT} \;\Rightarrow\; V_{BE} = \frac{kT}{q}\ln\frac{I_c}{I_s}$$
$$\Delta V_{BE} = V_{BE1} - V_{BE2} = \frac{kT}{q}\left(\ln\frac{I_{c1}}{I_s} - \ln\frac{I_{c2}}{I_s}\right) = \frac{kT}{q}\ln\frac{I_{c1}}{I_{c2}} = \frac{kT}{q}\ln m$$
Voltage difference is proportional to absolute temp
– Measure with on-chip A/D converter

Power Distribution
Power Distribution Network functions
– Carry current from pads to transistors on chip
– Maintain stable voltage with low noise
– Provide average and peak power demands
– Provide current return paths for signals
– Avoid electromigration & self-heating wearout
– Consume little chip area and wire
– Easy to lay out
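The thermal-resistance example earlier in this section (4.0 °C/W heat sink, 1 °C/W chip to package, 55 °C ambient, 100 °C limit) can be checked with a few lines of Python. This is only a sketch: the function names `max_power_w` and `junction_temp_c` are made up for illustration, and the model assumes purely series thermal resistances as in the slide.

```python
def max_power_w(t_max_c, t_ambient_c, *thetas_c_per_w):
    """Largest power (W) that keeps the chip at or below t_max_c.

    Series thermal resistances add, like series resistors.
    """
    theta_ja = sum(thetas_c_per_w)
    return (t_max_c - t_ambient_c) / theta_ja


def junction_temp_c(power_w, t_ambient_c, *thetas_c_per_w):
    """Chip temperature (deg C) for a given power: T = T_ambient + theta_ja * P."""
    return t_ambient_c + power_w * sum(thetas_c_per_w)


# Values from the slide example: heat sink 4.0 C/W, chip-to-package 1 C/W.
p_max = max_power_w(100.0, 55.0, 4.0, 1.0)
print(f"max power = {p_max:.1f} W")  # prints "max power = 9.0 W"
print(f"check: T = {junction_temp_c(p_max, 55.0, 4.0, 1.0):.1f} C")
```

Passing the resistances as separate arguments keeps the series sum visible, so adding another stage (for example, a hypothetical thermal interface layer) is a one-argument change.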
Power Requirements
VDD = VDDnominal – Vdroop
Want Vdroop < +/- 10% of VDD
Sources of Vdroop
– IR drops
– L di/dt noise
IDD changes on many time scales
[Figure: power vs. time; labels: Max, Average, Min, clock gating]

IR Drop
A chip draws 24 W from a 1.2 V supply. The power supply impedance is 5 mΩ. What is the IR drop?
IDD = 24 W / 1.2 V = 20 A
IR drop = (20 A)(5 mΩ) = 100 mV

IR Introduced Noise

Power Distribution

Power Distribution
Low level distribution is in metal 1.
Power has to be strapped in higher layers of metal.
The spacing is set by IR drop, electromigration, and inductive effects.
Always use multiple contacts on straps.

Power and Ground Distribution

3 Metal Layers (EV4)

4 Metal Layers (EV5)

6 Metal Layers (EV6)

Power Supply Droop

L di/dt Noise

L di/dt Noise
A 1.2 V chip switches from an idle mode consuming 5 W to a full-power mode consuming 53 W. The transition takes 10 clock cycles at 1 GHz. The supply inductance is 0.1 nH. What is the L di/dt droop?
ΔI = (53 W – 5 W)/(1.2 V) = 40 A
Δt = 10 cycles × (1 ns / cycle) = 10 ns
L di/dt droop = (0.1 nH) × (40 A / 10 ns) = 0.4 V

Dealing with L di/dt
Separate power pins for I/O pads and chip core.
Multiple power and ground pins.
Careful selection of positions of power and ground pins on package.
Increase rise and fall times as much as possible.
Schedule current consuming transitions.
Use advanced packaging technologies.
Use decoupling capacitances on the board.
Use decoupling capacitances on chip.

Choosing the Right Pin

Decoupling Capacitance

Bypass Capacitors
Need low supply impedance at all frequencies
Ideal capacitors have impedance decreasing with ω
Real capacitors have parasitic R and L
– Leads to resonant frequency of capacitor
[Figure: impedance (10^-2 to 10^2) vs. frequency (Hz, 10^4 to 10^10) for a 1 µF capacitor with 0.25 nH and 0.03 Ω parasitics]

De-coupling Capacitor Ratios
EV4
– total effective switching capacitance = 12.5 nF
– 128 nF of de-coupling capacitance
– de-coupling/switching capacitance ~ 10x
EV5
– 13.9 nF of switching capacitance
– 160 nF of de-coupling capacitance
EV6
– 34 nF of effective switching capacitance
– 320 nF of de-coupling capacitance -- not enough!
Source: B. Herrick (Compaq)

EV6 De-coupling Capacitance
Design for ΔIdd = 25 A @ Vdd = 2.2 V, f = 600 MHz
– 0.32 µF of on-chip de-coupling capacitance was added
  • Under major busses and around major gridded clock drivers
  • Occupies 15-20% of die area
– 1 µF, 2 cm² Wirebond Attached Chip Capacitor (WACC) significantly increases “Near-Chip” decoupling
  • 160 Vdd/Vss bondwire pairs on the WACC minimize inductance
Source: B. Herrick (Compaq)

EV6 WACC
[Figure: EV6 package with WACC; labels: 587 IPGA, 389 Signal - 198 VDD/VSS Pins, 389 Signal Bondwires, 395 VDD/VSS Bondwires, 320 VDD/VSS Bondwires, WACC, Microprocessor, Heat Slug]
Source: B.
Herrick (Compaq)

Power System Model
Power comes from regulator on system board
– Board and package add parasitic R and L
– Bypass capacitors help stabilize supply voltage
– But capacitors also have parasitic R and L
Simulate system for time and frequency responses
[Figure: power system model; labels: Voltage Regulator, VDD, Bulk Capacitor, Printed Circuit Board Planes, Ceramic Capacitor, Package and Pins, Package Capacitor, Solder Bumps, On-Chip Capacitor, On-Chip Current Demand; regions: Board, Package, Chip]

Frequency Response
Multiple capacitors in parallel
– Large capacitor near regulator has low impedance at low frequencies
– But also has a low self-resonant frequency
– Small capacitors near chip and on chip have low impedance at high frequencies
Choose caps to get low impedance at all frequencies
[Figure: impedance vs. frequency (Hz)]

Example: Pentium 4
Power supply impedance for Pentium 4
– Spike near 100 MHz caused by package L
Step response to sudden supply current change
– 1st droop: on-chip bypass caps
– 2nd droop: package capacitance
– 3rd droop: board capacitance
[Xu08] [Wong06]

Distributed Model

Charge Pumps
Sometimes a different supply voltage is needed but little current is required
– 20 V for Flash memory programming
– Negative body bias for leakage control during sleep
Generate the voltage on-chip with a charge pump

Energy Scavenging
Ultra-low power systems can scavenge their energy from the environment rather than needing batteries
– Solar calculator (solar cells)
– RFID tags (antenna)
– Tire pressure monitors powered by vibrational energy of tires (piezoelectric generator)
Thin film microbatteries deposited on the chip can store energy for times of peak demand
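The bypass-capacitor discussion in this part of the lecture (a real capacitor has parasitic R and L, hence a self-resonant frequency) can be explored numerically. The sketch below models one capacitor as a series R-L-C using the values from the Bypass Capacitors plot (1 µF, 0.25 nH, 0.03 Ω). The function names are illustrative, and the second, smaller capacitor is a purely hypothetical part added to show the parallel combination; it is not taken from the slides.

```python
import math


def cap_z(f_hz, c_f, esl_h, esr_ohm):
    """Complex impedance of a capacitor modeled as a series R-L-C."""
    w = 2 * math.pi * f_hz
    return complex(esr_ohm, w * esl_h - 1.0 / (w * c_f))


def self_resonant_hz(c_f, esl_h):
    """Frequency where the inductive and capacitive reactances cancel."""
    return 1.0 / (2 * math.pi * math.sqrt(esl_h * c_f))


def parallel(*zs):
    """Impedance of several branches connected in parallel."""
    return 1.0 / sum(1.0 / z for z in zs)


BIG = (1e-6, 0.25e-9, 0.03)     # values from the slide's plot
SMALL = (10e-9, 0.05e-9, 0.10)  # hypothetical smaller capacitor (assumption)

print(f"self-resonance of the 1 uF part: {self_resonant_hz(*BIG[:2]) / 1e6:.1f} MHz")
for f in (1e4, 1e6, 1e7, 1e8, 1e9, 1e10):
    z_big = abs(cap_z(f, *BIG))
    z_both = abs(parallel(cap_z(f, *BIG), cap_z(f, *SMALL)))
    print(f"{f:8.0e} Hz  big only: {z_big:9.4f} ohm   big || small: {z_both:9.4f} ohm")
```

Below resonance the impedance falls with frequency, as for an ideal capacitor; above it the series inductance dominates and the impedance rises again, which is why several capacitor sizes are combined.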
Capacitive Cross Talk

Capacitive Cross Talk: Dynamic Node
[Figure: dynamic gate with floating node Y coupled to wire X; labels: VDD, CLK, In1, In2, In3, PDN, Y, CY, CXY, X (2.5 V to 0 V)]
3 x 1 µm overlap: 0.19 V disturbance

Capacitive Cross Talk: Driven Node
τXY = RY(CXY + CY)
Keep time-constant smaller than rise time
[Figure: driven-node crosstalk; circuit labels: X, VX, RY, CXY, Y, CY; plot of V (Volt), 0 to 0.5, vs. t (nsec), 0 to 1, for increasing tr]

Dealing with Capacitive Cross Talk
Avoid floating nodes
Protect sensitive nodes
Make rise and fall times as large as possible
Differential signaling
Do not run wires together for a long distance
Use shielding wires
Use shielding layers

Shielding
[Figure: shielding; labels: Shielding wire, Shielding layer, GND, VDD, GND, Substrate (GND)]

Cross Talk and Performance
- When neighboring lines switch in opposite direction of victim line, delay increases
DELAY DEPENDENT UPON ACTIVITY IN NEIGHBORING WIRES
Miller Effect
- Both terminals of capacitor are switched in opposite directions (0 → Vdd, Vdd → 0)
- Effective voltage is doubled and additional charge is needed (from Q = CV)
[Figure: victim wire coupled to neighbors through Cc]

Impact of Cross Talk on Delay
r is ratio between capacitance to GND and to neighbor

Dealing with Cross-Talk
Evaluate and improve
Constructive layout generation
Predictable structures
Avoid worst case patterns

Structured Predictable Interconnect
[Figure: wire pattern V S G S V S / S G S V S V]
Example: Dense Wire Fabric ([Sunil Khatri])
Trade-off:
• Cross-coupling capacitance 40x lower, 2% delay variation
• Increase in area and overall capacitance
Also: FPGAs, VPGAs
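The two crosstalk effects covered in this part of the lecture can be sketched in Python. The first function is the capacitive divider that sets the disturbance on a floating (dynamic) node; the second applies the usual idealized Miller factors (0 when a neighbor switches with the victim, 1 when it is quiet, 2 when it switches in the opposite direction) to estimate the capacitance the victim's driver must charge. The slides state only that the effective voltage is doubled, so the 0/1/2 table, the function names, and the capacitance values below are assumptions for illustration.

```python
def floating_node_disturbance_v(dv_aggressor, c_xy, c_y):
    """Voltage step coupled onto a floating node Y when X swings by dv_aggressor.

    Capacitive divider: dVy = dVx * Cxy / (Cxy + Cy).
    """
    return dv_aggressor * c_xy / (c_xy + c_y)


# Idealized Miller coupling factors (assumption, not taken from the slides).
MILLER_FACTOR = {"same": 0.0, "quiet": 1.0, "opposite": 2.0}


def effective_cap(c_gnd, c_couple, neighbor_activity):
    """Capacitance seen by the victim driver, given each neighbor's activity."""
    return c_gnd + sum(MILLER_FACTOR[a] * c_couple for a in neighbor_activity)


# Hypothetical capacitances in arbitrary units.
print(floating_node_disturbance_v(2.5, 1.0, 9.0))
for pattern in (("same", "same"), ("quiet", "quiet"), ("opposite", "opposite")):
    print(pattern, effective_cap(1.0, 0.5, pattern))
```

With two neighbors, the effective capacitance in this toy case ranges from the ground capacitance alone to ground plus four coupling capacitances, which is the activity-dependent delay spread the slides warn about.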
Clock Distribution
On a small chip, the clock distribution network is just a wire
– And possibly an inverter for clkb
On practical chips, the RC delay of the wire resistance and gate load is very long
– Variations in this delay cause clock to get to different elements at different times
– This is called clock skew
Most chips use repeaters to buffer the clock and equalize the delay
– Reduces but doesn’t eliminate skew

Example

Example
Skew comes from differences in gate and wire delay
– With right buffer sizing, clk1 and clk2 could ideally arrive at the same time.
– But power supply noise changes buffer delays
– clk2 and clk3 will always see RC skew
[Figure: clock distribution example; labels: gclk, 3 mm, clk1, 1.3 pF, 3.1 mm, clk2, 0.4 pF, 0.5 mm, clk3, 0.4 pF]

Clock Uncertainties

Clock Nonidealities
Clock skew
– Spatial variation in temporally equivalent clock edges; deterministic + random, tSK
Clock jitter
– Temporal variations in consecutive edges of the clock signal; modulation + random noise
– Cycle-to-cycle (short-term) tJS
– Long term tJL
Variation of the pulse width
– Important for level sensitive clocking

Review: Skew Impact
Ideally full cycle is available for work
Skew adds sequencing overhead
Increases hold time too
$$t_{pd} \le T_c - \underbrace{\left(t_{pcq} + t_{setup} + t_{skew}\right)}_{\text{sequencing overhead}}$$
$$t_{cd} \ge t_{hold} - t_{ccq} + t_{skew}$$
[Figure: flip-flops F1 and F2 around combinational logic (CL), with timing diagrams; labels: clk, Q1, D2, Tc, tpcq, tpdq, tsetup, tskew, tccq, tcd, thold]

Solutions
Reduce clock skew
– Careful clock distribution network design
– Plenty of metal wiring resources
Analyze clock skew
– Only budget actual, not worst case skews
– Local vs.
global skew budgets
Tolerate clock skew
– Choose circuit structures insensitive to skew

Clock Dist. Networks
Ad hoc
Grids
H-tree
Hybrid

H-Trees
Fractal structure
– Gets clock arbitrarily close to any point
– Matched delay along all paths
Delay variations cause skew
A and B might see big skew
[Figure: H-tree with points A and B marked]

More realistic H-tree
[Restle98]

Itanium 2 H-Tree
Four levels of buffering:
– Primary driver
– Repeater
– Second-level clock buffer
– Gater
Route around obstructions
[Figure: Itanium 2 H-tree; labels: Primary Buffer, Repeaters, Typical SLCB Locations]

Itanium 2 Repeaters

Spines

Pentium IV Clock Spines

Pentium IV Clock Spines

Clock Grids
Use grid on two or more levels to carry clock
Make wires wide to reduce RC delay
Ensures low skew between nearby points
But possibly large skew across die

The Grid System
• No RC-matching
• Large power
[Figure: clock grid; labels: GCLK, Driver (four of each)]

Alpha Clock Grids
[Figure: clock grids; panels: Alpha 21064, Alpha 21164, Alpha 21264; labels: PLL, gclk grid]

Example: DEC Alpha 21164
Clock Frequency: 300 MHz - 9.3 Million Transistors
Total Clock Load: 3.75 nF
Power in Clock Distribution network: 20 W (out of 50)
Uses Two Level Clock Distribution:
• Single 6-stage driver at center of chip
• Secondary buffers drive left and right side clock grid in Metal3 and Metal4
Total driver size: 58 cm!
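As a rough sanity check on the Alpha 21164 numbers above, the dynamic power needed to swing a clock load once per cycle is P = C·V²·f. The sketch below applies that formula to the slide's 3.75 nF load at 300 MHz. The supply voltage is not given on the slide; 3.3 V is an assumption made here, and the function name is illustrative.

```python
def clock_load_power_w(c_load_f, vdd_v, f_hz):
    """Dynamic power to charge and discharge c_load_f once per clock cycle."""
    return c_load_f * vdd_v ** 2 * f_hz


C_CLOCK = 3.75e-9   # total clock load from the slide
F_CLOCK = 300e6     # clock frequency from the slide
VDD_ASSUMED = 3.3   # assumption: supply voltage is not stated on the slide

p = clock_load_power_w(C_CLOCK, VDD_ASSUMED, F_CLOCK)
print(f"clock load power ~ {p:.2f} W")  # prints "clock load power ~ 12.25 W"
```

Under that voltage assumption, the load alone accounts for roughly 12 W. The slide's 20 W figure covers the whole clock distribution network, so the difference may come from the drivers themselves, but that reading is an interpretation rather than something the slides state.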
21164 Clocking
2 phase single wire clock, distributed globally
2 distributed driver channels
– Reduced RC delay/skew
– Improved thermal distribution
– 3.75 nF clock load
– 58 cm final driver width
Local inverters for latching
Conditional clocks in caches to reduce power
More complex race checking
Device variation
[Figure: clock waveform with tcycle = 3.3 ns, tskew = 150 ps, trise = 0.35 ns; location of clock driver on die, showing pre-driver and final drivers]

Clock Drivers

Clock Skew in Alpha Processor

EV6 (Alpha 21264) Clocking
600 MHz – 0.35 micron CMOS
2 Phase, with multiple conditional buffered clocks
– 2.8 nF clock load
– 40 cm final driver width
Local clocks can be gated “off” to save power
Reduced load/skew
Reduced thermal issues
Multiple clocks complicate race checking
[Figure: global clock waveform with tcycle = 1.67 ns, tskew = 50 ps, trise = 0.35 ns; PLL location on die]

21264 Clocking

EV6 Clock Results
[Figure: two die maps; GCLK Skew (at Vdd/2 crossings), scale 5 to 50 ps; GCLK Rise Times (20% to 80% extrapolated to 0% to 100%), scale 300 to 345 ps]

EV7 Clock Hierarchy
Active Skew Management and Multiple Clock Domains
+ widely dispersed drivers
+ DLLs compensate static and low-frequency variation
+ divides design and verification effort
- DLL design and verification is added work
+ tailored clocks
[Figure: clock hierarchy; labels: SYSCLK, PLL, DLL (three), GCLK (CPU Core), NCLK (Mem Ctrl), L2L_CLK (L2 Cache), L2R_CLK (L2 Cache)]

Hybrid Networks
Use H-tree to distribute clock to many points
Tie these points together with a grid
Ex: IBM Power4, PowerPC
– H-tree drives 16-64 sector buffers
– Buffers drive total of 1024 points
– All points shorted together with grid

Clock Gaters

Adaptive Deskewing
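The skew figures quoted for the Alpha processors can be put in context with the flip-flop timing constraints from the skew review: t_pd ≤ T_c − (t_pcq + t_setup + t_skew) for setup and t_cd ≥ t_hold − t_ccq + t_skew for hold. The sketch below is a minimal illustration. The cycle times and skews come from the slides, while the flip-flop parameters (t_pcq, t_setup, t_hold, t_ccq) are hypothetical placeholders, and the real designs used two-phase latch clocking rather than simple flip-flops.

```python
def max_logic_delay_ns(t_c, t_pcq, t_setup, t_skew):
    """Setup constraint: longest allowed combinational propagation delay."""
    return t_c - (t_pcq + t_setup + t_skew)


def min_contamination_delay_ns(t_hold, t_ccq, t_skew):
    """Hold constraint: shortest allowed combinational contamination delay."""
    return max(0.0, t_hold - t_ccq + t_skew)


# (cycle time, skew) in ns, from the slides.
CHIPS = {"21164": (3.3, 0.150), "21264 (EV6)": (1.67, 0.050)}

# Hypothetical flip-flop parameters in ns (assumptions for illustration only).
T_PCQ, T_SETUP, T_HOLD, T_CCQ = 0.10, 0.10, 0.05, 0.08

for name, (t_c, t_skew) in CHIPS.items():
    t_pd = max_logic_delay_ns(t_c, T_PCQ, T_SETUP, t_skew)
    t_cd = min_contamination_delay_ns(T_HOLD, T_CCQ, t_skew)
    print(f"{name}: skew is {100 * t_skew / t_c:.1f}% of the cycle, "
          f"t_pd <= {t_pd:.2f} ns, t_cd >= {t_cd:.2f} ns")
```

Skew appears with the same sign in both constraints: it shortens the time available for logic and raises the minimum delay needed to avoid hold violations.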
Self-timed and Asynchronous Design
Functions of clock in synchronous design
1) Acts as completion signal
2) Ensures the correct ordering of events
Truly asynchronous design
1) Completion is ensured by careful timing analysis
2) Ordering of events is implicit in logic
Self-timed design
1) Completion ensured by completion signal
2) Ordering imposed by handshaking protocol

Self-Timed Pipelined Datapath
[Figure: three-stage self-timed pipeline; registers R1, R2, R3; logic blocks F1, F2, F3 with delays tpF1, tpF2, tpF3; handshake (HS) blocks exchanging Req and Ack; each logic block has Start and Done signals; data flows from In to Out]

Completion Signal Generation
Using Delay Element (e.g. in memories)
[Figure: labels: In, LOGIC NETWORK, Out, Start, DELAY MODULE, Done]

Completion Signal Generation
Using Redundant Signal Encoding

Completion Signal in DCVSL

Self-Timed Adder
[Figure: (a) Differential carry generation, with Start, VDD, carries C0 to C4 and their complements, and P, G, K signals for bits 0 to 3; (b) Completion signal, combining Start and the carry pairs C0 to C4 into Done]

Completion Signal Using Current Sensing
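Completion detection with redundant signal encoding can be illustrated with a small behavioral model. The sketch below assumes the common dual-rail convention, which the slides do not spell out: each bit is carried on a (true, false) wire pair, (0, 0) means "not yet valid", and Done is asserted once every pair has one wire high. All names are illustrative.

```python
SPACER = (0, 0)  # both rails low: the bit has not been computed yet


def encode(bit):
    """Dual-rail encode one bit as a (true_rail, false_rail) pair."""
    return (1, 0) if bit else (0, 1)


def done(word):
    """Completion signal: every rail pair has left the spacer state.

    The illegal pair (1, 1) is not checked in this sketch.
    """
    return all((t or f) == 1 for t, f in word)


def decode(word):
    """Recover the bits of a completed dual-rail word."""
    if not done(word):
        raise ValueError("word is not complete yet")
    return [t for t, _ in word]


# A 4-bit result whose bits become valid one at a time, as carries
# might ripple through a self-timed adder.
word = [SPACER] * 4
for i, bit in enumerate([1, 0, 1, 1]):
    word[i] = encode(bit)
    print(f"after bit {i}: done = {done(word)}")
print("decoded:", decode(word))
```

The completion signal depends only on the data wires themselves, so it tracks the actual computation time rather than a worst-case delay estimate, at the cost of two wires per bit.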