Coping with Interconnect Resistive Parasitics Impact of Resistance • We have already learned how to drive RC interconnect • Impact of resistance is commonly seen in power supply distribution networks: – IR drop (ohmic voltage drop) – Voltage variations • Power supply is distributed to minimize the IR drop and the change in current due to switching of gates RI Introduced Noise V A 2-cm-long Vdd or Gnd Al wire allows current of 1mA per µm width (max due to electromigration). Assuming a sheet resistance of 0.05Ω/ , the resistance of this wire (per µ m width) equals 1kΩ. A current of 1mA/µ m would result in a 1V voltage drop. Φpre DD I R‘ V V DD - ∆ ‘ X M1 I ∆V ∆V R IR drop on a VDD grid is caused because the current demanded by the transistors or gates of a design flows from the VDD I/O pins (or bump bonds in the case of flip-chips) through the RC network of the power grid, and leads to a decreased VDD voltage at the devices. Ground bounce is a similar phenomenon, where the current flows back to the VSS pins, and the RC network causes the VSS voltage at the devices to rise. The altered value of the voltage supply reduces noise margins and changes the logic levels as a function of the distance from the supply terminals. IR voltage drop over the supply rails might partially turn on M1 . This can result in accidental discharging of the precharged, dynamic node X, or can cause static power consumption if the connecting gate is static. Based on a survey of over 206 tapeouts, targeting the 0.13µm process, more than 50% of tapeouts will fail if the power distribution system is not validated! Supply voltage (V) Supply Voltage Variation Reliability & power è Vmax Vmin è frequency Time (msec) • Activity changes • Current delivery RI and L(di/dt) drops, wire planning • Within-die variation Source: IBM IR Drop The risk of a design suffering from IR drop or ground bounce increases with shrinking process technology and with next-generation designs. With every technology shrink, the current demand per unit area of a design is increased, basically because of shrinks in the gate oxide thickness. This, combined with the fact that next-generation designs contain more transistors or gates, adds additional stress to power grid design and typically results in a power grid that contains an increasing number of parasitic RC values to be analyzed. Example: A leading-edge 90nm design containing over 10 million gates and processed with eight layers of metal could produce a VDD grid approaching 1 billion RC elements. There are two approaches typically used for power grid analysis, (1) static and (2) dynamic. (1) A static analysis solves Ohm's and Kirchoff's laws for a given power network but ignores localized switching effects on the power grid. (2) A dynamic approach performs comprehensive dynamic circuit simulation of the power grid network, which includes localized switching effects. Static Power Grid Analysis The static power grid analysis approach was created to provide comprehensive coverage without the requirement of extensive circuit simulations. Steps: 1. The parasitic resistance of the power grid is extracted 2. A resistor matrix of the power grid is built 3. An average current for each transistor or gate connected to the power grid is calculated 4. The average currents are distributed around the resistance matrix, based on the physical location of the transistor or gate 5. At every VDD I/O pin, a source of VDD is applied to the matrix 6. A static matrix solver is then used to calculate the currents and IR drops throughout the resistance matrix A static approach approximates the effects of dynamic switching on the power grid by making the assumption that de-coupling capacitances between VDD and VSS smooth out the dynamic peaks of IR drop or ground bounce. Advantage: Simplicity and comprehensive coverage Challenge: Accuracy - Local dynamic effects are not accounted for, neither are package inductance effects (Ldi/dt), both of which may result in optimistic IR drop or ground bounce results if there is insufficient de-coupling capacitance on the power grid. Dynamic Power Grid Analysis A dynamic power grid analysis requires that both resistance and capacitance of the power grid are extracted, and that a dynamic circuit simulation of the resistant RC matrix is completed. Steps: 1. The parasitic resistance and capacitance of the power grid is extracted 2. The parasitic resistance and capacitance of the signal nets is extracted 3. The design netlist is extracted 4. A circuit netlist is created from the extracted parasitics and netlists 5. A circuit simulation is executed, based on a suite of simulation vectors, which simulates the transistors or gates dynamically switching and the effect of this switching on the power grid Advantage: Accuracy; Since the results are based on circuit simulation, the IR drop and ground bounce results can be extremely accurate and take into account localized dynamic and package inductance effects. Challenges: • The parasitic extraction demands are high because you need to extract resistance and capacitance for the power grids and (as a minimum) the capacitance for the signal nets. • The circuit simulation can contain a huge number of elements to be simulated, which strains the capacity of the circuit simulation engine. • The vector set that is used to stimulate the simulation plays a dominant role in determining the quality of the output, if a comprehensive suite of vectors is not used, then the results will be questionable because sections of the power grid may not have been simulated. • Finally, given the number of elements associated with a single power grid, a power grid analysis solution based on comprehensive dynamic simulation will not easily scale as design sizes continue to grow. Many power grid analysis solutions that promote a dynamic approach must often resort to RC reduction techniques to manage the size of the data to be simulated; however, this directly conflicts with the main value of the dynamic approach, the accuracy of the results. RC reduction of the power grid can cause inaccuracies to creep into the analysis, and can hide real EM problems. Power and Ground Distribution GND VD D Logic Logic V DD GND (a) Finger-shaped network V DD GND (b) Network with multiple supply pins The most obvious solution to solve the IR drop problem is to reduce the maximum distance between the supply pins and the circuit connections. This is accomplished through a structured layout of the power distribution network. In most solutions, the power and ground are brought onto the chip via bonding pads located on the four sides of the chip. Depending on the layers allocated for power distribution, a distribution network kind is chosen. 3 Metal Layer Approach Single layer power grid Power and ground rails are routed on the same layer (vertically or horizontally) 3rd “coarse and thick” metal layer added to the technology for EV4 design Power supplied from two sides of the die via 3rd metal layer 2nd metal layer used to form local power strips that are strapped to the upper grid, and then routed on the lower metal levels. 90% of 3rd metal layer used for power/clock routing Metal 3 Metal 2 Metal 1 Courtesy Compaq 4 Metal Layers Approach (EV5) 4th “coarse and thick” metal layer added to the technology for EV5 design Power supplied from four sides of the die Grid strapping done all in coarse metal 90% of 3rd and 4th metals used for power/clock routing Metal 4 Metal 3 Metal 2 Metal 1 Courtesy Compaq 6 Metal Layer Approach – EV6 2 reference plane metal layers added to the technology for EV6 design Solid planes dedicated to Vdd/Vss Metal planes act as shields between signal layers (reduces crosstalk) Significantly lowers resistance of grid Lowers on-chip inductance Feasible when sufficient metal layers are available RP2/Vdd Metal 4 Metal 3 RP1/Vss Metal 2 Metal 1 Courtesy Compaq Electromigration Line-open failure Open failure in contact plug Electromigration limits the current density in a metal line. A direct current in a metal wire running over a substantial time period causes a transport of metal ions (migration), which leads to wire breaking or to shortcircuit to another wire. The most common effect is to cause increased resistance in power grid paths, which lead to increased IR drop or ground bounce and impacts chip timing. The rate of electromigration depends on the temperature, crystal structure and average current density (keep it below 1mA/µm). The average current density can determine the minimal wire width of power and ground network. Signal wires however that normally carry AC current are less susceptible to migration since the bidirectional flow of electrons tends to anneal any damage done to the crystal structure. Solutions: 1) Imposing strict wire-sizing guidelines, 2) add alloying elements (Cu or Tu) to Al to prevent movement of Al ions, 3) use Cu wires all together instead of Al – lifetime of Cu wire is 100 times longer than that of Al. Reducing IR Drop and EM Ideally, the main trunks should be large enough to handle all the current flowing through separate branches. In this case, the T-junctions have a high current density and may be prone to EM problems. It is important in this type of grid to examine the current density at all junctions, especially the corner providing large amounts of current to each block, to ensure that EM problems do not exist. The same argument holds for every block routed in this manner. Again, power grid design is truly full-chip when IR drop and EM issues are considered. A B Reducing IR Drop Another approach to minimizing IR drop, depicted in Figure, is to have a solid grid of Metal 4 and Metal 5 and use a via array to connect the two layers, effectively tying the whole grid to VDD. While this solves the problem at higher levels, it simply shifts the problem down to the lower levels of metal. Low resistance, high current paths can often be created by random placement of lower blocks. In fact, when you design the logic circuitry in the block, it is not clear where Metal 3 will tap to Metal 4, so you cannot predict the current flow. And if you cannot predict it, you must analyze it! Part of the grid may have to be removed to route some signals (as shown in Figure). Which straps can be removed without introducing problems? If you arbitrarily pick one that is conducting a large amount of current, the excess current must flow in adjacent straps which may push the current density in them beyond acceptable levels. Clearly, such decisions cannot be made without determining the current levels in the straps and then picking ones that have lower current levels. The complexity of the problem requires a set of power grid analysis tools. These examples illustrate that design decisions must be made with a global perspective in mind. Reducing IR Drop Over the years, supply voltage has been shrinking as device dimensions are scaled to avoid transistor punchthrough conditions, hot-electron effects, and device breakdown. This has resulted in smaller and smaller noise margins. With IR drop, the margins are reduced even further which makes it even more difficult to manage a multimillion transistor design. IR drop on a power grid primarily affects timing. IR drop compromises the drive capability of the gates and increases the overall delay. Typically, a 5% drop in supply voltage can affect delay by 15% or more. Delay in a clock buffer has been known to increase by more than 100% due to IR drop. Such an increase in delay is critical when you are managing clock skews in the range of 100 picoseconds. Imagine the effect of this type of unexpected delay along centrally located critical paths. Then path delay is no longer predictable and, in fact, the critical path may be somewhere else in the design due to IR drop. This means that the performance or functionality of the design is unpredictable. Ideally, timing calculations should take worst-case IR drop into account to improve accuracy. In Figure below, a portion of a design is shown with two metal lines connected by a narrow strap of metal. The metal lines must be wide enough to carry the average current needed to feed the circuitry connected to it. If the lines are too narrow, EM or IR drop may occur. A modeled grid has millions of resistors and capacitors. While, in general, the current follows the path of the lowest resistance, the exact flow depends on factors such as current drawn from neighboring modules that share the same network. The peak currents drawn are distributed over time (i.e., IR drop is dynamic). The largest stress on the power grid can be attributed to simultaneous switching events, such as clocks, and bus drivers. In a static context, IR drops are highest near center of design and lowest near VDD connections to the power supply. Reducing IR Drop WIDEN THE POWER ROUTES The simplest approach is to widen the lines that experience the largest voltage drops since increasing the width decreases the resistance (hence the IR drop). However, this may not always be possible due to constraints in the routing area. Via arrays should also be maximized wherever possible, since the resistance associated with a single via can have a significant impact on IR drop. MAXIMIZE THE USE OF DECOUPLING CAPACITANCE One effective approach is to use decoupling capacitors between power and ground, which can deliver the additional current needed by the power distribution system. These decoupling caps are usually scattered throughout the power grid, in any available space, and enable the use of the static approach to power grid analysis. STAGGER THE SWITCHING Since IR drop is due primarily to simultaneous switching events, another (more difficult) approach is to stagger the gates that are switching together such that they switch at slightly different times—at least enough to keep the problem within the noise budget. Alternatively, you could reduce the buffer size, but this may not be possible if the design fails to meet performance requirements with smaller devices. Device switching can be staggered to reduce the peak demands of current by introducing delays on the signals driving the gates. MAXIMIZE THE POWER I/O PINS A more aggressive solution is to use a ball-grid array, sometimes called solder bumps or C4 bumps, where the power supply connections can be at various points within the chip. This expensive solution requires placing many C4 bumps across the chip to minimize the worst-case IR drop in any location. Also, this solution cannot be used in sensitive areas such as memories and dynamic logic because C4 bumps generate alpha particles that may cause logic value upsets in the sensitive nodes. Nevertheless, when used appropriately, C4 bumps can reduce IR drop. The key to design is proper placement of the C4 connections, which can only be done effectively with full-chip analysis. Reducing Electromigration The basic idea in all approaches is to reduce the average current density seen by any metal segment. (1) The simplest approach is to widen the metal lines. However, increasing the width beyond a certain point leads to over-design, which costs area and can reduce yields. (2) Change the current flow in the power grid itself by adding jumpers and straps between different points in the grid. This would reroute current around the affected areas, but such changes would require another verification pass to confirm that the problem has not simply been moved to another area of the design. In many cases a standard cell block would not show any EM risk if analyzed by itself. However, in a fullchip context, current flowing to adjacent blocks overloads the power connections in the block, and the analysis tool identifies an EM risk. Recognizing these problems at the planning stage is helpful, but difficult to do. EM requires a detailed grid with unreduced data. Therefore, a complete picture of EM risk can only be obtained at the verification stage. IR Drop and the Power Distribution Problem After Before The figure shows a power flow diagram of the VDD grid in a multimedia chip. Different shading indicates various levels of IR drops. The darkest areas are the lowest points (valleys) of the IR drop contours. A significant IR drop occurs in the center region of the chip because only the top portion of the power grid feeds the large drivers in the top section. The upper and lower regions of the power system are not connected. If we strap the upper and lower regions together in two places, the IR drop problem is reduced significantly. The depth of the IR drop valleys has been reduced to acceptable levels, and the IR drops have been spread over a wider area of the grid. The lower region is now supplying more current to the upper region and therefore a better power distribution has been obtained by adding the two straps. Changes to the power grid in one section tend to have a global impact ! An accurate picture of the IR risk must be taken when entire chip is verified. Source: Cadence Resistivity and Performance Tr The distributed rc rc--line R1 C1 RN-1 R2 C2 RN CN-1 CN Vin 2.5 x= L/10 2 x = L/4 voltage (V) Diffused signal propagation 1.5 x = L/2 1 Delay ~ L2 x= L 0.5 0 0 0.5 1 1.5 2 2.5 3 time (nsec) 3.5 4 4.5 5 Problem: Delay of a wire grows quadratically with its length. It would require multiple clock cycles to get a signal from one side of the chip to its opposite end. It would be hard to achieve synchronization and correct operation. Solutions: 1) Better Interconnect Materials • Introducing silicides and copper reduced the resistance of polysilicon and metal wires, respectively. Adopting low-K dielectrics lowers the capacitance. Such new materials temporarily reduces wire delay for 1-2 generations. Innovative design techniques are the best way to cope with global wire delay. It is sometimes hard to avoid long polysilicon wires, such as the address lines in a memory, which connects a large number of device gates. This increases the memory density by avoiding the overhead of extra metal contacts. However, this leads to extra delay. WL Driver Polysilicon word line Metal word line Driving a word line from both sides (worst case delay is reduced by 4) Metal bypass WL k cells Polysilicon word line Using a metal bypass (connected to polysilicon every k cells. Delay is now proportional to (k/2)2. Providing contacts only every k cells helps to preserve memory density) Interconnect Projections: Copper • Copper is planned in full sub-0.25 µm process flows and large-scale designs (IBM, Motorola, IEDM97) • With cladding and other effects, Cu ~ 2.2 µΩ-cm vs. 3.5 for Al(Cu) ⇒ 40% reduction in resistance • Electromigration improvement; 100X longer lifetime (IBM, IEDM97) – Electromigration is a limiting factor beyond 0.18 µm if Al is used (HP, IEDM95) Vias 2) Better Interconnect Strategies destination diagonal y source x Manhattan • 20+% Interconnect length reduction • Clock speed Signal integrity Power integrity • 15+% Smaller chips plus 30+% via reduction Adding interconnect layers reduces average wire length, as routing congestion is reduced, and interconnects can follow a direct path between source and destination. The “Manhattan-style” wiring is typical in today’s routing tools bringing a large amount of overhead. Using 45 o lines offer 20-29% reduction in wire length. 45o lines were actually used in the early days, but have not be adopted because of complexity issues, impact on tools, and mask making concerns. Courtesy Cadence X-initiative 3) Repeater Insertion Making an interconnect line m times shorter reduces its propagation delay quadratically. Assuming that the repeaters have a fixed delay tpbuf, we can derive the optimum number of repeaters mopt to insert using an analysis similar to the one used for the optimization of a chain of transmission gates. mopt = L t 0.38rc = pwire ( unbuffered ) t pbuf t pbuf t p , opt = 2 t pwire( unbuffered )t pbuf The optimum is obtained when the delay of the individual wire segments = repeater delay Repeater Repeater Insertion R t p = m 0.69 d s cL rL L sγC d + + sCd + 0.69 sCd + 0.38rc m m m 2 Rd and Cd are the resistance and input capacitance, respectively, of a minimum-sized repeater. γ is the ratio between the intrinsic and input capacitances of the repeater. mopt = L sopt = t pwire(unbuffered ) 0.38rc = 0.69Rd C d (γ + 1) t p1 Rd c rC d t p ,min = (1.38 + 1.02 1 + γ ) L Rd Cd rc t p1 = 0.69 Rd Cd (γ + 1) = t p 0 (1 + 1 / γ ) presents the delay of an inverter for a fan-out of 1. It is clear that the insertion of repeaters linearizes the delay of a wire. Also, for a given technology and interconnect layer, there exists an optimal length of the wire segments between repeaters. The critical length is Lcritical = t p1 L = mopt 0.38 rc It can be derived that the delay of a segment of critical length is always given by t p ,critical = t p ,min mopt 0.69 = 2 1 + t p1 0 . 38 ( 1 + γ ) which is independent of the routing layer. Interconnect: # of Wiring Layers # of metal layers is steadily increasing due to: ρ = 2.2 µΩ-cm M6 • Increasing die size and device count: we need more wires and longer wires to connect everything T ins • Rising need for a hierarchical wiring network; M5 W local wires with high density and global wires with low RC S M4 H 3.5 Minimum Widths (Relative) 4.0 3.5 3.0 M3 3.0 2.5 2.5 2.0 M2 1.5 M1 1.0 poly substrate 0.25 µm wiring stack Minimum Spacing (Relative) M5 M4 M3 M2 M1 0.5 Poly 0.0 M5 2.0 M4 1.5 M3 M2 1.0 M1 Poly 0.5 0.0 1.0µ 0.8µ 0.6µ 0.35µ 0.25µ 1.0µ 0.8µ 0.6µ 0.35µ 0.25µ Optimizing the Interconnect Architecture Even with buffer insertion, the delay of a resistive wire cannot be reduced below a minimum value. Hence, long wires often exhibit a delay that is longer than the clock period of the design. For example, a 10cm long Al-1 wire comes with minimum delay of 3.9ns after optimal buffer insertion and sizing, while the 0.25µm R CMOS R processRcan sustain clock speeds in excess of 1GHz (i.e., clock CLK CLK wire delay periods below CLK 1ns). Therefore, the all by itself becomes the limiting factor on the performance achievable by the integrated Inductance L di/dt Voltage Drop V DD During each switching action, a transient current is sourced from (or sunk into) the supply rails to charge (discharge) the circuit capacitances. i(t) L VDD and V SS connections are routed to the external supplies through bonding wires and package pins which possess a nonignorable series inductance. V ’DD V out V in CL Thus, a change in transient current induces a change in voltage between the internal (V’DD and V’SS)and external supply voltages. This affects the logic levels and results in reduced noise margins. • Longer supply lines have larger L VSS L ’ Example: Noise Induced by Inductive Bonding Wires and Package Pins This scenario happens when the driver input has steep rise time (50psec). The abrupt current change causes a steep voltage spike up to 0.95V over the inductor. Vout(V) 2.5 2 1.5 1 0.5 0 0 0.5 1.5 2 -9 x 10 2.5 2 1.5 1 0.5 0 0 0.04 Without inductors With inductors 0.04 0.02 0.02 0 0 decoupled 0.5 1 1.5 2 -9 x 10 0 0 0.5 0.5 0 0 0 0.5 1 1.5 2 time (nsec) x 10 -9 Input rise/fall time: 50 psec 0.5 1 1.5 2 -9 x 10 0.5 1 1.5 2 -9 x 10 1 VL (V) 1 A better modeling of the current distribution is as a triangle. This occurs when the input signal to the buffer displays slow rise and fall times. Thus, the estimated voltage drop over the inductor = L di/dt =(2.5nH x 40mA) / ((1.25/2) x 1ns) = 133mV. The peak current=40mA results from the fact that the total delivered charge (area under iL) is fixed, thus the peak of the triangle = double rectangular (average value). The voltage drop is below 100mV. 1 iL(A) The last stage of an output pad driver driving a load cap of 10pF over a voltage swing of 2.5V. The driver is sized to achieve equal rise and fall times=1ns. With an inductance of 2.5nH (packaging), the driver acts as a constant current source (for simplicity) with I av=20mA to achieve the 1ns tr and tf. I av=(10pF x (0.9-0.1) x 2.5V) / 1ns = 20mA 0 0.5 1 1.5 2 time (nsec) x 10 -9 Input rise/fall time: 800 psec The internal supply voltages deviate substantially from the external ones. For example, simultaneous switching of the 16 output drivers of an output bus would cause a voltage drop of at least 1.1V if the supply connections of the buffers were connected to the same pin on the package. Dealing with Ldi/dt 1) Seperate power pins for I/O pads and chip core. I/O drivers require largest switching currents producing large current changes. Isolate chip core from I/O drivers by providing different power and ground pins. 2) Multiple power and ground pins. Restrict the number of I/O drivers connected to a single supply pin to reduce the Ldi/dt per supply pin (5-10 drivers/pin). This number strongly depends upon the switching characteristics of the drivers (# of switching gates and rise/fall times) 3) Careful selection of the positions of the power and ground pins on the package. Inductance of leads and bond-wires associated with pins located at the corners of the package is substantially higher. Bonding wire Chip L Mounting cavity L´ Lead frame Pin Choosing the right pin Dealing with Ldi/dt 4) Increase the rise and fall times of the off-chip signals to the maximum extent allowable. The best driver is one that achieves a specified delay with maximum allowable rise and fall times at the output. 5) Schedule current-consuming transitions so that they do not occur simultaneously. It is possible to slightly stagger the switching of a set of output drivers by skewing their control inputs. 6) Use advanced packaging technologies. Surface mount or hybrids that come with substantially reduces capacitance and inductance per pin. 7) Add decoupling capacitances on the board. These capacitances act as local supplies and stabilize the supply voltage seen by the chip. They separate the bonding-wire inductance from the inductance of the board interconnect. The bypass capacitor, combined with the inductance, actually acts as a low-pass network that filters away the high-frequency components of the transient voltage spikes on the supply lines. 1 Board wiring Bonding wire Cd SUPPLY 2 Decoupling capacitor CHIP Dealing with Ldi/dt 8) Add decoupling capacitances on the chip. They are integrated on the chip to ensure cleaner supply voltages, due to high switching speeds and steep signal transitions. Example: To limit the voltage ripple to 0.25V, a capacitance of around 12.5nF must be provided for every 50K gate module in 0.25µm CMOS process. The capacitance is implemented using the thin gate oxide (issue when gate leakage kicks in!) The “Network-on-a-Chip” Address the interconnections on a chip as a communications problem and to apply techniques that have made large-scale communications and networking systems around the world operate reliably and correctly. Embedded Embedded Processors Processors Memory Memory Sub-system Sub-system Interconnect Backplane Rather than considering on-chip interconnections as point-to-point wires, they Accelerators Accelerators should be abstracted as communication channels over which data is transmitted within a “quality-of-service” setting, putting constraints on throughput, latency, and correctness. Configurable Configurable Accelerators Accelerators Peripherals Peripherals Today’s on-chip interconnect signaling techniques are designed with large noise margins to ensure a bit is transmitted correctly. Future designs may give up integrity for the sake of energy/performance. Any errors are corrected by other circuitry that is designed to provide error-correcting capabilities. “Network on a chip”: Rather than being statically wired from source-to-destination, data is injected as packets into a complete network of wires, switches, and routers, and it is the network that dynamically decides how and when to route these packets through its segments.