IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 19, NO. 6, JUNE 2000
Clock Skew Verification in the Presence of IR-Drop
in the Power Distribution Network
Resve Saleh, Member, IEEE, Syed Zakir Hussain, Member, IEEE, Steffen Rochel, Member, IEEE, and
David Overhauser, Member, IEEE
Abstract—Clocks are perhaps the most important circuits in
high-speed digital systems. The design of clock circuitry and the
quality of clock signals directly impact the performance of a very
large scale integrated chip. Clock skew verification requires high
accuracy and is typically performed using circuit simulators.
However, in high-performance deep-submicrometer digital circuits, clocks are running at higher frequencies and are driving
more gates than ever, thus presenting a higher current load on
the power distribution network with the potential for substantial
power grid voltage (IR)-drop. This IR-drop affects the clock
timing and must be taken into account in the verification process.
Since IR-drop is a full-chip phenomenon, the use of standard
circuit simulation on both the clock circuitry and the power-grid
is not practical. In this paper, we present a new methodology for
the verification of clock delay and skew. An iterative technique is
presented for clock simulation in the presence of full-chip dynamic
IR-drop. The effect of IR-drop on the timing of clock signals is
quantified on a small example, and demonstrated on a large chip.
Fig. 1. Simple Buffered Clock Tree.
Index Terms—Circuit simulation, CMOS integrated circuits,
integrated circuit interconnections, matrix decomposition, power
distribution, relaxation methods, very large scale integration.
I. INTRODUCTION
THE CLOCK is the fastest and most active signal in a digital integrated circuit. It spans the entire chip and is the
most critical control signal in the design [1]. In the early days
of very large scale integration technology, clock circuitry was
comprised of a few buffers in a simple tree structure, as shown
in Fig. 1. A single clock tree drove the entire chip and was
relatively easy to design. The clock skew, defined as the maximum difference between the arrival times of the clock signal to
the latches, was easily controlled and it was possible to obtain
nearly zero-skew clocks through an automatic clock synthesis
and routing process [2].
Minimization of the clock skew is one of the key performance
goals of clock design [3]. Today, the clock circuit is expected to
drive thousands of latches and the design goal is to ensure that
the clock signals all reach the latch inputs within a specified
time bound. In order to reduce clock delay and skew, the clock
is composed of large buffers driving long lines over the entire
chip. As a result, the clock consumes large amounts of power,
often exceeding 25% of the total chip power [4].
Manuscript received January 31, 2000. This paper was recommended by Associate Editor E. Charbon.
R. Saleh is with Simplex Solutions, Inc., Sunnyvale, CA 94086 USA (e-mail:
res@simplex.com).
S. Z. Hussain is with Simplex Solutions, Inc., Sunnyvale, CA 94086 USA
(e-mail: szh@simplex.com).
S. Rochel is with Simplex Solutions, Inc., Sunnyvale, CA 94086 USA
(e-mail: steffen@simplex.com).
D. Overhauser is with Simplex Solutions, Inc., Sunnyvale, CA 94086 USA
(e-mail: ovee@simplex.com).
Publisher Item Identifier S 0278-0070(00)05340-9.
Fig. 2. Interconnect delay dominates gate delay in DSM.
With the advent of deep-submicrometer (DSM) technology,
designing high-speed clock networks is becoming increasingly
complex. Typically, DSM refers to line widths of 0.35 µm and
below. Mainstream industry has reached and surpassed this feature size, with 0.25 and 0.18 µm already in production. As the
line width decreases, the number of devices continues to grow
according to Moore’s Law. However, it is not the increase in
device count alone that is making chip design difficult. Rather,
it is the fact that interconnect effects are now dominating the
performance of the chip. Parasitic effects such as interconnect
resistance and three-dimensional (3-D) capacitance have greatly
increased the design complexity, and made clock design a considerable challenge.
Fig. 2 illustrates the delay effects observed in DSM: gate
delay is decreasing while interconnect delay is increasing with
a crossover at roughly 0.35–0.25 µm [5]. This type of graph is
cited in virtually every paper on timing in DSM. In practice, the
curves apply to long nets in the design, and clock nets tend to be
in this category. The main reasons for the shift from gate dominance to interconnect dominance are the increases in resistance
and coupling capacitance. The resistance of the lines continues
to increase as the line width decreases and this introduces a
significant resistance-capacitance (RC) delay component. Coupling capacitances arise due to 3-D effects of tall, thin metal
lines that are closely spaced together. Unfortunately, this makes
it difficult to estimate the interconnect loads in advance. Most
clock synthesis tools rely on accurate loading information and
assume that gate delay is dominant, as shown in Fig. 2. Therefore, it is no longer possible to use these clock synthesis tools to
generate zero-skew circuits.
Another key issue adding to the complexity of clock design is
the IR drop in the power supply [6], [7]. The causes of IR drop
are the resistance in the power distribution system, as shown in
Fig. 3, and increases in the current flowing through the power
grid. In the past, resistance in the power grid could be safely
ignored. But with the narrow metal lines used in DSM, resistance is a major factor in the power distribution system. Often,
wider lines are used in power busses to reduce resistance, but
the speed of today’s clock designs requires very large buffers
which draw large currents. When these currents flow from the
supply to the drivers, any resistance encountered in the power
bus causes the voltage to drop. Therefore, the far-end inverter in
Fig. 3 experiences a lower supply voltage than the first inverter.
Ironically, the clock circuit itself is one of the major causes of IR
drop. Of course, large bus drivers, memory decoder drivers and
input–output buffers can also lead to significant IR drop during
the operation of the chip. In this paper, we focus on the effect of
IR drop on clock timing.
Fig. 3. The cause of IR-drop effects.
The IR drop varies dynamically at each buffer as it is related
to the switching activity of the clock signal. It is clear that the
current delivered by the buffers will vary depending on the level
of the IR drop. This fluctuation of the supply voltage causes
variations in the delay, skew, and slew rates. These effects can
only be captured using detailed transient analysis. For simulation purposes, the clock circuit under analysis must incorporate
all the effects shown in Fig. 4. The nonlinear transistors, RC interconnect, coupling capacitances and $V_{DD}$ variations must be
represented to obtain correct timing information. The coupling
capacitances can be grounded for the clock simulation since
most of the adjacent nets are stable during the rapid clock transition. The circuit model of Fig. 4 is used in our clock verification
methodology. Note that each buffer has a different value of $V_{DD}$
that varies as a function of time.
II. EFFECT OF IR-DROP ON TIMING
Before describing the verification techniques, it is important
to understand and quantify the effects of IR drop on timing using
a representative design. Since IR drop effectively reduces the
supply voltage, the buffers connected to the power grid become
weaker and the propagation delay through the clock increases.
In this section, we present a quantitative view of exactly how
much variation can be expected. For this purpose, two small
clock circuits, presented in Fig. 5, were simulated to observe the
impact of dynamic IR-drop. The circuits were deliberately kept
small enough so that SPICE simulations could be performed to
generate the results.
The circuit of Fig. 5(a) contains a five-stage clock configured
as a tree and connected to the $V_{DD}$ power grid. The clock tree
consists of 23 inverters connected together with distributed RC
interconnect. The circuit of Fig. 5(b) is an H-tree [9] configuration of the clock which attempts to evenly distribute the clock
to all parts of a chip. The power grids, shown as dotted lines in
Fig. 5, each consist of a mesh of 10 000 nodes, 20 200 resistors
and 10 000 capacitors lumped to ground. Two 1.8-V sources are
connected to the power grid at two nodes. This
asymmetric connection is used to create a variable dynamic IR
drop in the power grid. In Fig. 6, simulation results of the circuit
in Fig. 5(a) using a commercial SPICE program are presented. A
clock input ramp with a 0.1-ns rise time is applied at node CLK.
The waveforms at the output of all five stages are shown along
with an IR-drop waveform from the power grid. The simulation
results show the transient behavior of the power grid voltage
with a maximum drop associated with the switching of stages
four and five. The highest IR-drop value of 45 mV or 2.5%, is
observed at the power grid tap point of inverter G14. Similar results were obtained for Fig. 5(b) but are not shown here.
An instantaneous IR-drop plot for the whole power grid is
shown in Fig. 7 at the timepoint t = 200 ps where the most
significant voltage drop occurs, based on the results of Fig. 6.
The voltage drop is shaded from very light to very dark as the
voltage drop transitions from a lower value to a higher value.
The contour plot shows a transition from a very small IR drop
in the top corners to a noticeable drop at the positions of buffers
G12, G13, G14, and G15. This is expected since the voltage
supplies are connected in the top corners and the largest effective
resistance occurs from the supplies to these four buffers.
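The trend described above, where the buffers farthest from the supply connections see the largest drop, can be illustrated with a one-dimensional rail. The following is only a sketch under assumed conditions: a single supply at one end, equal segment resistances, and constant tap currents. The function name and all numeric values are hypothetical and are not taken from the experiments in this paper.

```python
def rail_tap_voltages(vdd, segment_res, tap_currents):
    """Voltage at each tap of a resistive rail fed from one end.

    Segment k carries the current of tap k and of every tap after it,
    so the drop accumulates with distance from the supply.
    """
    voltages = []
    v = vdd
    for k in range(len(tap_currents)):
        downstream = sum(tap_currents[k:])   # current through segment k
        v -= segment_res * downstream        # Ohm's law on this segment
        voltages.append(v)
    return voltages


# Assumed values: 1.8-V supply, 0.5-ohm segments, four buffers at 10 mA each.
taps = rail_tap_voltages(1.8, 0.5, [0.010, 0.010, 0.010, 0.010])
drops_mv = [(1.8 - v) * 1e3 for v in taps]
print([round(d, 3) for d in drops_mv])
```

The drop grows monotonically along the rail, which is the one-dimensional analogue of the shading in the contour plot. A real power grid is a two-dimensional mesh with time-varying currents, so this static chain only conveys the qualitative behavior.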
In order to assess the effect of IR drop on timing, the power
grid was removed and an ideal voltage source was attached to
all 23 buffers. The voltage source waveform was a time-varying
function that was similar to the IR-drop waveform of Fig. 6,
except that the level of IR drop was adjusted to be 0%, 5%,
10%, 15%, and 20% of $V_{DD}$ in five successive runs. A scatter
plot of the percentage increase in delay for each case is shown in
Fig. 8(a). The H-tree of Fig. 5(b) was simulated using the same
experiments to obtain Fig. 8(b). The results show that a given
percentage of $V_{DD}$ variation, or IR drop, translates directly to
the same percentage of delay variation (note that this is a change
in the delay as opposed to a change in skew). For example, a
10% voltage drop causes at most a 10% increase in delay. More
accurately, a 10% IR drop causes an increase in delay of about
5% to 10%. In fact, our results show that an $x$% IR drop causes
a delay change of $x/2$% to $x$% and this serves as a useful rule-of-thumb for quantifying the impact of IR drop on timing.
Fig. 4. Buffers and RC interconnect model for clocks.
Fig. 5. Clock circuits connected to the power-grid. (a) Simple Clock Tree. (b) Simple H-tree.
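The rule of thumb can be written as a small helper. This is a minimal sketch that encodes only the stated bounds (half the IR-drop percentage up to the full percentage, as in the 10% example giving 5% to 10%); the function name is hypothetical and the bounds are empirical, not exact.

```python
def delay_change_range(ir_drop_pct):
    """Rule-of-thumb bounds on the percentage increase in clock delay.

    An x% IR drop is taken to cause between x/2 % and x % more delay.
    """
    if ir_drop_pct < 0:
        raise ValueError("IR drop must be non-negative")
    return ir_drop_pct / 2.0, ir_drop_pct


low, high = delay_change_range(10.0)
print(f"10% IR drop -> {low:.1f}% to {high:.1f}% delay increase")
```

Such a helper is useful only for quick budgeting; the transient analysis described in the rest of the paper is still needed to obtain actual delay and skew values.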
To support our claim, we examined the normalized sensitivity
of delay to changes in $V_{DD}$. In general, the normalized sensitivity of a variable $y$ to a variable $x$ is given by the standard
equation
$$S_x^y = \frac{x}{y}\,\frac{\partial y}{\partial x}. \tag{1}$$
In our case, $y$ is the switching delay, $t_d$, and $x$ is the power
supply voltage, $V_{DD}$. The percentage change in the delay is
simply the normalized sensitivity multiplied by the percentage
change in $V_{DD}$
$$\frac{\Delta t_d}{t_d} = S_{V_{DD}}^{t_d}\,\frac{\Delta V_{DD}}{V_{DD}}. \tag{2}$$
Now, consider the delay associated with the inverter in Stage
1 of Fig. 4. If the input switches low, then the output of the
inverter will be charged to a high value. A simplified expression
for the delay of the gate switching from low-to-high is given by
$$t_d = \frac{C_L V_{DD}}{2 I_p} \tag{3}$$
where $C_L$ is the total capacitance at the output, $I_p$ is the
p-channel charging current and the resistance is set to zero.
Assuming short channel devices, we expect that the transistor
would be velocity saturated for most of the transition. A simple
model for the transistor in the saturation region is given in [8]
$$I_p = W v_{sat} C_{ox} \left(V_{GS} - V_T - V_{DS}\right) \tag{4}$$
where $W$ is the channel width, $v_{sat}$ is the carrier saturation velocity, $C_{ox}$ is the oxide capacitance, $V_{GS}$ is the gate-to-source
voltage, $V_{DS}$ is the drain-to-source voltage and $V_T$ is the device
threshold voltage. We can compute the current in saturation by
setting $V_{DS} = V_{DSAT}$ to obtain [8]
$$I_p = W v_{sat} C_{ox}\,\frac{(V_{GS} - V_T)^2}{(V_{GS} - V_T) + E_c L} \tag{5}$$
where $E_c$ is the critical electric field and $L$ is the channel length.
After setting $V_{GS} = V_{DD}$, if we apply (5) to (3), we can compute
the normalized sensitivity from (1) to obtain
$$S_{V_{DD}}^{t_d} = 1 - \frac{2 V_{DD}}{V_{DD} - V_T} + \frac{V_{DD}}{V_{DD} - V_T + E_c L}. \tag{6}$$
Equation (6) is plotted as a function of $V_{DD}$ in Fig. 9 for fixed
values of $V_T$ and $E_c L$. The sensitivity is negative, since there is an
inverse relationship between $V_{DD}$ and delay, and its magnitude lies
in the range of approximately 0.9 to 0.5 for $V_{DD}$ ranging
from 2.5 V to 4 V. This is also consistent with the
experimental results of Fig. 8. To see this, consider the nominal value of $V_{DD}$ in our examples. The sensitivity magnitude
at this value is approximately 0.75. This implies that a 10%
decrease in $V_{DD}$ would produce a 7.5% increase in delay. As
the voltage drops, the delay becomes more sensitive to changes
in $V_{DD}$, which further supports the experimental results that a
10% decrease in $V_{DD}$ produces a 5%–10% increase in the delay.
Based on this analysis, we believe that our rule-of-thumb can be
used for quantifying the impact of IR drop on timing. Note that
the sensitivity of delay to IR drop gets even worse as the power
supply is reduced further.
Fig. 6. Circuit simulation results with dynamic IR-drop.
Fig. 7. IR drop at t = 200 ps for power grid of Fig. 5(a).
III. CLOCK VERIFICATION METHODOLOGY
Because of the importance of clock circuits, designers have
been verifying clocks in detail for years. However, a manual
approach was used until recently. As DSM technology became
prevalent, the interconnect issues and IR-drop problems made
manual clock verification obsolete. Designers began to pursue
new clock topologies and power routing schemes to avoid
these interconnect issues. For example, an H-tree [9] has a
central buffer that drives the entire clock tree. It is usually
implemented as a custom buffered tree (i.e., multiple buffers
distributed throughout the tree) to distribute the clock more
evenly throughout the chip. However, skew is more difficult
to manage. An alternative to this is the clock grid [10], which
reduces local clock skew, but is not area efficient. A third
alternative is a hybrid of the H-tree and the clock grid [11],
where the advantages of the two basic approaches are exploited,
while minimizing the disadvantages.
To mitigate potential IR-drop problems, designers use wide
metal lines in the topmost metal layer to reduce the resistance.
They will often add or remove “straps” (metal wires) to re-route
current and minimize IR drop [12]. Other methods include the
use of decoupling capacitors and ball-grid arrays [12]. It is apparent that the complexity of the verification increases significantly when using any of these advanced design techniques.
The manual process is both time consuming and error prone on
simple clock trees, but completely impractical on the advanced
structures. In fact, the volume of data is too large, and the clock
tracing is too complex for a manual approach. Furthermore, the
results cannot be patched together since the IR drop must be analyzed across the whole chip.
A more reliable approach is to take the final layout and extract
the transistor netlist, interconnect and power grid. The next step
is to automatically trace the clock from its source node to the
latches in the design. Once extracted, the clock circuit can be
simulated at the transistor level with its interconnect represented
as reduced-order models [13]. If the supply voltage is held
constant, a “best-case” clock skew can be determined for each
receiver. During the simulation, if the tap current information
of the buffers is captured for a power grid IR-drop analysis, the
IR-drop analysis can be folded back into the clock simulation
to obtain a “worst-case” clock skew. The two clock simulations
place a bound on the clock skew in the presence of IR drop. This
is the basic strategy used for clock verification described in the
rest of this section.
Fig. 8. Scatter plot of % delay change versus % IR-drop change for two clock structures. (a) Clock tree of Fig. 5(a). (b) Simple H-tree from Fig. 5(b).
Fig. 9. Sensitivity of delay to $V_{DD}$ variations.
A. Parasitic Extraction
The first problem in clock verification is to perform the parasitic extraction. While this is not the subject of this paper, it is
important to describe the extraction process and its potential issues. Typically, only resistances and capacitances are extracted
for interconnects. In the immediate future, inductance will have
to be considered in the clock signals [14], [15] as the speeds
continue to increase and as the interconnect technology moves
from aluminum to copper [5]. In this paper, we consider only
RC effects, although the analysis can be equally applied to interconnect with inductive effects. The emphasis in signal parasitic
extraction is on the 3-D capacitance [16]. The interconnect between clock buffers is extracted by taking into account the surrounding interconnect. Both the self-capacitance and the coupling capacitances must be extracted using methods that account
for 3-D effects of neighboring lines. During the clock analysis,
the coupled capacitances would be grounded since the switching
of the clock occurs so rapidly that other signals look like ac
ground. In addition to capacitance, the resistance of each metal
segment must be extracted using one-dimensional and two-dimensional techniques. It is important to include via resistance
in the interconnect extraction. Via capacitance is typically small
and therefore negligible (although it is necessary to include its
effect if there are a large number of vias along a line).
The extraction of the power grid is focused more on resistance
extraction since it is the primary source of voltage drop along
the power grid. Resistance extraction is performed in the same
manner as described above for signals. Extraction of the individual grounded capacitances associated with the power grid
can be done in a specialized manner, or in the same way as the
interconnect extraction. Either way, the entire full-chip power
grid must be extracted for IR-drop analysis. The power grids for
both $V_{DD}$ and ground should be extracted for a complete analysis.
Typically, the IR drop in a supply grid is worse than the IR drop
in the ground grid due to the presence of substrate capacitance.
In this paper, only the $V_{DD}$ power grid is analyzed. However, the
techniques presented are equally applicable to IR drop in both
supply and ground grids.
Fig. 10. Simulation of nonlinear devices with constant tap voltages.
Finally, the actual devices, specifically the MOS transistors,
must be extracted. A clock tracing algorithm is required to extract the clock circuit from the transistor netlist to improve the
efficiency of the simulation. The tracing algorithm starts at the
root node of the clock and searches until latches or flip-flop circuits are encountered. The transistors and interconnect along
these paths are tagged as being part of the clock circuit. The
loading effects of the final gates driven by the clock buffers
can be modeled as constant capacitances. This assumption is
made because the size of clock buffers is considerably larger
than the devices in the driven gates and latches. Moreover, the
linear capacitance in the wiring interconnect shields the nonlinear loading capacitances during the period of fastest signal
transitions.
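The tracing step described above can be sketched as a breadth-first search that starts at the clock root and stops at sequential elements. This is an illustrative sketch, not the authors' implementation: the netlist is reduced to a hypothetical fanout dictionary, and the `is_sequential` predicate and all node names are assumptions made for the example.

```python
from collections import deque


def trace_clock(root, fanout, is_sequential):
    """Return (clock_nodes, endpoints) reachable from the clock root.

    fanout maps a node name to the nodes it drives. The search does not
    continue past latches or flip-flops, which become clock endpoints.
    """
    clock_nodes, endpoints = set(), set()
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if is_sequential(node):
            endpoints.add(node)      # stop here; model it as a load capacitance
            continue
        clock_nodes.add(node)        # tag as part of the clock circuit
        for nxt in fanout.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return clock_nodes, endpoints


# Hypothetical netlist: CLK drives two buffers, which drive latches L1..L3.
fanout = {"CLK": ["B1", "B2"], "B1": ["L1", "L2"], "B2": ["L3"], "L1": ["X1"]}
nodes, ends = trace_clock("CLK", fanout, lambda n: n.startswith("L"))
print(sorted(nodes), sorted(ends))
```

Logic behind the latches (node `X1` in the example) is never visited, which is what keeps the extracted clock circuit small relative to the full transistor netlist.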
The main issue in this extraction step is the size of the resulting data. Storage of the data must be done efficiently, typically in a binary format, and in a data file organization that allows rapid access by downstream tools. A typical power grid and
clock for a million-transistor circuit consists of several million
metal segments, and thousands of transistors [10]. We have performed the extraction of many designs containing over 5 million
transistors in a five-metal layer process and produced approximately 30–40 million resistors and 100 K–200 K transistors.
Conventional circuit or timing simulation techniques are unable
to handle the volume of data required for dynamic IR-drop simulation. Therefore, alternative methods must be pursued to verify
clock circuits in the presence of IR drop.
B. Iterative IR-drop Simulation
Relaxation methods have been used in the past to decompose
large problems into smaller ones before applying an iterative
method to obtain a solution [17], [18]. Linear, nonlinear and
waveform relaxation methods have all been successfully applied to MOS circuits. We also make use of a relaxation-based
approach, except that the partitioning and iteration counts
are different from the previous approaches, and specialized
methods are used to solve each partition. In previous relaxation-based circuit simulators, the power supply variations
were ignored and the circuit was partitioned into subcircuits
composed of source-drain connected MOS transistors. These
subcircuits were all nonlinear and could easily be solved using
standard direct methods. The characteristics of our problem
are different and so a different circuit partitioning is required.
Specifically, the circuit is partitioned into a very large linear
portion and a relatively small nonlinear portion. Given this partitioning, a simulator must be designed with a focus on solving
an extremely large sparse linear problem that is generated by
the power-grid. A second simulator is required for the nonlinear
circuit to provide tap currents for nonlinear devices that are
connected to the power grid. In the relaxation process, the
power grid and clock are solved separately and their results are
synchronized at regular intervals during dynamic simulation.
First, the partitioning approach is illustrated in more detail. In
Fig. 10, a two-stage inverter circuit that represents the clock is
connected to a series of resistors that represent the power grid.
For this example, the results of partitioning the problem into the
nonlinear and linear portions are shown in Figs. 11 and 12, respectively. While solving the nonlinear circuit of Fig. 11, the tap
voltages are applied at the $V_{DD}$ connections of the p-channel
transistors. The resulting tap currents of these devices, along
with their capacitive loadings, are used in the solution of the
linear circuit in Fig. 12. Given this partitioning, the success of
this approach relies on two factors: the ability to build specialized simulators for each partition and the ability to converge to
the correct solution in a few iterations.
1) Convergence: The next issue to consider is the convergence property. Linear relaxation is the easiest to understand.
To simplify the problem for illustrative purposes, one can treat
the nonlinear devices of Fig. 11 as linear devices by linearizing
the transistors. It is easy to prove that linear relaxation would
converge to the correct solution if all nodes had grounded capacitors and the timestep was sufficiently small [18]. The speed
of convergence would depend on the tightness of the coupling
between the two blocks of Figs. 11 and 12. Since the source
nodes of the p-channel transistors are connected to the resistive
power grid, the coupling tends to be fairly tight, which leads to
a large number of iterations to reach convergence.
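The link between coupling strength and iteration count can be seen on a toy problem. The sketch below applies Gauss-Seidel relaxation to a two-unknown linear system in which each unknown stands in for one partition. The system, the tolerance, and the function name are assumptions chosen for illustration, and the grounded-capacitor and timestep conditions mentioned above are not modeled.

```python
def relaxation_iterations(coupling, tol=1e-9, max_iter=10_000):
    """Gauss-Seidel iterations needed for a two-unknown linear system.

    The system is  x1 + coupling*x2 = 1  and  coupling*x1 + x2 = 1,
    solved by updating the two unknowns alternately.
    """
    x1 = x2 = 0.0
    for it in range(1, max_iter + 1):
        new_x1 = 1.0 - coupling * x2        # solve block 1 with x2 held fixed
        new_x2 = 1.0 - coupling * new_x1    # solve block 2 with the new x1
        change = max(abs(new_x1 - x1), abs(new_x2 - x2))
        x1, x2 = new_x1, new_x2
        if change < tol:
            return it
    return max_iter


for c in (0.1, 0.5, 0.9):
    print(c, relaxation_iterations(c))
```

For this system the error shrinks by a factor of `coupling**2` per sweep, so the iteration count rises sharply as the coupling approaches 1, which mirrors the tight-coupling concern raised in the text.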
In the case of nonlinear relaxation, sometimes referred to
as “timepoint” iteration, each iteration of the nonlinear circuit
would involve linearization using Newton’s method. The requirements for convergence are nearly identical to linear relaxation, with the additional requirement that the initial guess must
be close enough to the final solution [18]. Since the final solution to these types of problems is typically around $V_{DD}$, it is
easy to satisfy this condition: the initial guess is always $V_{DD}$. Of
course, if this guess is quite far from the final solution, convergence may be difficult to achieve. But then, the circuit is unlikely
to work anyway because the power supply is not close to $V_{DD}$.
So “timepoint” or nonlinear relaxation is a viable approach in
this respect, but also suffers from the issue of tight coupling between the two partitions.
Fig. 11. Inverter chain with a distributed power grid.
Fig. 12. Power-grid with piecewise-constant tap currents and device capacitances.
Finally, if a waveform relaxation (WR) method is used to
compute the solution, then, in addition to the requirements for
linear relaxation, the window sizes of the iteration must be small
enough to achieve convergence [18]. If the window sizes were
equal to the timepoints used in nonlinear relaxation, the two
algorithms would be identical. However, the use of a larger
window size equal to the clock period is more advantageous.
Consider the results of two successive WR iterations. In the
first iteration, the clock is simulated with zero IR-drop on the
power grid. This configuration is depicted in Fig. 11, where
both tap voltages are set to the $V_{DD}$ voltage. The resulting
solution of the power grid of Fig. 12 generates the worst-case
voltage drops. On the second iteration, all devices in the clock
of Fig. 11 encounter large IR-drop voltages at their tap points
and therefore draw less current. The next iteration of the power
grid of Fig. 12 generates best-case voltage drops. Additional iterations will improve the accuracy, but if the worst-case results
are within the specifications of the design, further iterations will
be unnecessary. In fact, rather than iterating to convergence, a
weighted average of the first two iterations produces a relatively
accurate result. From the standpoint of verification, the two-step
WR-based method is the most appropriate approach, and avoids
any convergence issues.
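The bounding behavior of the first two iterations can be reproduced on a static, single-tap stand-in for the waveform problem. This sketch assumes the buffer behaves as a fixed conductance and the grid as a single resistance, which is far simpler than the transistor-level and full-grid simulators described here; the function name and values are hypothetical, and the default weight follows the 10%/90% split quoted in Section IV.

```python
def two_step_ir_drop(vdd, r_grid, g_load, w_first=0.1):
    """Two passes of the clock/power-grid iteration for a single tap.

    The buffer is modeled as a conductance g_load, so its current is
    g_load * tap_voltage; r_grid is the resistance from supply to tap.
    """
    # Pass 1: clock simulated with an ideal supply, so the current is largest.
    i_first = g_load * vdd
    drop_first = r_grid * i_first             # overestimates the drop
    # Pass 2: clock simulated at the pass-1 tap voltage, so it draws less.
    i_second = g_load * (vdd - drop_first)
    drop_second = r_grid * i_second           # underestimates the drop
    blended = w_first * drop_first + (1.0 - w_first) * drop_second
    # Closed-form answer for this linear toy model, used only for comparison.
    exact = vdd * r_grid * g_load / (1.0 + r_grid * g_load)
    return drop_first, drop_second, blended, exact


first, second, blended, exact = two_step_ir_drop(1.8, 5.0, 0.01)
print(first, second, blended, exact)
```

In this toy model the exact drop always lies between the two passes, and the blended value lies between them as well. How close the blend is to the exact value depends on the chosen weight, which is why the weights are treated as adjustable parameters.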
2) Solution to Linear and Nonlinear Partitions: The partitioning and analysis approach described above has several advantages. By keeping the linear and nonlinear solvers separate,
one can design two separate tools and optimize their individual
performances. The speed of simulation is greatly increased, as
the synchronization between the tools takes place at the end of
each simulation period. The worst-case IR-drop obtained after
the first iteration provides a good measure of the quality of the
power grid and the clock design. The only issue that remains is
the solution of each partition.
For the nonlinear problem represented by the clock circuit,
there are a variety of published approaches that could be used
such as EMU [19], SPECS2 [20], iSPLICE3 [21], etc. These
methods all use a form of timing simulation and are therefore
10–100 times faster than SPICE. Furthermore, they can all
handle the capacity of a typical clock circuit. For interconnect
associated with clock signals, a reduction method [13] is needed
to generate a lower order model for simulation. If a clock grid
structure is used a reduction method and a direct solver must
be embedded in the timing simulator because the clock grids
tend to generate very large and sparse matrices. One important
requirement for the simulation is that it must be accurate to
within a few percent of SPICE to produce meaningful results
for clock skew.
The linear problem generated by the power grid can be
solved in a number of ways. Typically, the matrix is symmetric
and positive definite [22], and is extremely large and sparse.
Therefore, the choices for the solution can be linear relaxation,
a pre-conditioned conjugate gradient, or a direct method based
on a Cholesky decomposition [22]. While any one of these
methods is applicable, large problems dictate the use of a
conjugate gradient method with incomplete Cholesky preconditioning, while smaller problems with low fill-in can be solved
by an efficient implementation of a Cholesky decomposition
[7]. These methods are typically 100–1000 times faster than
SPICE for large problems. In fact, SPICE cannot even read in
the data associated with the power grids of chips containing a
million transistors, let alone solve the problem.
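To make the linear solve concrete, the sketch below applies a plain (unpreconditioned) conjugate gradient iteration to the conductance matrix of a small resistive chain. It is a stand-in for, not an implementation of, the incomplete-Cholesky preconditioned solver mentioned above; the chain topology, conductance value, and tap currents are assumptions, and the matrix is applied through a hypothetical `matvec` closure rather than stored explicitly.

```python
def conjugate_gradient(matvec, b, tol=1e-20, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A given as matvec."""
    x = [0.0] * len(b)
    r = list(b)                       # residual b - A x for the start x = 0
    p = list(r)
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old < tol:              # squared residual norm small enough
            break
        ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def chain_matvec(g):
    """Conductance-matrix product for a chain of taps tied to the supply at one end."""
    def matvec(v):
        n = len(v)
        out = []
        for k in range(n):
            left = v[k - 1] if k > 0 else 0.0   # the supply node is the reference
            acc = g * (v[k] - left)
            if k + 1 < n:
                acc += g * (v[k] - v[k + 1])
            out.append(acc)
        return out
    return matvec


# Unknowns are voltage drops below the supply; the right-hand side is tap current.
drops = conjugate_gradient(chain_matvec(g=2.0), [0.01, 0.01, 0.01, 0.01])
print([round(d, 6) for d in drops])
```

Only matrix-vector products are needed, so memory grows with the number of nonzeros rather than with fill-in, which is the usual argument for iterative methods on very large sparse grids.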
IV. RESULTS
We now present results using the two-step iterative technique
described in the previous section. The first example circuit was
shown in Fig. 5(a), a simple clock tree connected to a power grid.
In Fig. 13, the IR-drop computed using waveform iterations is
compared with the exact solution using SPICE simulation. The
first iteration results in an overestimate of the IR drop, whereas
the second iteration underestimates the value. Further iterations
converge toward the exact solution which is indicated by the
solid line. In practice, the iterations converge if the IR-drop is
within a few percent of the supply voltage. However, this technique can become unstable if the IR-drop is large. This is most
likely due to the tight coupling and the fact that the window size
is relatively large (we set the window size equal to one clock
cycle).
Fig. 13. Convergence of IR-drop waveform iterations compared to SPICE simulation.
Note that the first two iterations bound the exact solution. Instead of continuing with iterations, the final value of the IR-drop
can be estimated by taking a weighted average of the first two
iterations. This approach produces surprisingly accurate results,
as shown in Fig. 13, and once the IR drop is accurate, the skew
can be computed accurately. As described earlier, for verification purposes, the best approach is to use a two-step iteration
followed by a weighted average of the results. Typical weighting
factors are 10% of the first iteration and 90% of the second iteration, but these are adjustable parameters.
The second example circuit is a SPARC core containing 100 K
transistors and a power grid with 190 K resistors. This circuit
is used to illustrate the convergence behavior of clock/power-grid iterations. Table I contains the results of the iterations at
the output of the fifth stage of the clock tree. It provides the
maximum delay and skew values along with the differences between two successive iterations. In this case, convergence occurs when the delta delay and delta skew values are less than
a picosecond. The results demonstrate that the first two iterations tend to bound the correct solution. The subsequent iterations converge quickly to the final solution. In this case, we see
that the actual solution is closer to the second iteration but it lies
between the two iterations.
A third example circuit that contained 21 176 transistors and
52 000 RC components in a clock grid was simulated and compared to SPICE. The power grid contained 753 844 resistive elements. The run time for the clock simulation was 1600 s in
our nonlinear simulation tool and over 100 h with a commercial
SPICE tool. The power grid simulation required less than
1 h in our linear simulation tool but SPICE could not read the
entire power grid into memory so a runtime comparison could
not be performed.
TABLE I
RESULTS FOR STAGE 5 RISING CLOCK EDGE
The fourth and final example shown in Fig. 14 illustrates the
waveform results for a large chip using the two-step simulation
technique. The clock network of this industrial example consists of 30 000 transistors. The power grid itself is modeled by
a network of 400 000 resistors and 250 000 capacitors. SPICE
was unable to complete the simulation of the power grid or the
clock circuit. Our approach required approximately 55 min for
the two-step simulations. There are three sets of waveforms in
this plot of voltage versus time. Each set shows the actual waveforms from different settings of the IR drop. The skew for each
set of waveforms is defined as the width of the waveform envelope measured at the 50% transition point, as illustrated for
the waveforms labeled (b) in Fig. 14. The delay here is defined
as the time required to reach the midpoint of the skew value, as
illustrated in Fig. 14(b).
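The skew and delay definitions above can be made concrete with a short sketch. It assumes rising edges sampled on a shared time grid, takes the 50% transition point as half of a nominal supply value `vdd`, and locates each crossing by linear interpolation; these choices, the function names, and the sample waveforms are illustrative assumptions rather than details taken from the tool.

```python
def crossing_time(times, volts, threshold):
    """Time at which a rising waveform first reaches `threshold`,
    found by linear interpolation between adjacent samples."""
    for i in range(1, len(times)):
        v0, v1 = volts[i - 1], volts[i]
        if v0 < threshold <= v1:
            frac = (threshold - v0) / (v1 - v0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    raise ValueError("waveform never crosses the threshold")


def skew_and_delay(times, waveforms, vdd):
    """Skew is the width of the waveform envelope at the 50% point;
    delay is the time to reach the midpoint of that envelope."""
    crossings = [crossing_time(times, w, 0.5 * vdd) for w in waveforms]
    earliest, latest = min(crossings), max(crossings)
    skew = latest - earliest
    delay = earliest + 0.5 * skew
    return skew, delay


# Two hypothetical clock waveforms (arbitrary time and voltage units).
t = [0.0, 1.0, 2.0, 3.0, 4.0]
fast = [0.0, 0.0, 1.0, 2.0, 2.0]   # reaches 1.0 V at t = 2.0
slow = [0.0, 0.0, 0.0, 2.0, 2.0]   # reaches 1.0 V at t = 2.5
skew, delay = skew_and_delay(t, [fast, slow], vdd=2.0)
```

With these sample waveforms the envelope spans t = 2.0 to t = 2.5 at the 50% point, so the sketch yields a skew of 0.5 and a delay of 2.25 time units.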
The fastest waveforms, labeled (a), are the results of the simulation without IR drop, which is the same as the first iteration
of our method. The results of a second iteration followed by
averaging are labeled (b). Note that both the delay and skew of
(b) increase relative to (a) because the proper IR drop has been
taken into account. In this case, there is a 30% change in the
skew value and a 0.2% change in the delay value. Clearly, an
error of 30% in the skew calculation cannot be tolerated by the
designer.
Fig. 14. Simulation results for large chip (a) without IR-drop, (b) dynamic IR-drop, (c) fixed IR-drop.
The curves labeled (c) in Fig. 14 deserve a separate discussion.
Often, a designer uses a single worst-case IR drop for all buffers
during a SPICE simulation to try to bound the total delay and
skew. Our contention is that the results of this approach can
be very misleading. For illustrative purposes, we used a single
worst-case IR-drop value on all buffers and re-ran the transient
simulation, as shown in Fig. 14(c). The skew value is almost the
same as the incorrect result of Fig. 14(a), and the delay value increased
by about 0.5%. While the approach does provide an upper bound
on the delay, it masks the 30% change in the skew value, which
is the more critical metric. Therefore, using a single worst-case
static IR drop is misleading and should be avoided.
V. CONCLUSION
Verification of clocks in the presence of full-chip IR drop is a
difficult problem due to conflicting requirements of speed, accuracy and problem size. High-performance clock signals require
the highest accuracy because their skews are in the range of picoseconds. However, circuit simulators do not have the speed
and memory capacity to handle full-chip power-grid simulation. In this paper, it is demonstrated that the effect of IR drop
cannot be ignored in deep submicrometer design. For example,
a 10% drop in the supply voltage can lead to a 5%–10% change
in the clock timing, and it is only getting worse as the supply
voltage is scaled. It has a more pronounced effect on the clock
skew, as illustrated by an industrial example which exhibited a
30% change in the skew due to IR drop. A new two-step waveform-based algorithm with averaging was shown to be an effective technique for delay and skew verification of a clock design
in the presence of IR drop. Furthermore, the use of a single worst-case
IR drop for clock skew should be avoided at all costs, as it
can produce misleading skew information.
ACKNOWLEDGMENT
The authors would like to thank D. Divekar for assistance
with the simulation results in this paper. They would also like to
thank the reviewers for providing excellent feedback to improve
the overall quality of the paper.
REFERENCES
[1] E. G. Friedman, Ed., Clock Distribution Networks in VLSI Circuits and
Systems. Piscataway, NJ: IEEE Press, 1995.
[2] A. Kahng and C. W. Tsao, “Planar-DME: A single-layer zero-skew clock
tree router,” IEEE Trans. Computer-Aided Design, vol. 15, Jan. 1996.
[3] G. Tellez and M. Sarrafzadeh, “Minimal buffer insertion in clock trees
with skew and slew rate constraints,” IEEE Trans. Computer-Aided Design, vol. 16, Apr. 1997.
[4] M. K. Gowan, L. L. Biro, and D. B. Jackson, “Power considerations in
the design of the Alpha 21264 microprocessor,” in Proc. Design Automation Conf., June 1998, pp. 726–731.
[5] “Technology Roadmap for Semiconductors,” Semiconductor Industry
Association, San Jose, CA, 1997.
[6] G. Steele, D. Overhauser, S. Rochel, and S. Z. Hussain, “Full-chip verification methods for DSM power distribution systems,” in Proc. Design
Automation Conf., June 1998, pp. 744–749.
[7] A. Dharchoudhury, R. Panda, D. Blaauw, and R. Vaidyanathan, “Design
and analysis of power distribution networks in PowerPC microprocessors,” in Proc. Design Automation Conf., June 1998, pp. 738–743.
[8] K. Toh, P. Ko, and R. Meyer, “An empirical model for short-channel
MOS devices,” IEEE J. Solid-State Circuits, vol. 23, pp. 950–957, Aug.
1988.
[9] A. Vittal and M. Marek-Sadowska, “Low-power buffered clock tree design,” IEEE Trans. Computer-Aided Design, vol. 16, pp. 965–975, Sept.
1997.
[10] M. P. Desai, R. Cvijetic, and J. Jensen, “Sizing of clock distribution networks for high performance CPU chips,” in Proc. 33rd ACM/IEEE Design Automation Conf., June 1996, pp. 389–394.
[11] K. M. Carrig, N. T. Gargiulo, R. P. Gregor, D. R. Menard, and H. E.
Reindel, “A new direction in ASIC high-performance clock methodology,” in Proc. Custom Integrated Circuits Conf., May 1998, pp.
27.3.1–27.3.4.
[12] R. Saleh, M. Benoit, and P. McCrorie, “Power distribution planning,” in
Proc. Design, Automation and Test in Europe Conf., Paris, France, Feb.
1998, pp. 265–270.
[13] A. Odabasioglu, M. Celik, and L. T. Pileggi, “PRIMA: Passive reduced-order interconnect macromodeling algorithm,” IEEE Trans. Computer-Aided Design, vol. 17, pp. 645–654, Aug. 1998.
[14] P. Restle, A. Ruehli, and S. Walker, “Dealing with inductance in high-speed chip design,” in Proc. Design Automation Conf., June 1999, pp.
904–909.
[15] Y. Massoud, S. Majors, T. Bustami, and J. White, “Layout techniques
for minimizing on-chip interconnect self-inductance,” in Proc. Design
Automation Conf., June 1998, pp. 566–571.
[16] N. Arora, K. Raol, R. Schumann, and L. Richardson, “Modeling and
extraction of interconnect capacitances for multilayer VLSI circuits,”
IEEE Trans. Computer-Aided Design, vol. 15, pp. 58–67, Jan. 1996.
[17] R. Saleh and A. R. Newton, “The exploitation of latency and multirate
behavior using nonlinear relaxation for circuit simulation,” IEEE Trans.
Computer-Aided Design, Dec. 1989.
[18] J. White and A. Sangiovanni-Vincentelli, Relaxation-Based Techniques
for the Simulation of MOS VLSI Circuits. Norwell, MA: Kluwer Academic, 1986.
[19] B. Ackland and N. Weste, “Functional verification in an interactive symbolic IC design environment,” in Proc. 2nd Caltech Conf. VLSI, Jan.
1981, pp. 285–298.
[20] C. Visveswariah and R. Rohrer, “SPECS2: An integrated circuit timing
simulator,” in Proc. ICCAD, Nov. 1987, pp. 94–97.
[21] R. A. Saleh, S.-J. Jou, and A. R. Newton, Mixed-Mode Simulation and
Analog Multilevel Simulation. Norwell, MA: Kluwer Academic, 1994.
[22] G. Golub and C. Van Loan, Matrix Computations. Baltimore, MD:
Johns Hopkins Univ. Press, 1989.
Resve Saleh (S’78–M’79) received the B.S. degree in
electrical engineering from Carleton University, Ottawa, ON, Canada, and the M.S. and Ph.D. degrees
in electrical engineering and computer science from
the University of California, Berkeley.
In 1995, he co-founded Simplex Solutions, Inc.,
Sunnyvale, CA, to address deep-submicrometer integrated circuit verification and currently serves as
Chairman. Prior to starting Simplex, he spent ten
years as a Professor in the Department of Electrical
and Computer Engineering and the Department of
Computer Science at the University of Illinois, Urbana. He has also worked
for Toshiba Corporation in Japan, Tektronix in Beaverton, OR, and Mitel
Corporation in Ottawa, Canada. He began his career as a Software Engineer
with Bell-Northern Research in Ottawa, Canada. He was granted a patent for
nonlinear frequency domain analysis in 1997. He has written two books on
analog and mixed-mode simulation and published over 50 conference papers
and journal articles.
Dr. Saleh has served as technical program chair, conference chair, and general chair for the Custom Integrated Circuits Conference in 1993, 1994, and
1995, respectively. He has served on the technical program committees of the
Design Automation Conference, International Conference on Computer Design, and, most recently, the International Symposium on Quality in Electronic
Design. From 1992 to 1995, he held the position of chairman of the IEEE Standards Coordinating Committee 30 on Analog Hardware Description Languages
(AHDL). He also received an inventor recognition award from the Semiconductor Research Corporation in 1991. He is currently an Associate Editor for the
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS
AND SYSTEMS. He received a Presidential Young Investigator Award in 1990
from the National Science Foundation.
Syed Zakir Hussain (M’97) received the B.Sc.
degree with honors in mechanical engineering from
North Western Frontier Province (NWFP) University
of Engineering and Technology, Peshawar, Pakistan,
in 1987. He received the M.S. degree in mechanical
engineering in 1992 and the Ph.D. degree in electrical engineering in 1995, from Duke University,
Durham, NC.
From 1987 to 1988, he worked as a design engineer at National Engineering Services of Pakistan
(NESPAK). He is currently a Staff Engineer with
Simplex Solutions, Inc., San Jose, CA. His research interests include design
and development of simulation tools in the area of VLSI design verification.
Steffen Rochel (M’95) received the Ph.D. degree in
electrical engineering from the University of Technology at Ilmenau, Ilmenau, Germany, in 1992.
In 1992, he joined Anacad in Germany, where he
was engaged in research and development of analog
and mixed-signal behavioral modeling and simulation techniques. Since 1996, he has been with Simplex Solutions, Inc., San Jose, CA, working on the interconnect verification and analysis products. His research interests include circuit simulation, reliability
assessment, and timing and power analysis. He has
served on the Custom Integrated Circuits Committee since 1997.
David Overhauser (S’86–M’89) received the B.S.
degree in math and computer science from Purdue
University, West Lafayette, IN, in 1983 and the M.S.
and Ph.D. degrees in electrical engineering from the
University of Illinois, Urbana, in 1985 and 1989, respectively.
From 1989 to 1995, he was an Assistant Professor
in Electrical and Computer Engineering at Duke
University, Durham, NC, where he founded the
Design Automation Technology Center. He has
published two books and over 20 technical papers in
the area of timing simulation, macromodeling, mixed-signal simulation, and
design verification. He is a founder and is currently Vice President at Simplex
Solutions, Inc., San Jose, CA.
Dr. Overhauser has served on the technical program committees of the
Custom Integrated Circuits Conference and the International Symposium for
Quality Electronic Design.