IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 25, NO. 6, JUNE 2006
SILCA: SPICE-Accurate Iterative Linear-Centric
Analysis for Efficient Time-Domain Simulation of
VLSI Circuits With Strong Parasitic Couplings
Zhao Li and C.-J. Richard Shi, Fellow, IEEE
Abstract—A new circuit analysis method, named SPICE-accurate iterative linear-centric analysis (SILCA), is proposed
for the efficient and accurate time-domain simulation of deep
submicron very large scale integrated (VLSI) circuits with strong
parasitic couplings. SILCA consists of two key linear-centric techniques applied to time-domain nonlinear circuit simulation. For
numerical integration, explicit-formula substitution and iterative-formula transformation are presented to convert implicit variable
time-step integration to fixed leading coefficient (FLC) variable
time-step integration. This paper characterizes both convergence
and stability properties of the resulting FLC integration formulae.
For nonlinear iteration, a successive variable chord (SVC) method
is used as an alternative to the Newton–Raphson method. Further,
the low-rank update technique is implemented for fast LU factorization. With these techniques, the number and cost of required
LU factorizations are reduced dramatically. Experimental results
on nonlinear circuits coupled with substrate and power/ground
networks have demonstrated that SILCA achieves more than an
order of magnitude speedup over SPICE3 in terms of both the
cost of LU factorization and the overall CPU time. SILCA is suitable for efficient SPICE-like time-domain simulation of parasitic-coupled VLSI circuits, where the number of linear parasitic
elements dominates the number of nonlinear devices.
Index Terms—Circuit simulation, time-domain analysis.
I. INTRODUCTION
WITH INCREASING operation frequency, lower supply voltage, and smaller device feature size, parasitic
coupling effects are becoming more and more important for
modern deep submicron very large scale integrated (VLSI)
circuit designs [24]. The increasing demand to integrate digital,
analog, and radio frequency (RF) circuits into one single chip
requires accurate analysis of VLSI
circuits together with extracted parasitic elements arising from
interconnect lines, common substrate, power/ground networks,
etc. [1], [20], [24], [30], [32]. Meanwhile, on-chip and packaging
inductances are no longer ignorable for accurate circuit
analysis [8]. For such purposes as well as coupled circuit
and electromagnetic modeling [33], SPICE-like simulators are
desirable for accurate transistor-level time-domain simulation.
Manuscript received May 4, 2003; revised February 12, 2004, December 23,
2004, March 7, 2005, and April 28, 2005. This work was supported in part by
the U.S. Defense Advanced Research Projects Agency NeoCAD Program under
Grant 66001-01-1-8920, in part by the National Science Foundation (NSF)
CAREER Award under Grant 9985507, and in part by the NSF/Semiconductor
Research Corporation Joint Mixed-Signal Initiative under Grant CCR0120371.
This paper was recommended by Associate Editor S. Sapatnekar.
Z. Li was with the Department of Electrical Engineering, University of
Washington, Seattle, WA 98195 USA. He is now with Cadence Design
Systems, Inc., San Jose, CA 95134 USA (e-mail: zhaoli@cadence.com).
C.-J. R. Shi is with the Department of Electrical Engineering, University of
Washington, Seattle, WA 98195 USA (e-mail: cjshi@ee.washington.edu).
Digital Object Identifier 10.1109/TCAD.2005.855943
However, efficient simulation of such systems presents a
complexity challenge to SPICE [21]. For time-domain circuit
simulation, SPICE uses numerical integration formulae [2],
[19] to form companion models for capacitors and inductors
at each time point, and applies the Newton–Raphson method
[19] to linearize nonlinear devices. Then the circuit is simulated
at each time point by iteratively solving a system of linearized
equations in the form of Ax = b, where A is typically the so-called modified nodal analysis (MNA) circuit matrix [19], [21],
which is a Jacobian matrix. It is known that device evaluation
dominates simulation of small to medium size circuits, and its
cost can be reduced with device bypass [14], [21], table lookup
[1], parallel computation techniques [13], etc. However, for a
system with strong parasitic couplings, the per-iteration cost of
SPICE time-domain simulation is dominated by LU factorization [19] of the circuit matrix A. In practice, the cost of LU
factorization by sparse matrix solvers [12] is O(n^1.1) to O(n^1.5) for
sparse circuits, where n is the circuit matrix size. However,
strong parasitic couplings present in deep submicron circuits
can cause the circuit matrix to become much denser. Even with
model order reduction [22], [28], the cost of LU factorization
can approach its worst case O(n^3) [24].
One key idea to improve the efficiency of SPICE-like circuit
simulation is to keep the circuit matrix as constant as possible
during the entire time-domain simulation and, therefore, reduce
the number of LU factorizations required. This has been implemented in both numerical integration and nonlinear iteration
stages. For numerical integration, several strategies have been
proposed on reformulating the backward differentiation formulae (BDF) [2], [19] to keep the leading coefficient constant
since it is the leading coefficient that contributes to a Jacobian
matrix. These include fixed coefficient methods [9] (e.g., in
LSODE [25]), fixed leading coefficient (FLC) methods [10]
(e.g., in DASSL [23]), overdetermined polynomial methods
(ODPM) [5], etc. All these methods have been shown to be
effective in bypassing Jacobian matrix factorization. However,
the stability of fixed coefficient and FLC methods is worse
than that of variable coefficient methods [10]. Furthermore, for
fixed coefficient and FLC methods, interpolation must be performed at each time point, which will unfortunately introduce
extra errors and increase simulation cost. The overdetermined
polynomial method [5] overcomes the interpolation problem by
introducing an extra coefficient in the BDF. However, the stability of the overdetermined polynomial method is worse and the
ODPM-1 formula [5] has been shown to be stable only when the
present time step-size hn is less than or equal to the predefined
basis time step-size h. Therefore, a large basis time step-size h
has been adopted in [5] to have an hn /h ratio of less than 1.
The efficiency of the overdetermined polynomial method is thus
limited.
To reduce the number of LU factorizations during the nonlinear iteration process, quasi-Newton methods [6], [29] have been
studied extensively and applied in circuit simulation [1] and
mixed-mode circuit and device simulation [35]. The successive
chord method [19] has been explored for fast transistor-level
gate-delay calculation [1], where each transistor is modeled as
a fixed linear resistor (called chord) combined with a variable
nonlinear current source. Since a fixed chord is used, the circuit
matrix will not change during nonlinear iteration and only
one LU factorization is required overall if a fixed time step-size is used. Unfortunately, there are two principal difficulties
that restrict the successful use of this linear-centric idea in the
simulation of general VLSI circuits. 1) Most VLSI circuits have
widely distributed time constants and require variable time
step-size control for simulation efficiency and accuracy. With
variable step-sizes, the circuit matrix is no longer constant
across time points unless an FLC numerical integration formula
as discussed before is used. 2) The successive chord method
may need an excessive amount of iterations to converge, and
thus offsets the gain from the reduction of LU factorizations.
Recently, in the contexts of power grid analysis [4], substrate
analysis [24], and parasitic extraction [11], Krylov-subspace-based iterative methods such as the conjugate gradient algorithm, generalized minimum residual (GMRES) algorithm,
etc., have been shown to be more efficient than the method
of LU factorization and forward/backward substitution (named
the direct method). However, there is no report of successful
and robust applications of iterative methods to classical time-domain nonlinear circuit simulation in the literature.
This paper presents SPICE-accurate iterative linear-centric
analysis (SILCA), a new direct method capable of analyzing
VLSI circuits containing strong parasitic coupling effects with
SPICE-like accuracy yet orders of magnitude faster. SILCA
consists of applying the linear-centric principle to both numerical integration and nonlinear iteration to keep circuit matrices
as constant as possible during variable step-size time-domain
nonlinear circuit simulation.
• Two general techniques, namely explicit-formula substitution and iterative-formula transformation, are presented to
convert implicit integration formulae in SPICE-like simulators to FLC integration formulae. These formulae lead
to constant equivalent conductance in capacitor/inductor
companion models.
• Successive variable chord (SVC) method, a variant of the
successive chord method, is introduced to keep linearized
conductance of nonlinear devices constant for a larger
voltage/current range by incorporating device-related behavioral knowledge. With the SVC method, a piecewise
weakly nonlinear (PWNL) MOSFET model is introduced
for the calculation of Jacobian matrices. The low-rank update technique is further applied for fast LU factorization
by noting the fact that the number of nonlinear devices,
switching operating PWNL regions, is only few at a single
time point.
With these, the number of required LU factorizations can be
reduced by orders of magnitude with a moderate increase of iterations. Thus, rather than solving a newly linearized system by
another costly LU factorization, we are able to achieve the same
accurate results by several efficient forward/backward substitutions on a previously linearized system. The entire method is robust, accurate, and has been implemented into SPICE3. Further,
the proposed method is compatible with other circuit analysis
methods, such as model order reduction [22], [28], to achieve
even greater simulation speedup.
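The benefit of a constant circuit matrix can be sketched in a few lines: factor once, then reuse the factors for every new right-hand side. The dense Doolittle routine without pivoting and the 3×3 matrix below are illustrative assumptions only; they are not the sparse solver used in SPICE3 or SILCA.

```python
def lu_factor(a):
    """Doolittle LU without pivoting (illustrative; assumes nonzero pivots).

    Returns one matrix holding U on and above the diagonal and the
    multipliers of the unit-lower-triangular L below it.
    """
    n = len(a)
    lu = [row[:] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu


def lu_solve(lu, b):
    """Forward/backward substitution using previously computed factors."""
    n = len(b)
    y = b[:]
    for i in range(n):                      # forward: L y = b
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    x = y[:]
    for i in reversed(range(n)):            # backward: U x = y
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


# Hypothetical constant "circuit matrix" and three different right-hand sides.
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
rhs_list = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [3.0, 3.0, 3.0]]

factors = lu_factor(A)                      # the only factorization
solutions = [lu_solve(factors, b) for b in rhs_list]
```

Each additional solve costs only the two triangular sweeps, which is the saving the linear-centric approach relies on.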
Some preliminary results of this paper were presented in
[16]. The rest of this paper is organized as follows. Section II
presents new FLC integration schemes, the analysis of their
stability and convergence properties, and methods for adaptive step-size control. Section III presents the SVC method
and the low-rank update technique. The SILCA algorithm is
described in Section IV. Section V shows experimental results
on substrate and power/ground coupling analyses. Section VI
concludes the paper.
II. FLC INTEGRATION SCHEMES
In this section, we present and characterize two general
techniques of taking any implicit integration formula to derive such an integration formula that yields a constant circuit
matrix for variable step-size time-domain circuit simulation.
Mathematically, let xn , xn−1 , . . . , x0 and ẋn , ẋn−1 , . . . , ẋ0 be
the values and first-order time derivatives of variable x at
time points tn , tn−1 , . . . , t0 , then any linear multistep numerical
integration formula implemented in SPICE-like simulators can
be written in the general form
$$\dot{x}_n = \sum_{i=0}^{k} a_i x_{n-i} + \sum_{j=1}^{l} b_j \dot{x}_{n-j} \tag{1}$$
where ai , i = 0, 1, . . . , k, and bj , j = 1, 2, . . . , l, are coefficients of the integration formula, the leading coefficient a0 is
nonzero (hence implicit), and hn = tn − tn−1 is the current
time step. Let h be a basis time step-size; the current
time step-size can then be written as hn = αh, where α is a
positive real number.
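As a concrete instance of (1), two familiar one-step formulae can be written in this form. The coefficient values below follow from the standard definitions of the trapezoid and backward Euler (BE) formulae and are given for illustration.

```latex
% Trapezoid formula: k = 1, l = 1
\dot{x}_n = \frac{2}{h_n} x_n - \frac{2}{h_n} x_{n-1} - \dot{x}_{n-1}
\quad\Rightarrow\quad
a_0 = \frac{2}{h_n},\; a_1 = -\frac{2}{h_n},\; b_1 = -1

% Backward Euler: k = 1, no derivative terms
\dot{x}_n = \frac{1}{h_n} x_n - \frac{1}{h_n} x_{n-1}
\quad\Rightarrow\quad
a_0 = \frac{1}{h_n},\; a_1 = -\frac{1}{h_n}
```

In both cases a_0 depends on hn = αh, so a change of step-size changes the leading coefficient.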
Notice that only the leading coefficient a0 contributes to
the circuit matrix. In general, a0 is a function of αh. Since
α changes with time points, the circuit matrix would change.
To keep the circuit matrix constant, we rewrite the integration
formula above as
$$\dot{x}_n = a_0(h)\,x_n + \left[a_0(\alpha h) - a_0(h)\right] x_n + \sum_{i=1}^{k} a_i x_{n-i} + \sum_{j=1}^{l} b_j \dot{x}_{n-j} \tag{2}$$
LI AND SHI: SILCA FOR EFFICIENT TIME-DOMAIN SIMULATION OF VLSI CIRCUITS WITH PARASITIC COUPLINGS
Fig. 1. Capacitor companion model using the mixed trapezoid FE formula.
where a0 (h) is independent of α. Then, we would like to
substitute xn in the second term by all the known values from
the previous time points. The first technique is to replace xn
in the second term using an explicit integration formula. This
is called explicit-formula substitution. The second technique
is to replace xn in the second term using an initial guess and
then iterate to convergence. This is called iterative-formula
transformation. With these, the resulting formulae have an FLC
and are referred to as FLC integration formulae, following the
convention of Jackson and Sacks-Davis [10].
In the following subsections, we use the standard trapezoid
formula as an example to derive FLC integration formulae
based on explicit-formula substitution and iterative-formula
transformation. We characterize both stability and convergence
properties of the resulting integration formulae. We note that
these derivations and analyses can be applied to any implicit
integration formula used in a circuit simulator. Furthermore, we
present how the resulting formulae can be used in a way similar
to the classical predictor-corrector integration scheme, how to
adaptively control the basis time step-size, and how to control
stability.
A. FLC Integration by Explicit-Formula Substitution
With hn = αh, the standard trapezoid formula can be
rewritten as
$$\begin{aligned}
\dot{x}_n &\approx \frac{2}{h_n}(x_n - x_{n-1}) - \dot{x}_{n-1} \\
&= \frac{2}{\alpha h}(x_n - x_{n-1}) - \dot{x}_{n-1} \\
&= \frac{2\alpha - (2\alpha - 2)}{\alpha h}(x_n - x_{n-1}) - \dot{x}_{n-1} \\
&= \frac{2}{h}x_n - \frac{2}{h}x_{n-1} - \frac{2\alpha - 2}{\alpha h}(x_n - x_{n-1}) - \dot{x}_{n-1}.
\end{aligned} \tag{3}$$
Now we would like to substitute xn in the third term by using
any explicit Adams–Bashforth formula [19]. The simplest is the
forward Euler (FE) formula with a step-size αh defined as
$$x_n \approx x_{n-1} + \alpha h\,\dot{x}_{n-1}. \tag{4}$$
After the xn in the third term in (3) is approximated by (4),
the mixed trapezoid FE formula with a time step-size hn = αh
is obtained as
$$\dot{x}_n = \frac{2}{h}x_n - \frac{2}{h}x_{n-1} - (2\alpha - 1)\dot{x}_{n-1}. \tag{5}$$
The mixed trapezoid FE formula is an implicit integration
formula. When α = 1, it reduces to the standard trapezoid
formula. When α = 1/2, it represents the backward Euler
(BE) formula with a step-size h/2.
To see the circuit interpretation of the mixed trapezoid FE
formula (5), the companion model of a linear capacitor is shown
in Fig. 1. Note that even though the actual time step-size is αh,
the equivalent conductance of the companion model is constant
as long as the basis time step-size h is a constant.
The local truncation error (LTE) measures how closely a numerical integration formula approximates the differential operator. We can prove the following result.
Theorem 1: The LTE ε of the mixed trapezoid FE formula
(5) with time step-size αh is given by
$$\varepsilon = \left(1 - \frac{1}{\alpha}\right)\frac{\ddot{x}_\xi}{2}(\alpha h)^2 + \left(1 - \frac{1.5}{\alpha}\right)\frac{\dddot{x}_\xi}{6}(\alpha h)^3 \tag{6}$$
where tξ is between tn and tn−1.
Proof: The proof is similar to the LTE estimation for the
standard trapezoid formula [19].
According to Theorem 1, when α = 1, the LTE of the mixed
trapezoid FE formula reduces to that of the standard trapezoid
formula. When α = 1/2, it represents the LTE of the BE formula using time step-size h/2. The mixed trapezoid FE formula
is a second-order integration formula only if α = 1 and degenerates to a first-order formula if α ≠ 1.
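The first-order behavior for α ≠ 1 can be checked numerically. The sketch below applies one step of (5) to the test equation ẋ = λx, starting from x = 1, and compares the one-step error with the leading (second-order) term of (6). The values of λ and h are assumptions chosen for illustration, and the error is taken as exact minus numerical, which is an assumed sign convention.

```python
import math


def mixed_trapezoid_fe_step(x_prev, lam, h, alpha):
    """One step of formula (5) on dx/dt = lam * x.

    Substituting dx/dt = lam * x into (5) and solving for x_n gives
    x_n = (1 + (2*alpha - 1)*z) / (1 - z) * x_prev, with z = h*lam/2.
    """
    z = h * lam / 2.0
    return (1.0 + (2.0 * alpha - 1.0) * z) / (1.0 - z) * x_prev


def leading_lte(lam, h, alpha, x_prev=1.0):
    """Second-order term of (6); for x = exp(lam*t), x'' = lam**2 * x."""
    return (1.0 - 1.0 / alpha) * (lam ** 2 * x_prev / 2.0) * (alpha * h) ** 2


lam, h = -1.0, 1e-3  # assumed test values, not from the source
for alpha in (0.625, 2.5):
    exact = math.exp(lam * alpha * h)
    numeric = mixed_trapezoid_fe_step(1.0, lam, h, alpha)
    print(alpha, exact - numeric, leading_lte(lam, h, alpha))
```

For small h the two printed error values should agree closely, and both vanish to leading order when α = 1.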
In contrast to LTE, stability is a global property related to
the growth or decay of the local error introduced at each time
point and propagated to the following time points. “Absolute
stability” requires that |εn | < |εn−1 |. It is often studied with the
use of an RC test circuit as shown in [19, Fig. 5.1]. The stability
property of the mixed trapezoid FE formula can be proved
as below.
Theorem 2: The absolute stability region of the mixed trapezoid FE formula (5) with time step-size αh is defined by
$$\left|\frac{1 + (2\alpha - 1)z}{1 - z}\right| < 1 \tag{7}$$
where z = −h/(2τ) and τ is the time constant of the RC test
circuit.
Proof: The proof is similar to the stability analysis for the
standard trapezoid formula [19].
Fig. 2. Absolute stability regions of the mixed trapezoid FE formula for
(a) α = 0.625 and (b) α = 2.5.
The absolute stability regions for α = 0.625 and α = 2.5 are
shown in Fig. 2(a) and (b), respectively. From Theorem 2, two
observations can be made on the stability of the mixed trapezoid
FE formula.
• When α > 1, the absolute stability region moves closer
to that of the FE formula. The mixed trapezoid FE formula is not A-stable [19] or stiff stable [9], [19], and
cannot be used as a variable time step-size control scheme
when α > 1.
• When α < 1, the absolute stability region includes the
open left half plane of the complex z-plane and the mixed
trapezoid FE formula is A-stable. When α approaches 1/2,
the absolute stability region approaches that of the BE
formula. Further, the smaller α, the better the stability.
However, according to Theorem 1, for a fixed time step-size
hn = αh, a small α will unfortunately result in a
large LTE. Therefore, there exists a tradeoff between the
stability and the LTE.
Due to the mentioned LTE and stability problems, FLC integration formulae derived from explicit-formula substitution are
not suggested for analog circuit simulation with high accuracy
requirements. However, they can be used to enhance timing
analysis of digital circuits, for example, TETA [1].
B. FLC Integration by Iterative-Formula Transformation
The LTE and stability problems of the mixed trapezoid
FE formula come from the replacement of xn in the third term
of (3) by an approximate xn defined using the explicit FE
formula (4). In this subsection, rather than using explicit integration formulae, xn in the third term of (3) is replaced by the
(k − 1)th iteration solution $x_n^{(k-1)}$ at the present time point and
a new kth iteration solution $x_n^{(k)}$ is obtained by solving (3),
where k is the iteration number. This leads to the iterative-formula transformation of (3), called the iterative trapezoid formula, written as
$$\begin{aligned}
\dot{x}_n^{(k)} &= \frac{2}{h}x_n^{(k)} - \frac{2}{h}x_{n-1} - \frac{2\alpha - 2}{\alpha h}\left(x_n^{(k-1)} - x_{n-1}\right) - \dot{x}_{n-1} \\
&= \frac{2}{h}\left(x_n^{(k)} - x_n^{(k-1)}\right) + \frac{2}{\alpha h}\left(x_n^{(k-1)} - x_{n-1}\right) - \dot{x}_{n-1}.
\end{aligned} \tag{8}$$
The final solution is said to be converged if $|x_n^{(k)} - x_n^{(k-1)}|$
is less than a predefined error tolerance. If the iterative trapezoid formula (8) converges, its LTE will approach that of the
standard trapezoid formula.
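The behavior of (8) can be sketched on a scalar RC circuit described by G·v + C·dv/dt = b. Substituting (8) for dv/dt gives a fixed-point sweep whose left-hand coefficient G + 2C/h depends only on the basis step-size h. The element values, step-size, and function names below are assumptions made for illustration.

```python
def iterative_trapezoid_step(v_prev, vdot_prev, G, C, b, h, alpha,
                             tol=1e-12, max_iter=200):
    """Advance G*v + C*dv/dt = b by a step alpha*h using sweeps of (8)."""
    g_eq = G + 2.0 * C / h          # constant: depends on the basis step h only
    v = v_prev                      # initial guess x_n^(0) = x_{n-1}
    for k in range(1, max_iter + 1):
        rhs = ((2.0 * C / h) * v
               - (2.0 * C / (alpha * h)) * (v - v_prev)
               + C * vdot_prev + b)
        v_new = rhs / g_eq          # a real simulator would reuse LU factors here
        if abs(v_new - v) < tol:
            return v_new, k
        v = v_new
    return v, max_iter


def standard_trapezoid_step(v_prev, vdot_prev, G, C, b, hn):
    """Reference: standard trapezoid formula with the actual step hn."""
    return ((2.0 * C / hn) * v_prev + C * vdot_prev + b) / (G + 2.0 * C / hn)


# Assumed example: unit step into a 1-ohm / 1-farad RC circuit.
G, C, b = 1.0, 1.0, 1.0
h, alpha = 0.1, 2.5
v0 = 0.0
vdot0 = (b - G * v0) / C

v_iter, sweeps = iterative_trapezoid_step(v0, vdot0, G, C, b, h, alpha)
v_ref = standard_trapezoid_step(v0, vdot0, G, C, b, alpha * h)
```

With these values the converged iterate matches the standard trapezoid result for the step αh, while the divisor `g_eq` never changes with α.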
Next, we characterize both convergence and stability properties of the iterative integration formula. To study the convergence property, let us write the linear(ized) circuit equation as
used in [22] as
Gx + C ẋ = b
(9)
where G and C represent the conductance and capacitance
(susceptance) matrices, and b is the vector due to input sources
and nonlinear devices. Replacing first-order time derivatives by
the iterative trapezoid formula (8), we have
$$\left(G + \frac{2C}{h}\right)x_n^{(k)} = \left(1 - \frac{1}{\alpha}\right)\frac{2C}{h}x_n^{(k-1)} + \frac{2C}{\alpha h}x_{n-1} + C\dot{x}_{n-1} + b. \tag{10}$$
Clearly, the iterative trapezoid formula converges if
$$\left\|\left(1 - \frac{1}{\alpha}\right)\left(G + \frac{2C}{h}\right)^{-1}\frac{2C}{h}\right\| < 1 \tag{11}$$
Fig. 3. Convergence region of the iterative trapezoid formula for α = 0.625 and 2.5.
where ‖·‖ represents the spectral radius of the iteration matrix.
The above (11) can be rewritten as
$$\left|\frac{1 - \frac{1}{\alpha}}{1 - z}\right| < 1 \tag{12}$$
where z = −h/(2τ ) and τ is an eigenvalue of the matrix
G−1 C. τ represents the time constant of the RC test circuit.
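The step from (11) to (12) can be made explicit. The sketch below assumes G is nonsingular and takes v to be an eigenvector of G⁻¹C with eigenvalue τ; the symbol v is introduced here only for illustration.

```latex
G^{-1}C\,v = \tau v
\;\Rightarrow\;
\left(G + \frac{2C}{h}\right)v = \left(1 + \frac{2\tau}{h}\right)Gv,
\qquad
\frac{2C}{h}\,v = \frac{2\tau}{h}\,Gv

% hence v is also an eigenvector of the iteration matrix:
\left(G + \frac{2C}{h}\right)^{-1}\frac{2C}{h}\,v
= \frac{2\tau/h}{1 + 2\tau/h}\,v
= \frac{1}{1 + h/(2\tau)}\,v
= \frac{1}{1 - z}\,v,
\qquad z = -\frac{h}{2\tau}
```

Multiplying by the scalar factor (1 − 1/α) and taking the modulus over all eigenvalues gives the left-hand side of (12).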
With this, we can show the following generalized convergence property.
Theorem 3: The convergence region of the iterative trapezoid formula (8) with a time step-size αh is defined by (12).
From (12), to ensure that the iterative trapezoid formula
converges for any decaying or stable oscillatory system
(Re(z) ≤ 0), i.e., to have the convergence region include all of
the left half of the complex z-plane, we must choose α > 0.5.
In our implementation, 0.625 < α < 2.5 is used to speed up
the convergence. The convergence region for
α = 0.625 and 2.5 is shown in Fig. 3. It represents the worst-case convergence region for 0.625 < α < 2.5. In practice, a
maximum iteration number limit for each iteration step is set.
If the iteration number exceeds the maximum limit
(due to either slow convergence or nonconvergence), the solution process with the same time step-size will be attempted one
more time with the standard trapezoid formula before the time
step-size is decreased.
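The role of α in (12) can be illustrated by evaluating the left-hand side over a few points with Re(z) ≤ 0. The sample grid and the extra value α = 0.45 (chosen to lie outside the required range α > 0.5) are assumptions for illustration.

```python
def convergence_factor(alpha, z):
    """Left-hand side of (12): |(1 - 1/alpha) / (1 - z)|."""
    return abs((1.0 - 1.0 / alpha) / (1.0 - z))


# Hypothetical sample points in the closed left half of the z-plane.
samples = [complex(-re, im)
           for re in (0.0, 0.1, 1.0, 10.0)
           for im in (-5.0, -0.5, 0.0, 0.5, 5.0)]

# Worst (largest) factor over the samples for each alpha.
worst = {alpha: max(convergence_factor(alpha, z) for z in samples)
         for alpha in (0.625, 2.5, 0.45)}
```

Over this grid the largest factor occurs at z = 0, where it equals |1 − 1/α|: about 0.6 for both α = 0.625 and α = 2.5, and above 1 for α = 0.45.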
Theoretically, an iterative implicit integration formula shall
have the same stability as the corresponding original implicit
formula if the iterative implicit integration formula is solved
exactly (iterated to infinity). However, in practice, the iterative
implicit integration formula is terminated either when the iteration number exceeds a predefined maximum limit or when the
convergence criterion is met with the predefined error tolerance.
In such cases, the stability of the iterative formula can deteriorate. The stability of the iterative trapezoid formula (8) can be
characterized by the following theorem.
Fig. 4. Absolute stability regions of the iterative trapezoid formula with k = 2
for (a) α = 0.625 and (b) α = 2.5.
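Theorem 4: The absolute stability region of the iterative
trapezoid integration formula (8) starting with $x_n^{(0)} = x_{n-1}$
with a time step-size αh is defined by
$$\left|\left(\frac{1 - \frac{1}{\alpha}}{1 - z}\right)^{k}\frac{2z}{z - \frac{1}{\alpha}} + \frac{\frac{1}{\alpha} + z}{\frac{1}{\alpha} - z}\right| < 1 \tag{13}$$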
where z = −h/(2τ ) and τ is the time constant of the RC test
circuit.
Proof: The proof is given in the Appendix.
The absolute stability regions for α = 0.625 and α = 2.5
with k = 2 are shown in Fig. 4(a) and (b), respectively, which
can be proven to satisfy the “stiff stability” requirements suggested by Gear [9]. For a fixed iteration number k, the absolute
stability region of the iterative trapezoid formula will approach
that of the standard trapezoid integration formula with α → 1.
Furthermore, we give the following stability property of the
iterative trapezoid formula.
Theorem 5: When k → +∞, the absolute stability region of
the iterative trapezoid formula (8) includes the entire open left
half of the complex z-plane and excludes the entire right half of
the complex z-plane.
Theorem 5 can be interpreted by noting the following
fact: (8) is mathematically equivalent to applying the standard
trapezoidal method with basis time step-size h to the differential equation, and then using a quasi-Newton method [6], [29]
to solve the resulting equation. Therefore, when (8) is solved
exactly (k → +∞), the absolute stability region of the iterative trapezoid formula will be the same as that of the standard trapezoid formula. This can also be verified by setting
k → +∞ in (13).
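The expression in (13) can be cross-checked against a direct iteration. Substituting ẋ = λx into (8), multiplying by h/2, and writing z = hλ/2 gives the scalar sweep (1 − z)·x⁽ᵏ⁾ = (1 − 1/α)·x⁽ᵏ⁻¹⁾ + (1/α + z)·x_{n−1}. The sketch below iterates this sweep with x_{n−1} = 1 and compares the result with the closed form; the function names and sample points are assumptions for illustration.

```python
def iterate_amplification(alpha, z, k):
    """Apply k sweeps of (8) to dx/dt = lam*x, starting from x_n^(0) = x_{n-1} = 1."""
    x = 1.0 + 0.0j
    for _ in range(k):
        x = ((1.0 - 1.0 / alpha) * x + (1.0 / alpha + z)) / (1.0 - z)
    return x


def closed_form_amplification(alpha, z, k):
    """Expression inside the modulus in (13)."""
    r = (1.0 - 1.0 / alpha) / (1.0 - z)
    return (r ** k * 2.0 * z / (z - 1.0 / alpha)
            + (1.0 / alpha + z) / (1.0 / alpha - z))


def trapezoid_amplification(alpha, z):
    """Standard trapezoid formula with step alpha*h: (1 + alpha*z) / (1 - alpha*z)."""
    return (1.0 + alpha * z) / (1.0 - alpha * z)


# A point close to the imaginary axis (hypothetical), with alpha = 2.5 and k = 2.
near_axis = abs(closed_form_amplification(2.5, complex(-0.00025, 0.5), 2))
```

For large k the iterated value approaches the standard trapezoid amplification factor, in line with Theorem 5. For the near-axis sample point with k = 2 the modulus exceeds 1, so that point lies outside the absolute stability region.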
C. FLC Integration by Predictor-Corrector Scheme
In SILCA, we first apply (5) and then apply (8) in a way
similar to the classical predictor and corrector procedure [9].
Noting how (5) is derived, we can see that applying (5) as
a predictor and (8) as an iterative corrector with k iterations
is mathematically equivalent to applying an explicit predictor
(the FE formula in this case) as a predictor and (8) as an iterative
corrector with k + 1 iterations.
Using (5) to predict an initial guess for the iterative trapezoidal formula (8) can lead to faster convergence than using the
previous time-point value as the initial guess for (8). Very often,
we may choose to carry (8) for one or a finite number of iterations, and then use the LTE to adjust time step-sizes, similar to
what is done in the classical predictor and corrector procedure.
In this case, the predictor–corrector use leads to a stability
region worse than that of applying only (8). We can prove the
following result.
Theorem 6: The absolute stability region of applying (5) as a
predictor and (8) as an iterative corrector with iteration number
k is defined by
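$$\left|\left(\frac{1 - \frac{1}{\alpha}}{1 - z}\right)^{k+1}\frac{2\alpha z^2}{z - \frac{1}{\alpha}} + \frac{\frac{1}{\alpha} + z}{\frac{1}{\alpha} - z}\right| < 1 \tag{14}$$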
where z = −h/(2τ ) and τ is the time constant of the RC test
circuit.
Proof: The proof is given in the Appendix.
The absolute stability regions for α = 0.625 and 2.5 with
k = 1 are shown in Fig. 5(a) and (b). Compared to Fig. 4, it can
be seen that when α = 0.625, the absolute stability region of the
predictor–corrector scheme is larger than that of applying (8)
alone. However, when α = 2.5, the predictor–corrector scheme
becomes less stable than applying (8) alone, and it is even no
longer stiff stable. Therefore, in SILCA, to ensure stability, (5)
is applied as a predictor only if α < 1.
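The equivalence stated at the start of this subsection, and the expression in (14), can be checked on the scalar test equation ẋ = λx with z = hλ/2 and x_{n−1} = 1. The function names and sample values below are assumptions for illustration.

```python
def fe_predictor(z, alpha):
    """FE formula (4) on the test equation: 1 + alpha*h*lam = 1 + 2*alpha*z."""
    return 1.0 + 2.0 * alpha * z


def mixed_predictor(z, alpha):
    """Mixed trapezoid FE formula (5) on the test equation."""
    return (1.0 + (2.0 * alpha - 1.0) * z) / (1.0 - z)


def corrector_sweeps(x0, z, alpha, k):
    """k sweeps of the iterative trapezoid formula (8) starting from x0."""
    x = x0
    for _ in range(k):
        x = ((1.0 - 1.0 / alpha) * x + (1.0 / alpha + z)) / (1.0 - z)
    return x


def closed_form_14(z, alpha, k):
    """Expression inside the modulus in (14)."""
    r = (1.0 - 1.0 / alpha) / (1.0 - z)
    return (r ** (k + 1) * 2.0 * alpha * z ** 2 / (z - 1.0 / alpha)
            + (1.0 / alpha + z) / (1.0 / alpha - z))


z, alpha, k = complex(-0.4, 0.3), 2.5, 1      # assumed sample values
via_mixed = corrector_sweeps(mixed_predictor(z, alpha), z, alpha, k)
via_fe = corrector_sweeps(fe_predictor(z, alpha), z, alpha, k + 1)
```

Both routes give the same value as the closed form, because one corrector sweep applied to the FE prediction reproduces (5).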
D. Illustration of Basis Time Step-Size (h) Control
As discussed in Section II-B, to satisfy the convergence
property defined by Theorem 3, 0.5 < α < +∞ is required.
The limited α range means that it is impossible to use only one
single basis time step-size during transient simulation in our
framework. When hn /h is out of the α range, a new basis time
step-size has to be chosen (i.e., the present time step-size hn ),
which means the circuit matrix has to be updated and a new
Fig. 5. Absolute stability regions of the predictor–corrector scheme with
k = 1 for (a) α = 0.625 and (b) α = 2.5.
LU factorization is required. In this sense, a large α range is
preferred to decrease the total number of LU factorizations.
However, according to Theorem 3, the linear convergence rate
of the iterative trapezoid formula is related to |1 − 1/α| and a
smaller |1 − 1/α| means a faster convergence rate. Obviously,
a small α range is ideal to reduce the total number of iterations.
Therefore, in practice, 0.625 < α < 2.5 (|1 − 1/α| < 0.6) is
chosen to achieve a balance between the number of LU factorizations and the number of iterations, which will reduce the error to less than 5% of the original error after six iteration steps.
Considering that SPICE3 needs at least two iteration steps to
converge at a time point, the number of iteration steps with
SILCA is approximately 3× over that with SPICE3 for general
circuits. The detailed basis time step-size control scheme will
be described in Algorithm I of Section IV and a linear circuit
example is shown in Section V to illustrate the efficiency and
validity of the iterative trapezoid formula.
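The "less than 5% after six iteration steps" figure follows from the linear rate bound |1 − 1/α| ≤ 0.6 and can be reproduced directly. The helper name `sweeps_needed` is hypothetical.

```python
import math


def sweeps_needed(rate, target):
    """Smallest integer k with rate**k <= target, for a linear rate 0 < rate < 1."""
    return math.ceil(math.log(target) / math.log(rate))


# Rate bound at both ends of the range 0.625 < alpha < 2.5.
worst_rate = max(abs(1 - 1 / 0.625), abs(1 - 1 / 2.5))
residual_after_six = worst_rate ** 6
k_for_five_percent = sweeps_needed(0.6, 0.05)
```

Here `worst_rate` is about 0.6 and `residual_after_six` is about 0.047, so six sweeps are the fewest that bring the bound under 5%.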
It should be noted that SILCA could be combined with fixed
time step-size methods to enlarge the range of α. As shown in
the Appendix, when α > 1, the first iteration with the iterative
Fig. 6. Linear RCL circuit example.
trapezoid formula is equivalent to applying the standard trapezoid formula with the basis time step-size h. Then, one idea
is to apply the standard trapezoid formula with the basis time
step-size h for multiple iterations if α is large. For example,
if α = 3.5, we could apply the standard trapezoid formula
with the basis time step-size h for the first two iterations and
then the iterative trapezoid formula for the remaining iterations with
α = 3.5 − 1 = 2.5. In this way, the convergence and stability
properties will not be affected since the range of α for the
iterative trapezoid formula is kept unchanged (α = 2.5 for the
previous example). The extra cost is that more iterations will be
required with more step-sizes performed by a fixed time step-size method.
E. Illustration of Stability Control
As discussed in Section II-B, the iterative trapezoid formula
satisfies the stiff stability [9], [19] and is applicable to stiff
circuits [19], [34], such as RC circuits, as long as circuit poles
are not so close to the imaginary axis of the complex z-plane.
For oscillatory circuits with poles close to the imaginary axis
in the complex z-plane, according to Theorem 4, the absolute
stability region of the iterative trapezoid formula with a finite
iteration number k will become worse than that of the standard
trapezoid formula. This can be illustrated using the linear RCL
circuit example in Fig. 6.
The transfer function of Vout for the RCL circuit shown in
Fig. 6 can be written as
$$H(s) = \frac{V_{out}}{V_{in}} = \frac{1}{RC^2Ls^3 + CLs^2 + 2RCs + 1}. \tag{15}$$
There are three poles for Vout2 (−0.5689 and −0.2151 ±
j1.3071) and three poles for Vout1 (−999999 and −0.5 ±
j1000). Noting that z = (hλ)/2, among these six poles,
−0.5689, −999999, and −0.2151 ± j1.3071 are on or close to
the negative real axis of the complex z-plane; therefore, they
will not cause stability problems since the iterative trapezoid
formula has the stiff stability property. However, the remaining
two poles −0.5 ± j1000 are far away from the negative real
axis and close to the imaginary axis of the complex z-plane,
which may not be covered by the absolute stability region when
convergence is achieved in a small iteration number k (i.e., the
blank region in the left half of the complex z-plane as shown
in Fig. 4(a) and (b) for k = 2). The simulation results with
SPICE3 and SILCA (without the stability control) are shown
in Fig. 7. Unstable simulation results are observed with SILCA
and the number of iteration steps with SILCA is 2× that
with SPICE3. It should be noted that BDF [19] with order
larger than two has the same stability problem for oscillatory
circuits.
Fig. 7. Time-domain output waveform of Vout1 for a linear RCL circuit
example.
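The pole locations can be explored with a small root finder. The sketch below reads the denominator of (15) as RC²Ls³ + CLs² + 2RCs + 1 and uses normalized values R = C = L = 1. These component values are assumptions, chosen because the resulting roots land close to the first set of poles quoted above; they are not given in this section.

```python
import cmath


def cubic_roots(a3, a2, a1, a0, x0=-1.0, iters=100):
    """Roots of a3*s^3 + a2*s^2 + a1*s + a0 with real coefficients.

    Newton's method finds one real root; deflation leaves a quadratic.
    Adequate for this small example, not a general-purpose solver.
    """
    s = x0
    for _ in range(iters):
        f = ((a3 * s + a2) * s + a1) * s + a0
        df = (3.0 * a3 * s + 2.0 * a2) * s + a1
        s -= f / df
    # Deflate: polynomial = (s - root) * (a3*s^2 + q1*s + q0)
    q1 = a2 + a3 * s
    q0 = a1 + q1 * s
    disc = cmath.sqrt(q1 * q1 - 4.0 * a3 * q0)
    return s, (-q1 + disc) / (2.0 * a3), (-q1 - disc) / (2.0 * a3)


R = C = L = 1.0   # assumed normalized values, not taken from the source
poles = cubic_roots(R * C ** 2 * L, C * L, 2.0 * R * C, 1.0)
```

With these assumed values the routine returns one real pole near −0.57 and a complex pair near −0.215 ± j1.307, each of which maps to the z-plane through z = (hλ)/2.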
This can be explained by comparing Figs. 3 and 4, which
show that the stability region is smaller than the convergence
region. In other words, the stability region might not cover
all circuit poles upon convergence if the stability requirement
(Theorem 4 or Theorem 6) is stricter than the convergence
requirement (i.e., user-specified error tolerance for convergence
justification). A tighter error tolerance for convergence will help
alleviate the stability problem. However, more iterations and/or
more time points have to be simulated. Therefore, SILCA is not
recommended for highly oscillatory circuits.
III. SVC METHOD
SPICE-like circuit simulators use the Newton–Raphson
method to solve a set of nonlinear equations. Typically, for each
Newton–Raphson iteration, a new LU factorization is required.
This can be extremely costly for a circuit with strong parasitic
coupling effects or with reduced dense linear networks. The
successive chord method [19] always uses a fixed chord as the
first-order derivative during nonlinear iteration. Hence, at each
time point, only one LU factorization is needed for nonlinear
iteration. But it is often hard to choose a single fixed chord for a
(strongly) nonlinear curve to always ensure a good convergence
rate. In general, a chord that ensures global convergence will
unfortunately lead to a slow convergence rate.
To achieve a good balance between the number of LU factorizations and that of iterations, we propose the SVC method.
Fig. 8. PWNL example implemented with the SVC method.
The basic idea is to divide a nonlinear curve into different
segments, each of which represents a weakly nonlinear curve
and the same (local) chord is used for the same segment
during nonlinear iteration, the so-called PWNL analysis. As shown
in Fig. 8, the nonlinear curve is divided into three PWNL
segments with three local chords, each of which represents the
maximum derivative for the corresponding segment. A new LU
factorization is performed only if the nonlinear curve enters
a different PWNL segment, where a new local chord is used.
By this method, similar convergence speed and accuracy can
be achieved as the Newton–Raphson method while the number
of LU factorizations can be decreased. We emphasize that the
PWNL model of a nonlinear device is used only for the calculation of first-order derivatives while the nonlinear function is
still evaluated using the original nonlinear device model.
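The balance between factorizations and iterations can be sketched on a scalar example. The following Python sketch is illustrative only and is not the authors' implementation: the test equation G·v + v³ = I, the segment boundaries, and the names `svc_solve`, `newton_solve`, `BREAKPOINTS`, and `CHORDS` are all assumptions introduced here. Each segment stores the maximum derivative of the nonlinear term as its local chord, the residual is always evaluated with the original nonlinear function, and a "factorization" is counted only when the iterate enters a different segment.

```python
import bisect

G = 1.0        # linear conductance (hypothetical value)
I_SRC = 1.0    # source current (hypothetical value)
BREAKPOINTS = [0.5, 1.0]                         # assumed segment boundaries on [0, 2]
CHORDS = [3 * 0.5**2, 3 * 1.0**2, 3 * 2.0**2]    # max d(v^3)/dv on each segment


def residual(v):
    """Original nonlinear equation G*v + v^3 - I = 0 (never linearized)."""
    return G * v + v**3 - I_SRC


def segment(v):
    """Index of the assumed PWNL segment that contains v."""
    return bisect.bisect_right(BREAKPOINTS, v)


def svc_solve(v=0.0, tol=1e-9, max_iter=200):
    """Chord iteration whose slope changes only when the segment changes."""
    seg = segment(v)
    slope = G + CHORDS[seg]
    factorizations = 1                     # initial "LU factorization"
    for iteration in range(1, max_iter + 1):
        v_new = v - residual(v) / slope    # residual uses the true model
        new_seg = segment(v_new)
        if new_seg != seg:                 # entered another segment: new chord
            seg = new_seg
            slope = G + CHORDS[seg]
            factorizations += 1
        if abs(v_new - v) < tol:
            return v_new, iteration, factorizations
        v = v_new
    raise RuntimeError("chord iteration did not converge")


def newton_solve(v=0.0, tol=1e-9, max_iter=50):
    """Newton-Raphson reference: a new slope (factorization) every iteration."""
    for iteration in range(1, max_iter + 1):
        v_new = v - residual(v) / (G + 3 * v**2)
        if abs(v_new - v) < tol:
            return v_new, iteration, iteration
        v = v_new
    raise RuntimeError("Newton iteration did not converge")
```

On this toy problem the chord iteration needs more iterations than Newton–Raphson but changes its slope far less often, which is the balance the SVC method aims for.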
The PWNL idea implemented with the SVC method can be very effective for two reasons.
1) Since MOSFETs in analog applications generally operate linearly around their
operating points, only weakly nonlinear properties may be present. A fixed chord
representing the gm, gmbs, and gds of MOSFETs at operating points is generally
sufficient. A linear-centric harmonic balance analysis method has been proposed
in [15]. 2) MOSFETs in digital applications reside in two regions at most time
points: the cutoff region and the well-conducted linear region with a very small
source-to-drain voltage. Both regions have relatively constant gm, gmbs, and gds.
The only situation where gm, gmbs, and gds change significantly is when MOSFETs
switch from the cutoff region through the saturation region to the linear region
(or vice versa). This process occupies only a small fraction of the total simulation
time for a MOSFET in a large-scale digital circuit. Hence, a fixed chord for these
situations will not significantly affect the total iteration process.
With the above considerations, five MOSFET PWNL operating regions for digital circuit
applications are defined as shown in Fig. 9, and gm, gmbs, and gds for different
operating regions are listed in Table I. In Table I, Reg#0 represents the cutoff
region, Reg#1 and Reg#3 are saturation regions, and Reg#2 and Reg#4 are linear regions.
gm−max and gmbs−max are the maximum values in all the regions (defined by Vdd), and
gds−i is defined (generally as the maximum value) for each region to ensure
convergence. It should be noted that, theoretically, the convergence rate of the SVC
method is linear, but in practice it can be maintained close to that of the
Newton–Raphson method by using more PWNL regions if needed.

Fig. 9. PWNL operating regions of MOSFETs for digital applications.

TABLE I
gm, gmbs, AND gds FOR DIFFERENT MOSFET PWNL REGIONS
Another advantage of the SVC method is that chords can be precalculated and stored
before simulation, so no derivative calculation is required during nonlinear iteration,
unlike in the Newton–Raphson method. This can lead to a significant saving in device
loading time. Furthermore, table lookup models can be implemented more easily in SILCA
than in SPICE since there is no need for lookup tables for first-order derivatives.
The proposed SVC method is more accurate and effective than the ad hoc device bypass
techniques utilized in modern circuit simulators [14], [21], where device evaluations
are bypassed when the terminal voltages of a nonlinear device stay almost constant for
a few consecutive nonlinear iteration steps. It has been reported that the
voltage/current range of device bypass has to be kept small enough to avoid incorrect
simulation results [21]. For high-frequency deep submicron circuit applications,
however, the efficiency of device bypass is limited, since the terminal voltages of
most nonlinear devices are not completely constant but change slowly. The SVC method
defines PWNL segments and local chords based on the behaviors of the specific nonlinear
devices under study; therefore, it can keep the circuit matrix constant for a larger
voltage/current range and requires far fewer LU factorizations.
By the above MOSFET PWNL operating region definition, only five sets of gm, gmbs, and
gds are used during time-domain simulation for digital systems. We further have the
following observations. 1) At one time point, most MOSFETs in a large digital system
will stay in their PWNL operating regions as defined above, while only a few may switch
from one region to another. 2) For a switching MOSFET, the update of gm, gmbs, and gds
is regionwise. In other words, the change of gm, gmbs, and gds from Reg#i to Reg#j is
fixed. Therefore, when a small number of MOSFETs change their PWNL operating regions,
we can compute the new L and U matrices directly from the old L and U matrices using
the low-rank update technique [7], [31] rather than performing a costly LU
factorization of the entire circuit matrix.
Suppose that the previous circuit matrix is Y and one MOSFET is now switching from
Reg#1 to Reg#2. The new circuit matrix $Y'$ for the next iteration can be expressed by

$$Y' = Y + c\,r^{T} \tag{16}$$

where c and r are sparse column vectors representing values of updated elements. In
this case, $c = r = [0, \ldots, 0, e, 0, \ldots, 0, -e, 0, \ldots]^{T}$ and
$e = \sqrt{|g_{ds-2} - g_{ds-1}|}$, so that $c\,r^{T}$ carries
$\pm|g_{ds-2} - g_{ds-1}|$ in the four affected positions. Noting that there are only
four different elements between the matrices Y and $Y'$, the new L and U matrices for
$Y'$ can be updated efficiently from the previous ones for Y with the low-rank update
technique. The worst-case cost of m low-rank updates for a dense matrix is $O(mn^{2})$,
where m is the number of updated elements and n is the matrix size. If m is much less
than n, the low-rank update will perform much faster than a regular LU factorization,
whose worst-case cost is $O(n^{3})$ for a dense matrix. With the introduced MOSFET PWNL
definition, m will be kept small at a time point since the number of MOSFETs whose
terminal voltages change so violently that the operating region is switched is
generally small.

TABLE II
ALGORITHM FOR SILCA TIME-DOMAIN SIMULATION

Furthermore, the low-rank update cost can be decreased dramatically by exploiting
sparse matrix techniques [7], [31] and nonlinear/linear circuit partitioning to place
matrix elements due to nonlinear devices at the bottom-right corner of a circuit
matrix [7]. In this way, only matrix elements whose values need to be updated are
recomputed while all other matrix elements are kept the same as before, i.e.,

$$Y = \begin{bmatrix} Y_{lin} & Y_{coup} \\ Y_{coup}^{T} & Y_{non} \end{bmatrix} \tag{17}$$

where the nonzero pattern of the sparse block $Y_{non}$ consists of entries marked ×, ⊗,
and ⊕.
For example, the circuit matrix Y in (17) is partitioned into
the linear part Ylin , the nonlinear/linear coupling part Ycoup , and
the nonlinear part Ynon . Whenever a nonlinear device changes
its operating region (affecting four matrix elements of Ynon in
this example, marked by ⊗), Ylin and Ycoup are kept the same.
For the sparse matrix Ynon , only nine matrix elements need to
be updated (marked by ⊗ and ⊕) and the other eight are kept
unchanged (marked by ×). Therefore, the matrix sparsity can
be fully exploited by the low-rank update technique.
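To illustrate why a rank-one change need not trigger a new factorization, the sketch below solves (Y + c r^T) x = b while reusing the existing factors of Y. Note that it uses the Sherman–Morrison identity rather than the direct update of the L and U factors from [7], [31] that SILCA employs, so it shows the principle and not the paper's algorithm. The dense pivot-free LU routine, the 3×3 matrix, the conductance change `dg`, and all function names are assumptions made for illustration.

```python
def lu_factor(a):
    """Dense Doolittle LU without pivoting (adequate for this diagonally dominant toy)."""
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[float(v) for v in row] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            lower[i][k] = upper[i][k] / upper[k][k]
            for j in range(k, n):
                upper[i][j] -= lower[i][k] * upper[k][j]
    return lower, upper


def lu_solve(lower, upper, b):
    """Forward and backward substitution with existing factors."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(lower[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(upper[i][j] * x[j] for j in range(i + 1, n))) / upper[i][i]
    return x


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def solve_after_rank_one(lower, upper, c, r, b):
    """Solve (Y + c r^T) x = b using only solves with the old factors of Y."""
    y_b = lu_solve(lower, upper, b)
    y_c = lu_solve(lower, upper, c)
    scale = dot(r, y_b) / (1.0 + dot(r, y_c))
    return [yb - scale * yc for yb, yc in zip(y_b, y_c)]


def demo():
    y = [[3.0, -1.0, -1.0], [-1.0, 3.0, -1.0], [-1.0, -1.0, 4.0]]
    dg = 0.5                       # hypothetical conductance change between nodes 0 and 1
    c = [dg, -dg, 0.0]             # touches four entries of Y, as in the text
    r = [1.0, -1.0, 0.0]
    b = [1.0, 0.0, 0.0]
    lower, upper = lu_factor(y)    # the old matrix is factored once
    x_fast = solve_after_rank_one(lower, upper, c, r, b)
    y_new = [[y[i][j] + c[i] * r[j] for j in range(3)] for i in range(3)]
    x_ref = lu_solve(*lu_factor(y_new), b)   # reference: refactor from scratch
    return x_fast, x_ref
```

Calling `demo()` returns the solution obtained with the reused factors and the reference solution from a fresh factorization, which should agree to rounding error.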
SILCA utilizes the sparse matrix solver package SPARSE1.3 [12], in which a sparse
low-rank update algorithm has been implemented. In practice, if the value of a diagonal
element (Lii) becomes smaller than a predefined threshold during low-rank updates, that
diagonal element is not suitable for the following steps. In this case, SILCA falls
back to a regular LU factorization.
IV. SILCA ALGORITHM
The basic algorithm for SILCA time-domain simulation is shown in Table II. Practical
considerations, such as processing breakpoints [21], are not included for clarity. A
new LU factorization is required only if the standard implicit integration scheme is
used. If only the local chords of nonlinear devices change, a low-rank update is
performed for fast LU factorization. No LU factorization is needed in any other case.
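The three cases just described can be summarized as a small decision routine. This is only a sketch of the rule stated in the text, not a transcription of Table II: the names `MatrixAction` and `choose_action`, and the `pivot_ok` flag standing in for the diagonal-threshold check of the low-rank update, are hypothetical.

```python
from enum import Enum


class MatrixAction(Enum):
    FULL_LU = "full LU factorization"
    LOW_RANK_UPDATE = "low-rank update of L and U"
    REUSE = "reuse existing L and U"


def choose_action(uses_standard_implicit_step, num_region_switches, pivot_ok=True):
    """Pick how the factors are obtained for the next solve (illustrative rule)."""
    if uses_standard_implicit_step:
        # The standard implicit scheme changes the companion conductances.
        return MatrixAction.FULL_LU
    if num_region_switches > 0:
        # Only local chords changed; fall back to full LU if a pivot is too small.
        return MatrixAction.LOW_RANK_UPDATE if pivot_ok else MatrixAction.FULL_LU
    return MatrixAction.REUSE
```

For instance, `choose_action(False, 2)` selects the low-rank update, while `choose_action(False, 0)` reuses the existing factors.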
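Nonlinear capacitors can be handled in SILCA by combining the proposed iterative
trapezoid integration formula and the proposed SVC method, illustrated as

$$
\begin{aligned}
\dot{Q}_n^{(k)} &= \frac{2}{h_n}\left(Q_n^{(k)} - Q_{n-1}\right) - \dot{Q}_{n-1} \\
&\approx \frac{2}{h_n}\left(Q_n^{(k-1)} + C_n^{(k-1)}\left(V_n^{(k)} - V_n^{(k-1)}\right) - Q_{n-1}\right) - \dot{Q}_{n-1} \\
&= \frac{2C_n^{(k-1)}}{h_n}V_n^{(k)} - \frac{2C_n^{(k-1)}}{h_n}V_n^{(k-1)} + \frac{2}{h_n}\left(Q_n^{(k-1)} - Q_{n-1}\right) - \dot{Q}_{n-1} \\
&\approx \frac{2C}{h}V_n^{(k)} - \frac{2C}{h}V_n^{(k-1)} + \frac{2}{h_n}\left(Q_n^{(k-1)} - Q_{n-1}\right) - \dot{Q}_{n-1}.
\end{aligned}
\tag{18}
$$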
Fig. 10. Linear RCL circuit example.
TABLE III
SIMULATION RESULTS OF A LINEAR CIRCUIT EXAMPLE
In the above derivation, a linearized capacitance C is introduced to represent the
PWNL definition of the nonlinear capacitor. The basis step-size h is used as
previously. For clarity, the nonlinear charge is assumed to be a function of a single
voltage. In practical MOSFET models, nonlinear charges are generally affected by three
voltages: Vgs, Vds, and Vbs. For example, the nonlinear charge between the drain and
the source of a MOSFET is Qds = Q(Vgs, Vds, Vbs). Suppose that both the linearized
capacitors of Qds (Cds, Cm, and Cmbs) and the linearized conductors of Ids (gds, gm,
and gmbs) in a MOSFET need to be updated due to a switch of PWNL regions. The
contribution of the MOSFET to the circuit matrix is then as follows:
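$$
\begin{array}{c|cccc}
 & D & G & S & B \\ \hline
D & \Delta G_{ds} & \Delta G_{m} & -\Delta G_{ds} - \Delta G_{m} - \Delta G_{mbs} & \Delta G_{mbs} \\
S & -\Delta G_{ds} & -\Delta G_{m} & \Delta G_{ds} + \Delta G_{m} + \Delta G_{mbs} & -\Delta G_{mbs}
\end{array}
=
\begin{bmatrix} |a| \\ -|a| \end{bmatrix}
\begin{bmatrix} \dfrac{\Delta G_{ds}}{|a|} & \dfrac{\Delta G_{m}}{|a|} & -\dfrac{a}{|a|} & \dfrac{\Delta G_{mbs}}{|a|} \end{bmatrix}
$$

$$
\Delta G_{ds} = \frac{2\Delta C_{ds}}{h} + \Delta g_{ds}, \quad
\Delta G_{m} = \frac{2\Delta C_{gs}}{h} + \Delta g_{m}, \quad
\Delta G_{mbs} = \frac{2\Delta C_{bs}}{h} + \Delta g_{mbs}, \quad
a = \Delta G_{ds} + \Delta G_{m} + \Delta G_{mbs}.
\tag{19}
$$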
There are a total of eight matrix entries to be updated. With the above
representation, the rank-one update algorithm [7], [31] can be used to realize fast LU
factorization. If more matrix entries (at most 16 for a MOSFET) are affected by the
switch of PWNL regions, a series of rank-one or rank-m updates [3] is required to
perform fast LU factorization. In this case, the efficiency of low-rank updates may be
reduced.
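That the eight-entry MOSFET contribution is a rank-one matrix can be checked numerically. The sketch below builds the 2×4 stamp (rows D and S, columns D, G, S, B) and one possible column/row factorization with c = [|a|, −|a|]; the function names and the numerical values are hypothetical, and the factorization assumes a ≠ 0.

```python
def mosfet_stamp(d_gds, d_gm, d_gmbs):
    """2x4 update block: rows (D, S), columns (D, G, S, B)."""
    a = d_gds + d_gm + d_gmbs
    row_d = [d_gds, d_gm, -a, d_gmbs]
    return [row_d, [-v for v in row_d]]


def rank_one_factors(d_gds, d_gm, d_gmbs):
    """Column vector c and row vector r with stamp = c r^T (requires a != 0)."""
    a = d_gds + d_gm + d_gmbs
    s = abs(a)
    if s == 0.0:
        raise ValueError("a = 0: this particular factorization is undefined")
    c = [s, -s]
    r = [d_gds / s, d_gm / s, -a / s, d_gmbs / s]
    return c, r


def outer(c, r):
    return [[ci * rj for rj in r] for ci in c]
```

With, say, `mosfet_stamp(0.2, 0.5, 0.1)` and the matching `rank_one_factors(0.2, 0.5, 0.1)`, the outer product of `c` and `r` reproduces all eight stamp entries up to rounding.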
V. EXPERIMENTAL RESULTS
Four sets of experiments are reported to demonstrate the validity and efficiency of
the introduced linear-centric techniques. The first test uses a simple linear RLC
circuit to demonstrate the proposed predictor–corrector integration scheme. The second
test uses a variety of analog, digital, and RF circuits with relatively small sizes to
evaluate the effectiveness of the SVC method implemented with the low-rank update
technique. The last two examples are circuits coupled with substrate and power/ground
networks, which are used to demonstrate the scalability of SILCA on larger circuits,
where a substantial portion of the circuits are linear parasitic devices. The level 1
model of MOSFETs is implemented with the proposed PWNL idea in SILCA, and nonlinear
capacitors in MOSFETs are simplified as linear ones in both SILCA and SPICE3. To make
a fair evaluation of the benefits of the proposed linear-centric techniques, no table
lookup models of MOSFETs are used and no RC(L) model order reduction algorithm is
utilized.

Fig. 11. Histogram of the number of iterations for a linear circuit example.
A. Evaluation of Predictor–Corrector Integration Scheme
The efficiency of the predictor–corrector integration scheme can be illustrated with
the simple linear circuit example shown in Fig. 10. It includes two RCL circuits with
different time constants. The input is a pulse signal (initially at the low voltage
level of 0 V) with a 50% duty ratio and an 80-s period. The simulation length is set
to 160 s. Since the minimum time constant is 0.01 s for the left half RCL circuit, at
least 16 000 time points are required for a fixed time step-size simulation.
The simulation results are shown in Table III, where #Total points represents the
number of total simulated time points and #Accepted points represents the number of
accepted time points. The rejected time points are those violating the LTE requirement
or exceeding the maximum iteration limit. Since SILCA applies an adaptive time step
control scheme similar to that in SPICE3, based on the LTE requirement, it can be seen
from Table III that SILCA and SPICE3 achieve similar #Total points and #Accepted
points, which are far fewer than those required by a fixed time step-size method.
Furthermore, the number of LU factorizations used by SILCA decreases to 1.14% of that
of SPICE3 (an 87.63× saving in LU factorization cost). The number of iterations
increases to about 2.5× that of SPICE3. Fig. 11 shows the histogram of the number of
iteration steps, in which it can be seen that most of the iterations converge in two
to six steps.

Fig. 12. Distribution of actual time step-sizes for a linear circuit example.
Fig. 13. Distribution of basis time step-sizes for a linear circuit example.
Fig. 14. Histogram of basis time step-sizes for a linear circuit example.

TABLE IV
SIMULATION RESULTS OF NONLINEAR TEST CIRCUITS*
Fig. 12 shows the distribution of actual time step-sizes
(hn = αh) during SILCA simulation. It can be seen that most
simulated time step-sizes are between 0.05 and 0.2 s, centering
around 0.08 s. Recall that since we choose 0.625 < α < 2.5,
it is possible that fewer basis time step-sizes are required. This
is confirmed by Fig. 13, which shows the distribution of basis
time step-sizes (h) during SILCA simulation. It can be seen
that most basis time step-sizes are the same and near 0.08 s.
In SILCA, it is the basis time step-size that is used for circuit
matrix construction. Therefore, SILCA keeps the circuit matrix
constant as long as the basis time step-size is constant.
The histogram of basis time step-sizes with SILCA is shown in Fig. 14. Compared to
Fig. 13, it can be concluded that most basis time step-sizes are near 0.08 s and
constant during the following time intervals: 10–40, 45–80, 80–120, and 120–160 s.

* For each circuit in Table IV, the first row is the SPICE3 result and the second row
is the SILCA result.

It should be noted that SILCA is mainly designed for speeding up circuit simulation
when most of the time step-sizes hn are close to the basis time step-size h, i.e.,
0.625 < (hn/h) < 2.5 in our experiments. In general, for transient simulation of
parasitic-sensitive circuits, most time step-sizes are close to the basis time
step-size for a relatively long time interval when the transient behavior of circuits
does not change significantly, i.e., staying either in the logical "0" state or the
logical "1" state. When time step-sizes hn change violently, a new basis time
step-size h will be chosen. However, in our experience, such cases are rare (e.g.,
near breakpoints). Further, SILCA can be combined with fixed time step-size methods to
enlarge the range of (hn/h), as discussed in Section II-D.
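The step-size rule above amounts to a simple window test on the ratio hn/h. The following sketch is an assumption-laden illustration: the function name `update_basis_step` is hypothetical, and choosing the new basis step-size equal to the current hn is a simplification made here, not something stated in the paper. With h = 0.08 s the window is 0.05 s to 0.2 s, which is consistent with the spread of actual step-sizes reported for Fig. 12.

```python
ALPHA_MIN = 0.625   # lower bound on h_n / h used in the experiments
ALPHA_MAX = 2.5     # upper bound on h_n / h used in the experiments


def update_basis_step(h_basis, h_n):
    """Return (basis step-size, refactorize?) for a proposed step h_n.

    The circuit matrix is built from the basis step-size, so it stays
    constant as long as the basis step-size is kept.
    """
    alpha = h_n / h_basis
    if ALPHA_MIN < alpha < ALPHA_MAX:
        return h_basis, False      # keep the matrix and its L, U factors
    return h_n, True               # assumed policy: adopt h_n as the new basis
```

For example, `update_basis_step(0.08, 0.1)` keeps the basis step-size, whereas `update_basis_step(0.08, 0.3)` requests a new one and therefore a new factorization.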
B. Evaluation of SVC and Low-Rank Update
To illustrate the efficiency of the SVC method and low-rank update techniques,
simulations on several analog, digital, and RF circuits have been performed, and the
results are shown in Table IV. It can be seen that the number of iterations generally
increases to 1.5–2.5× of that with SPICE3, but the number of LU factorizations used by
the SVC method with low-rank update decreases to 3%–20% of that used by SPICE3. We can
see more saving in LU factorization with low-rank update for larger circuits, such as
a 20-stage inverter chain, a ring oscillator, and a voltage-controlled oscillator
(VCO). In general, a low-rank update technique will be more efficient for simulating a
nonlinear circuit with a large-scale (potentially dense) network of linear elements
since only the L and U matrices for the sparse nonlinear part need to be updated
during nonlinear iteration and the dense linear part remains unchanged.

Fig. 15. Substrate coupling example.
It should be pointed out that although the number of LU
factorizations is reduced dramatically with the SVC method and
low-rank update techniques, the speedup for circuits in Table IV
is not dramatic since the simulation cost is dominated by device
evaluation. As a relaxed direct method, SILCA has to take
more device evaluations than SPICE3 since more iteration steps
are required. For the Opamp follower example (including 32
MOSFETs, eight capacitors, and four current sources), SPICE3
runs for 17.87 s, in which 12.06 s is spent on device loading,
while SILCA requires 18.65 s with 13.89 s on device loading.
In this case, SILCA is more costly than SPICE3 since the
simulation time is dominated by device loading. Therefore,
SILCA is more suitable for parasitic-coupled VLSI circuits,
where the number of linear parasitic elements dominates the
number of nonlinear devices.
C. Coupled Circuit and Substrate Analysis
The third example is a simple substrate network, as shown in Fig. 15, coupled with two
inverters with pulse inputs at different operating frequencies: the first inverter
operates at a low frequency and the second inverter operates at a high frequency. The
bulk contacts of nMOSFETs are directly connected to P-substrate ports and those of
pMOSFETs are connected to P-substrate ports through a capacitor between the N-well and
the P-substrate [27]. There are four other P-substrate ports connecting to the ground,
and the backplane of the substrate is also connected to the ground. RCL loads are
added at the output of each inverter (not shown in Fig. 15). The substrate is modeled
as a network consisting of a three-dimensional (3-D) dense resistor mesh with multiple
layers [32]. In Fig. 15, a one-layer resistor network is illustrated to model the
substrate part among four inverter bulk contacts.

Fig. 16. Transient waveform of Vout1 for the substrate coupling example.
Although simplified truncated substrate models have been
proposed to capture dominant coupling conductance [20], [27],
they are likely to underestimate coupling effects in circuit
systems designed to be noise immune [24]. Furthermore, the
accuracy with simplified substrate models may not be sufficient.
Therefore, accurate analysis of a circuit with a fully modeled
substrate is desirable for high fidelity circuit design and verification. Fig. 16 shows the time-domain output waveform of the
first inverter when the output signal is a digital “1” (the high
voltage level). First, the result from SILCA matches that from
SPICE3. Second, it can be seen that high-frequency feedthrough signals from the second
inverter are present in Fig. 16. This is an important cause of first-pass design
failure in deep submicron digital and analog circuit designs, and it may often not be
captured by simplified substrate analysis.
TABLE V
SIMULATION RESULTS OF SUBSTRATE COUPLING EXAMPLES
Fig. 17. Runtime comparison of the substrate coupling example.
Fig. 18. Power/ground network coupling example.
Table V shows the statistics of running SILCA on a number
of substrate coupling examples with varying circuit substrate
network complexity compared to SPICE3. In our experiments,
the number of layers and the number of resistors per layer are
changed to vary the total number of circuit elements. A maximum 38.69× LU
factorization cost saving and a 17.30× overall speedup (with about 35 000 elements)
are achieved for this simple substrate coupling analysis example, while the cost of
forward/backward substitution increases to 2.5–2.75×. No low-rank update technique is
used for this example. The run time comparison is shown in Fig. 17.
Fig. 19. Transient waveform of Vout for the power/ground network example.
Several observations are as follows. 1) The larger the circuit is (and therefore the
larger the LU/FBS cost ratio), the more overall speedup can be achieved with SILCA.
SILCA is very suitable for deep submicron VLSI circuits with strong parasitic coupling
effects. 2) The device load cost with SILCA decreases in proportion to the LU
factorization cost saving. The reason is that in SILCA, device loads are only
performed when circuit matrix elements need to be updated due to nonlinear devices
and/or capacitors/inductors. For the substrate coupling examples, since most devices
are resistors, their device loads are only performed when a new LU factorization is
required. 3) The more savings on LU factorization, the more iterations are required,
which means more cost on forward/backward substitution and device evaluation.
Therefore, there exists a tradeoff between the cost of LU factorization and that of
forward/backward substitution and device evaluation. The maximum overall speedup will
approach the LU factorization speedup for large strongly coupled systems.
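The tradeoff in observation 3) can be made concrete with a simple Amdahl-style cost model. This model is an assumption introduced here for illustration and does not appear in the paper: the baseline run time is split into an LU fraction and everything else, the LU part shrinks by `lu_speedup`, and the remaining work (forward/backward substitution and device evaluation) grows by `other_growth` because more iterations are needed. The function name and parameters are hypothetical.

```python
def overall_speedup(lu_fraction, lu_speedup, other_growth):
    """Overall speedup under a two-term cost model (illustrative assumption).

    lu_fraction  -- share of baseline run time spent in LU factorization
    lu_speedup   -- reduction factor of the LU factorization cost
    other_growth -- growth factor of all remaining costs
    """
    if not 0.0 <= lu_fraction <= 1.0:
        raise ValueError("lu_fraction must lie in [0, 1]")
    new_time = lu_fraction / lu_speedup + (1.0 - lu_fraction) * other_growth
    return 1.0 / new_time
```

Under this model the overall speedup rises toward `lu_speedup` as `lu_fraction` approaches 1, and it can drop below 1 when LU factorization is only a modest share of the baseline cost, which mirrors the behavior reported for the small device-dominated circuits.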
We also compare SILCA with a fast SPICE-like circuit simulator HSIM 1.3 [36], and the results are also collected in Fig. 17.
HSIM 1.3 uses the BE integration formula, a table lookup
MOS level 2 model, and device bypass techniques. Further,
HSIMSPEED = 1 is set in HSIM 1.3 so that the number of total
simulated time points is close to that of SPICE3 and SILCA
to achieve the same accuracy. It can be seen from Fig. 17 that
the larger the circuit is, the more speedup can be achieved with
SILCA. Note that SILCA does not use table lookup MOSFET
models.
TABLE VI
SIMULATION RESULTS OF POWER/GROUND NETWORK COUPLING EXAMPLES
TABLE VII
SIMULATION RESULTS OF POWER/GROUND NETWORK COUPLING EXAMPLES WITH THE GMRES SOLVER (ε = 1e−8)
D. Coupled Circuit and Power/Ground Network Analysis
The fourth example is a power/ground network as shown in
Fig. 18. The power and ground supply networks are modeled as
two RCL meshes (parasitic coupling capacitors are not shown
in Fig. 18). Between these two layers is a 20-stage inverter
chain, different inverters of which are connected to different
power/ground nodes. RCL loads are added to each inverter to
model interconnect lines between stages.
Fig. 19 shows the time-domain output waveform of the
inverter chain when the output signal is digital “1” (the high
voltage level). The "1" signal has been disturbed due to the IR drop and L·dI/dt
effects of the power/ground network (Vdd is 3.3 V). Table VI shows the simulation
results with varied numbers of elements modeling the power/ground network. In our
experiments, the sizes of the two RCL meshes are changed to vary the number of
elements. We can see that SILCA achieves more speedup for larger circuits. The number
of iterations increases to 3.5–4.2× with SILCA. It is worth noting that the maximum LU
factorization cost saving and overall speedup reach 88.50× and 14.00× (with about
60 000 elements), respectively, with the rank-one update technique, compared with
19.82× and 8.86×, respectively, with the SVC method alone.
For comparison purposes, we have implemented a coupled iterative/direct solver for
nonlinear circuits with large-scale power/ground networks [17]. In this coupled
solver, power/ground networks are formulated with a nodal analysis (NA) circuit matrix
[19], which is symmetric positive definite, and solved by the conjugate gradient
method with an incomplete Cholesky decomposition preconditioner [4]. Nonlinear
circuits are formulated with an MNA circuit matrix and solved by the direct method
based on LU factorization and Newton–Raphson iteration as in SPICE. The iterative
method and direct method are coupled together by a Gauss–Seidel relaxation scheme
[34]. Experimental results on the above power/ground coupling examples show that the
coupled iterative/direct solver achieved a speedup over SPICE3 similar to that of
SILCA. However, it should be noted that the coupled iterative/direct solver is
efficient only if there exists a good partition with only a few boundary nodes between
linear circuit parts and nonlinear circuit parts and the coupling effects between
those two parts are weak.
Very recently, we developed a new GMRES solver with an LU factorization
preconditioning scheme [18] for time-domain simulation of nonlinear circuits with
large-scale power/ground networks. The basic idea is to apply the same time step-size
controlling scheme as that used in SILCA. Whenever time step-sizes change violently, a
new basis time step-size is chosen and a regular LU factorization is performed. If
time step-sizes change in the range of 0.625 < α < 2.5, rather than using the
linear-centric analysis methods in SILCA, a GMRES solver is applied with the
previously factorized L and U matrices as the preconditioner. Meanwhile, to make a
fair comparison with SILCA, low-rank update has been applied to the preconditioning L
and U matrices whenever a nonlinear device switches its operating region. The GMRES
solver is implemented following the left-preconditioned GMRES algorithm in [26].
The simulation results with the new GMRES solver (ε = 1e−8) are shown in Table VII.
The average number of GMRES iterations ((#GMRES Iter)/(#GMRES)) with the LU
factorization preconditioner is about 3–3.5 for a GMRES solving process, which shows
that the preconditioner is very efficient. Table VII also shows that the speedup over
SPICE3 with the GMRES solver is less than that with SILCA. The main reason is that the
number of forward/backward substitutions with the GMRES solver (#Precond in Table VII)
is generally larger than that with SILCA (#Iter in Table VI). Furthermore, extra costs
are incurred by matrix–vector product operations during the GMRES solving process. It
can be expected that the simulation cost will increase if the error tolerance of the
GMRES solver is made tighter. It should be noted that the number of nonlinear
iterations (#Tran Iter) is less than that with SILCA since no FLC integration scheme
is required for capacitors/inductors. However, the number of nonlinear iterations is
larger than that with SPICE3 due to the PWNL definition of MOSFETs.
VI. CONCLUSION
In this paper, a new nonlinear time-domain circuit simulation method called SILCA has
been proposed for deep submicron VLSI circuit design and verification, which requires
accurate modeling of parasitic coupling effects. New variable time step-size FLC
numerical integration formulae are developed to ensure constant equivalent conductance
for capacitor/inductor companion models. We have characterized the convergence and
stability properties of the newly introduced integration formulae. As an alternative
to the Newton–Raphson method, an SVC method is proposed for nonlinear circuit
simulation, and the low-rank update technique has been implemented for efficient LU
factorization. With these techniques, SILCA can dramatically reduce the number of
costly LU factorizations for time-domain simulation. Experimental results on coupled
circuit, substrate, and power/ground network analysis have demonstrated that SILCA can
achieve SPICE-like accuracy with more than an order of magnitude speedup over SPICE3.
Future research includes handling of nonlinear capacitors, optimum PWNL model
generation for nonlinear device models, exploiting incomplete LU preconditioners [26]
for GMRES, and applications of SILCA to coupled electrical, electromagnetic, and
thermal simulation.
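APPENDIX
PROOF OF THEOREM 4 AND THEOREM 6

Proof: Applying the iterative trapezoid formula (8) to an RC test example, the
iterative relationship can be derived (τ = RC), i.e.,

$$
\begin{aligned}
&\tau \dot{x}_n^{(k)} + x_n^{(k)} = 0 \\
&\frac{2\tau}{h}x_n^{(k)} - \frac{2\tau}{h}x_n^{(k-1)} + \frac{2\tau}{\alpha h}\left(x_n^{(k-1)} - x_{n-1}\right) - \tau\dot{x}_{n-1} + x_n^{(k)} = 0 \\
&\frac{2\tau}{h}x_n^{(k)} - \frac{2\tau}{h}x_n^{(k-1)} + \frac{2\tau}{\alpha h}\left(x_n^{(k-1)} - x_{n-1}\right) + x_{n-1} + x_n^{(k)} = 0 \\
&x_n^{(k)} = \frac{1-\frac{1}{\alpha}}{1-z}\,x_n^{(k-1)} + \frac{\frac{1}{\alpha}+z}{1-z}\,x_{n-1}, \qquad z = -\frac{h}{2\tau}.
\end{aligned}
\tag{A.1}
$$

Since the proof is for the stability property of the iterative trapezoid formula, the
initial guess $x_n^{(0)}$ of $x_n$ is the solution of the previous time point,
$x_n^{(0)} = x_{n-1}$. Then, the derivation can be carried out as

$$
\begin{aligned}
k=1:\quad x_n^{(1)} &= \frac{1-\frac{1}{\alpha}}{1-z}\,x_{n-1} + \frac{\frac{1}{\alpha}+z}{1-z}\,x_{n-1} = \frac{1+z}{1-z}\,x_{n-1} \\
k=2:\quad x_n^{(2)} &= \frac{1-\frac{1}{\alpha}}{1-z}\,x_n^{(1)} + \frac{\frac{1}{\alpha}+z}{1-z}\,x_{n-1}
 = \left[\frac{1-\frac{1}{\alpha}}{1-z}\cdot\frac{1+z}{1-z} + \frac{\frac{1}{\alpha}+z}{1-z}\right]x_{n-1} \\
&\;\;\vdots \\
k=m:\quad x_n^{(m)} &= \frac{1-\frac{1}{\alpha}}{1-z}\,x_n^{(m-1)} + \frac{\frac{1}{\alpha}+z}{1-z}\,x_{n-1} \\
&= \left[\left(\frac{1-\frac{1}{\alpha}}{1-z}\right)^{m-1}\frac{1+z}{1-z}
 + \left(\frac{1-\frac{1}{\alpha}}{1-z}\right)^{m-2}\frac{\frac{1}{\alpha}+z}{1-z}
 + \cdots + \frac{\frac{1}{\alpha}+z}{1-z}\right]x_{n-1}.
\end{aligned}
\tag{A.2}
$$

Noting that the terms in the square bracket of (A.2) are a geometric series except the
first term, it can be written further in the following format if
$(1 - 1/\alpha)/(1 - z) \neq 1$, i.e.,

$$
x_n^{(m)} = \left[\left(\frac{1-\frac{1}{\alpha}}{1-z}\right)^{m-1}\frac{1+z}{1-z}
 + \frac{\frac{1}{\alpha}+z}{1-z}\cdot
 \frac{1-\left(\frac{1-\frac{1}{\alpha}}{1-z}\right)^{m-1}}{1-\frac{1-\frac{1}{\alpha}}{1-z}}\right]x_{n-1}.
\tag{A.3}
$$

According to (A.2), it is easy to check that the absolute stability condition cannot
be satisfied if $(1 - 1/\alpha)/(1 - z) = 1$. Therefore, the absolute stability region
of the iterative trapezoid formula is expressed by the inequality

$$
\left|\left(\frac{1-\frac{1}{\alpha}}{1-z}\right)^{m-1}\frac{1+z}{1-z}
 + \frac{\frac{1}{\alpha}+z}{1-z}\cdot
 \frac{1-\left(\frac{1-\frac{1}{\alpha}}{1-z}\right)^{m-1}}{1-\frac{1-\frac{1}{\alpha}}{1-z}}\right| < 1.
\tag{A.4}
$$

Finally, we have the result

$$
\left|\left(\frac{1-\frac{1}{\alpha}}{1-z}\right)^{m}\frac{2z}{z-\frac{1}{\alpha}}
 + \frac{\frac{1}{\alpha}+z}{\frac{1}{\alpha}-z}\right| < 1.
\tag{A.5}
$$

This completes the proof of Theorem 4.

If the mixed trapezoid FE formula is applied as an integration predictor when α < 1,
the absolute stability region of the iterative trapezoid formula is derived as

$$
\begin{aligned}
x_n^{(0)} &= \frac{1+(2\alpha-1)z}{1-z}\,x_{n-1} \\
k=1:\quad x_n^{(1)} &= \frac{1-\frac{1}{\alpha}}{1-z}\,x_n^{(0)} + \frac{\frac{1}{\alpha}+z}{1-z}\,x_{n-1} \\
&\;\;\vdots \\
k=m:\quad x_n^{(m)} &= \frac{1-\frac{1}{\alpha}}{1-z}\,x_n^{(m-1)} + \frac{\frac{1}{\alpha}+z}{1-z}\,x_{n-1} \\
&= \left[\left(\frac{1-\frac{1}{\alpha}}{1-z}\right)^{m}\frac{1+(2\alpha-1)z}{1-z}
 + \left(\frac{1-\frac{1}{\alpha}}{1-z}\right)^{m-1}\frac{\frac{1}{\alpha}+z}{1-z}
 + \cdots + \frac{\frac{1}{\alpha}+z}{1-z}\right]x_{n-1}.
\end{aligned}
\tag{A.6}
$$

Therefore, the absolute stability region of the iterative trapezoid formula is
expressed by the inequality

$$
\left|\left(\frac{1-\frac{1}{\alpha}}{1-z}\right)^{m+1}\frac{2\alpha z^{2}}{z-\frac{1}{\alpha}}
 + \frac{\frac{1}{\alpha}+z}{\frac{1}{\alpha}-z}\right| < 1.
\tag{A.7}
$$

This completes the proof of Theorem 6.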
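The closed-form growth factors inside the inequalities (A.5) and (A.7) can be cross-checked numerically against the recursion x_n^(k) = r·x_n^(k−1) + s·x_{n−1}, with r = (1 − 1/α)/(1 − z) and s = (1/α + z)/(1 − z). The sketch below is a sanity check under the assumption x_{n−1} = 1; the function names and the sample values of α, z, and m are arbitrary choices made here, and m counts the corrector iterations applied after the initial guess.

```python
def iterate(alpha, z, m, predictor=False):
    """Apply the corrector recursion m times, starting from x_{n-1} = 1."""
    r = (1 - 1 / alpha) / (1 - z)
    s = (1 / alpha + z) / (1 - z)
    # Initial guess: previous solution, or the mixed trapezoid FE predictor.
    x = (1 + (2 * alpha - 1) * z) / (1 - z) if predictor else 1.0
    for _ in range(m):
        x = r * x + s
    return x


def closed_form(alpha, z, m, predictor=False):
    """Growth factor appearing inside (A.5), or (A.7) when predictor=True."""
    r = (1 - 1 / alpha) / (1 - z)
    tail = (1 / alpha + z) / (1 / alpha - z)
    if predictor:
        return r ** (m + 1) * 2 * alpha * z**2 / (z - 1 / alpha) + tail
    return r**m * 2 * z / (z - 1 / alpha) + tail


def is_absolutely_stable(alpha, z, m, predictor=False):
    """True when the growth factor has magnitude below one."""
    return abs(closed_form(alpha, z, m, predictor)) < 1.0
```

Because only basic arithmetic is used, `z` may also be passed as a Python complex number to probe the stability region away from the real axis.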
ACKNOWLEDGMENT
The authors would like to thank Prof. K. Mayaram of
Oregon State University and Dr. J. Rockway of the Naval Space
and Warfare System Center, San Diego, for several helpful
discussions. The authors are also grateful to the anonymous
reviewers for their detailed and constructive comments that
greatly enhanced this paper.
REFERENCES
[1] E. Acar, F. Dartu, and L. T. Pileggi, “TETA: Transistor-level waveform
evaluation for timing analysis,” IEEE Trans. Comput.-Aided Des. Integr.
Circuits Syst., vol. 21, no. 5, pp. 605–616, May 2002.
[2] U. M. Ascher and L. R. Petzold, Computer Methods for Ordinary Differential Equations and Differential-Algebraic Equations. Philadelphia,
PA: SIAM, 1998.
[3] H. W. Buurman, “From circuit to signal-development of a piecewise
linear simulator,” Ph.D. dissertation, Dept. Elect. Eng., Eindhoven Univ.
Technol., Eindhoven, The Netherlands, Jan. 1993.
[4] T. Chen and C. C.-P. Chen, “Efficient large-scale power grid analysis based on preconditioned Krylov-subspace iterative methods,” in
Proc. IEEE/ACM Design Automation Conf., Las Vegas, NV, Jun. 2001,
pp. 559–562.
[5] P. F. Cox, R. G. Burch, P. Yang, and D. E. Hocevar, “New implicit integration method for efficient latency exploration in circuit simulation,” IEEE
Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 8, no. 10, pp. 1051–1064, Oct. 1989.
[6] J. E. Dennis and J. J. Moré, “Quasi-Newton methods, motivation and
theory,” SIAM Rev., vol. 19, no. 1, pp. 46–89, Jan. 1977.
[7] T. Fujisawa, E. S. Kuh, and T. Ohtsuki, “A sparse matrix method for analysis of piecewise-linear resistive networks,” IEEE Trans. Circuit Theory,
vol. CT-19, no. 6, pp. 571–584, Nov. 1972.
[8] K. Gala, V. Zolotov, R. Panda, B. Young, J. Wang, and D. Blaauw,
“On-chip inductance modeling and analysis,” in Proc. IEEE/ACM Design
Automation Conf., Los Angeles, CA, Jun. 2000, pp. 63–68.
[9] C. W. Gear, Numerical Initial Value Problems in Ordinary Differential
Equations. Upper Saddle River, NJ: Prentice-Hall, 1971.
[10] K. R. Jackson and R. Sacks-Davis, “An alternative implementation of
variable step-size multistep formulas for stiff ODEs,” ACM Trans. Math.
Softw., vol. 6, no. 3, pp. 295–318, Sep. 1980.
[11] S. Kapur and D. E. Long, “Large-scale capacitance calculation,” in
Proc. IEEE/ACM Design Automation Conf., Los Angeles, CA, Jun. 2000,
pp. 744–749.
[12] K. S. Kundert and A. Sangiovanni-Vincentelli, Sparse User’s Guide—
A Sparse Linear Equation Solver Version 1.3a. Berkeley: Univ.
California, Apr. 1988.
[13] P. M. Lee, S. Ito, T. Hashimoto, J. Sato, T. Touma, and
G. Yokomizo, “A parallel and accelerated circuit simulator with precise
accuracy,” in Proc. Int. Conf. VLSI Design, Bangalore, India, Jan. 2002,
pp. 213–218.
[14] E. Lelarasmee and A. Sangiovanni-Vincentelli, “RELAX: A new circuit
simulator for large scale MOS integrated circuits,” in Proc. IEEE/ACM
Design Automation Conf., Las Vegas, NV, Jun. 1982, pp. 682–690.
[15] P. Li and L. Pileggi, “A linear-centric modeling approach to harmonic
balance analysis,” in Proc. IEEE/ACM Design, Automation and Test Eur.
Conf., Paris, France, Mar. 2002, pp. 634–639.
[16] Z. Li and C.-J. R. Shi, “SILCA: Fast-yet-accurate time-domain simulation of VLSI circuits with strong parasitic coupling effects,” in Proc.
IEEE/ACM Int. Conf. Computer-Aided Design, San Jose, CA, Nov. 2003,
pp. 793–799.
[17] ——, “A coupled iterative/direct method for efficient time-domain simulation of nonlinear circuits with power/ground networks,” in Proc. IEEE
Int. Symp. Circuits and Systems, Vancouver, Canada, May 2004, vol. 5,
pp. 165–168.
[18] ——, “An efficiently preconditioned GMRES method for fast parasitic-sensitive deep-submicron VLSI circuit simulation,” in Proc. IEEE/ACM
Design, Automation and Test Eur. Conf., Munich, Germany, Mar. 2005,
vol. 2, pp. 752–757.
[19] W. J. McCalla, Fundamentals of Computer-Aided Circuit Simulation.
Boston, MA: Kluwer, 1988.
[20] M. Nagata, J. Nagai, K. Hijikata, T. Morie, and A. Iwata, “Physical design
guides for substrate noise reduction in CMOS digital circuits,” IEEE J.
Solid-State Circuits, vol. 36, no. 3, pp. 539–549, Mar. 2001.
[21] L. W. Nagel, “SPICE: A computer program to simulate semiconductor circuits,” Univ. California, Berkeley, Tech. Rep. UCB/ERL M520,
May 1975.
[22] A. Odabasioglu, M. Celik, and L. T. Pileggi, “PRIMA: Passive reduced-order interconnect macromodeling algorithm,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 17, no. 8, pp. 645–654,
Aug. 1998.
[23] L. R. Petzold, “A description of DASSL: A differential/algebraic system
solver,” in IMACS Transactions on Scientific Computing, vol. 1, R. Stepleman et al., Eds. Amsterdam, The Netherlands: North-Holland, 1983,
pp. 65–68.
[24] J. R. Phillips and L. M. Silveira, “Simulation approaches for strongly
coupled interconnect systems,” in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, San Jose, CA, Nov. 2001, pp. 430–437.
[25] K. Radhakrishnan and A. C. Hindmarsh, “Description and use of LSODE,
the Livermore solver for ordinary differential equations,” Lawrence Livermore Nat. Lab., Livermore, CA, LLNL Tech. Rep. UCRL-ID-113855,
1993.
[26] Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd ed. Philadelphia, PA: SIAM, 2003.
[27] A. Samavedam, A. Sadate, K. Mayaram, and T. S. Fiez, “A scalable
substrate noise coupling model for design of mixed-signal IC’s,” IEEE
J. Solid-State Circuits, vol. 35, no. 6, pp. 895–904, Jun. 2000.
[28] B. N. Sheehan, “TICER: Realizable reduction of extracted RC circuits,”
in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, San Jose, CA,
Nov. 1999, pp. 200–203.
[29] A. H. Sherman, “On Newton-iterative methods for the solution of
systems of nonlinear equations,” SIAM J. Numer. Anal., vol. 15, no. 4,
pp. 755–771, 1978.
[30] H. Su, K. H. Gala, and S. S. Sapatnekar, “Fast analysis and optimization
of power/ground networks,” in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, San Jose, CA, Nov. 2000, pp. 477–480.
[31] J. T. J. van Eijndhoven and M. T. van Stiphout, “Latency exploitation
in circuit simulation by sparse matrix techniques,” in Proc. IEEE Int.
Symp. Circuits and Systems, Espoo, Finland, Jun. 1988, pp. 623–626.
[32] N. K. Verghese, T. J. Schmerbeck, and D. J. Allstot, Simulation Techniques and Solutions for Mixed-Signal Coupling in Integrated Circuits.
Norwell, MA: Kluwer, 1995.
[33] Y. Wang, V. Jandhyala, and C.-J. R. Shi, “Coupled electromagnetic-circuit
simulation of arbitrarily-shaped conducting structures,” in Proc. IEEE
Conf. Electrical Performance Electronic Packaging, Cambridge, MA,
Oct. 2001, pp. 233–236.
[34] J. K. White and A. Sangiovanni-Vincentelli, Relaxation Techniques for
the Simulation of VLSI Circuits. Norwell, MA: Kluwer, 1987.
[35] M. Zwoliński and R. W. Allen, “Practical algorithms for fully decoupled
mixed-mode simulation of electronic circuits,” in Proc. IEEE Int. Symp.
Circuits and Systems, Sydney, Australia, May 2001, pp. 451–454.
[36] User Guide—HSIM Version 1.3, Nassda Corp., Santa Clara, CA,
Apr. 2001.
Zhao Li received the B.S. degree in electronics
and the M.S. degree in microelectronics and solid-state electronics from Tsinghua University, Beijing,
China, in 1998 and 2000, respectively, and the Ph.D.
degree in electrical engineering from the University
of Washington, Seattle, in 2005.
He is currently with Cadence Design Systems,
Inc., San Jose, CA. His research interests include
mixed-signal and deep submicron circuit simulation, symbolic analysis, behavioral modeling for
analog/RF circuit application, device modeling, and
optimization algorithms.
C.-J. Richard Shi (M’91–SM’99–F’06) received the
Ph.D. degree in computer science from the University of Waterloo, Waterloo, ON, Canada, in 1994.
From 1994 to 1998, he was with Analogy, Rockwell Semiconductor Systems, and the University of
Iowa. In 1998, he joined the University of Washington, Seattle, WA, where he is currently a Professor
in electrical engineering. His research interests include several aspects of the computer-aided design
and test of integrated circuits and systems, with
particular emphasis on analog/mixed-signal and deep
submicron circuit modeling, simulation, and design automation. He is a key
contributor to the IEEE Std. 1076.1-1999 (VHDL-AMS) standard for the description and simulation of mixed-signal circuits and systems. He founded the
IEEE International Workshop on Behavioral Modeling and Simulation (BMAS)
in 1997.
Dr. Shi was an Associate Editor, as well as a Guest Editor, of the IEEE
TRANSACTIONS ON CIRCUITS AND SYSTEMS—II, ANALOG AND DIGITAL
SIGNAL PROCESSING. Since 1999, he has been the Associate Editor of
the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED
CIRCUITS AND SYSTEMS. He has received several awards for his research
including a Doctoral Prize from the Natural Science and Engineering Research
Council of Canada (1995), a Best Paper Award from the 1998 IEEE VLSI Test
Symposium, a Best Paper Award from the 1999 IEEE/ACM Design Automation
Conference, a National Science Foundation CAREER Award (2000), and an
SRC-TECHCON Best Paper Award (2003).