Statistical library characterization using belief propagation across multiple technology nodes Please share

advertisement
Statistical library characterization using belief
propagation across multiple technology nodes
The MIT Faculty has made this article openly available. Please share
how this access benefits you. Your story matters.
Citation
Li Yu, Sharad Saxena, Christopher Hess, Ibrahim (Abe) M.
Elfadel, Dimitri Antoniadis, and Duane Boning. 2015. Statistical
library characterization using belief propagation across multiple
technology nodes. In Proceedings of the 2015 Design,
Automation & Test in Europe Conference & Exhibition (DATE
'15). EDA Consortium, San Jose, CA, USA, 1383-1388.
As Published
http://dl.acm.org/citation.cfm?id=2757012.2757134
Publisher
Association for Computing Machinery (ACM)
Version
Author's final manuscript
Accessed
Wed May 25 19:18:52 EDT 2016
Citable Link
http://hdl.handle.net/1721.1/96913
Terms of Use
Creative Commons Attribution-Noncommercial-Share Alike
Detailed Terms
http://creativecommons.org/licenses/by-nc-sa/4.0/
Statistical Library Characterization Using Belief
Propagation across Multiple Technology Nodes
Li Yu, Sharad Saxena1 , Christopher Hess1 , Ibrahim (Abe) M. Elfadel2 , Dimitri Antoniadis, Duane Boning
Massachusetts Institute of Technology, 1 PDF Solutions, 2 Masdar Institute of Science and Technology
Email: yul09@mit.edu
Abstract—In this paper, we propose a novel flow to enable
computationally efficient statistical characterization of delay and
slew in standard cell libraries. The distinguishing feature of the
proposed method is the usage of a limited combination of output
capacitance, input slew rate and supply voltage for the extraction of
statistical timing metrics of an individual logic gate. The efficiency
of the proposed flow stems from the introduction of a novel,
ultra-compact, nonlinear, analytical timing model, having only four
universal regression parameters. This novel model facilitates the
use of maximum-a-posteriori belief propagation to learn the prior
parameter distribution for the parameters of the target technology
from past characterizations of library cells belonging to various
other technologies, including older ones. The framework then
utilises Bayesian inference to extract the new timing model parameters using an ultra-small set of additional timing measurements
from the target technology. The proposed method is validated and
benchmarked on several production-level cell libraries including
a state-of-the-art 14-nm technology node and a variation-aware,
compact transistor model. For the same accuracy as the conventional lookup-table approach, this new method achieves at least 15x
reduction in simulation runs.
I. I NTRODUCTION
A standard cell library capturing statistical information of
delay and output slew variations is at the core of statistical static
timing analysis (SSTA), and, cost efficient statistical characterization of such libraries has become essential. The most widely
used statistical library cell characterization method is based on
the look-up table (LUT) approach where gate propagation delay
(td ), output transition time (Sout ) and their variations are stored
in a look-up table with different combinations of inputs such
as cell types, input slew (Sin ), load capacitance (Cload ), supply
voltage (Vdd ), and other parameters [1].
The runtime complexity required for such a statistical LUTbased approach is O(Nsample · NLU T ), where Nsample is the
number of SPICE runs needed to obtain each mean and variance
value and NLU T is the number of input vector combinations.
This approach will quickly become infeasible as either NLU T
or Nsample in a technology increases. Historically, circuit level
Monte Carlo (MC) simulation has been employed to generate a number of samples in the process parameter probability
space [2]. Such approach allows variability-aware analysis to
be implemented with minor changes on top of existing characterization tools but requires a large number of MC runs. To
address this challenge, several approaches based on sensitivity
analysis for library characterization have been proposed by
EDA vendors. For instance, Composite Current Source (CSC) is
adopted by the Synopsys PrimeTime SSTA tool and sensitivitybased effective-current-source-model (S-ECSM) is adopted by
the Cadence statistical tool. All of these approaches aim at
modelling the statistical impact of process parameter variations
as a linear superposition of the impact of each parameter in the
response model of the affected metric. Several Response Surface
Methodologies (RSMs) have also been proposed to explore the
sparsity of the process regression coefficients. An example of
such a strategy is Least-Angle Regression (LAR) which uses
L1 -norm regularization [3]. One major benefit of regularizing
with the L1 -norm is that it results in sample complexity that
is logarithmic in the number of features (e.g., principal components). For statistical characterization of standard cells, an
error propagation technique using linear sensitivity analysis and
Response Surface Methodology (RSM) using Brussel Design of
Experiments (DoE) was proposed for library characterization
in [4]. The Brussel DoE performs statistical feature selection
keeping only those features that are most relevant to the response
under consideration. Then it uses a model selection algorithm
to build a suitable regression model for all the responses. More
Recently, several statistical circuit simulator based on uncertainty
quantification have been successfully applied to avoid the huge
number of repeated simulations in conventional Monte Carlo
flows [5]–[9].
On the other hand, the expensive simulation cost of the statistical LUT-based approach is not only due to high dimensionality
of the process space, but also due to high dimensionality of the
cell input space (e.g., cell type, input slew Sin , load capacitance,
supply voltage Vdd , etc.). This problem is further exacerbated as
more design options are provided in recent technologies (e.g.,
multi-Vt , multi-Vdd ). While most of the existing work focuses
on exploring the sparsity of the regression coefficients of the
process space with a reduced process sample size for each input
space vector, correlations between different cells and different
input vectors within the same cell have not been considered
in the open literature, to the best of our knowledge. This has
been the main motivation of this work which proposes a novel
acceleration method that operates in the library input space rather
than its process space and that can be added to any acceleration
used in the process space.
This is achieved through the systematic use of recent advances
in statistics and semiconductor metrology that we apply to the
development of computationally efficient statistical characterization algorithms for standard cell libraries. We propose two
key techniques to explore correlations in library input space.
The first is a novel ultra-compact, analytical model for gate
timing characterisation, and the second is a Bayesian learning
algorithm for the parameters of the aforementioned timing model
using past library characterizations along with a very small set
of additional simulations from the target technology. Bayesian
approaches were initially introduced in the area of VLSI design
for post-Silicon validation and parameter extraction [10]–[15].
The intrinsic simplicity of the proposed timing model combined
with the Bayesian learning [16] framework is capable of building
very accurate circuit response representations.
The rest of this paper is organized as follows. Section 2
introduces basic notation and formulates the problem of statistical characterization in library input space. Section 3 describes
prior work on gate delay modelling and presents our novel
ultra-compact analytical model for gate delay and slew. Section
4 presents our Bayesian algorithm which learns timing model
parameters from past library characterizations and a very small
set of additional simulation runs in the target technology. The
foundation of this algorithm is the use of maximum-a-posteriori
(MAP) estimation. In Section 5, our new methods are validated
on the library characterization in state-of-the-art 14-nm and
28-nm technology and compared with the LUT method. Our
conclusions are given in Section 6.
II. P ROBLEM F ORMULATION
In library characterization, an accurate model for cell delay
(Td ) and output slew (Sout ) is developed given the following
input data: a cell type, input slew (Sin ), output load capacitance
(Cload ), transition direction (RISE/FALL), and supply voltage
(Vdd ). To formalize the library characterization problem, we
consider an individual logic gate with multiple inputs and one
output, and for simplicity, we start from the standard assumption
that only one timing arc is modelled at a time, which implies
that we do not consider simultaneous input switching. For p input
variables (ξ = {ξ1 , ξ2 , ..., ξp }), such as Sin , Vdd , Cload , etc., the
cell response is modeled as the following two functions:
Td = fT (ξ1 , ξ2 , ..., ξp )
(1)
Sout = fS (ξ1 , ξ2 , ..., ξp )
(2)
The problem of nominal library characterisation is to estimate
fT and fS given k input vectors {ξ} = {ξ (1) , ξ (2) , ..., ξ (k) }
(1)
(2)
(k)
(1)
(2)
and k output observations {Td , Td , ..., Td } and {Sout , Sout ,
(k)
..., Sout }, such that the timing prediction error with respect4
to a baseline case is minimized under the condition that k is
very small. The nominal baseline case is defined by SPICE
simulations under n different input vectors (n >> k ) sampled
randomly within the input space ξ.
Let us now denote by {Td } an ensemble of delay observations.
This ensemble has been generated for a given input vector
but under varying process parameters. Now we formulate the
problem of statistical library characterisation in input space as
that of estimating fT and fS given k input vectors {ξ} =
{ξ (1) , ξ (2) , ..., ξ (k) } and k ensembles of output observations
(1)
(2)
(k)
(1)
(2)
(k)
{{Td }, {Td }, ..., {Td }} and {{Sout }, {Sout }, ..., {Sout }},
such that the prediction error for the statistical metrics with
respect to a statistical baseline case is minimized under the
condition that k is very small. The statistical baseline case
is defined by statistical SPICE simulations using the same n
different input vectors (n >> k) as the nominal baseline case,
where the SPICE simulations are now executed according to
the Monte Carlo method in process space. The metrics of the
statistical baseline case include the mean and standard deviation
of delay and output slew at each input vector i ∈ {1, ..., n}.
(i)
(i)
(i)
(i)
They are denoted as µTd , µSout and σTd , σSout (i = 1, 2, ..., n),
respectively.
III. M ODEL FOR DELAY AND OUTPUT SLEW
Accurate gate level modeling for delay and slew estimation
has become a major challenge for nanometric technologies.
Historically, the transistor delay has been simply approximated
by Cload Vdd /Idsat , where Idsat is the drain current at Vgs =
Vds = Vdd . A more accurate model, named the alpha-power law,
was later proposed in the early 1990s [17] where a closed-form
expression was derived for the delay of an inverter. A simplified
version of the alpha-power law was proposed in [18]. More
recently, a simple analytical expression for the intrinsic MOSFET
delay, using physics-based models for the effective current and
the total gate switching charge, was proposed to better describe
nanometric technologies [19], [20].
Although these advanced delay models provide accurate description of transition activity in the cell, they are still quite
complex, and detailed process information is required to fit the
entire model.
Our first goal therefore is to contribute an ultra compact timing
model that is at once a generalisation of older models but whose
parameters allow a sparse representation of input space vectors.
Fig. 1 (a) shows the key factors that affect the delay and output
slew of an inverter. In this work, we consider the impact of input
slew (Sin ), output load capacitance (Cload ), supply voltage (Vdd ),
and driving strength (Ief f ).
(b)
(a)
Fig. 1. (a) Key factors that affect the delay and output slew of an inverter; (b)
NAND2 equivalent inverter: The pull-up network is replaced with an “equivalent”
PMOS while the pull-down network is replaced with an “equivalent” NMOS
device.
To find our ultra compact model, we first study gate delay in a
simple inverter and generalize it to any combinatorial logic cell.
Recent studies [21]–[23] show that the simple Cload Vdd /Idsat
metric follows the experimental inverter delay much better if
the on-current in the denominator is replaced with an effective
current Ief f representing the average switching current. In line
with the intrinsic transistor delay defined in [19], we model cell
delay as
∆Q
Td = kd
(3)
Ief f
where kd is a scaling factor used to obtain a good fit to the actual
cell delays. Ief f is defined as
Ief f =
Id (Vgs = Vdd , Vds =
Vdd
2 )
+ Id (Vgs =
2
Vdd
2 , Vds
= Vdd )
(4)
and can be evaluated easily through performance modeling or
through a circuit simulation that takes into account process
variations [19], [24]. Since our focus is to model delay and output
slew as functions of input variables, (1) and (2), we assume
we know Ief f for each input vector. Note that the direct link
between process parameters and delay is still preserved in the
Ief f current. To generalize the above model to any combinatorial
logic cell, we simply next each gate onto an “equivalent inverter”
and use the inverter characterization to estimate delays and
output slews [25]–[27]. Fig. 1(b) shows the equivalent inverter of
a NAND2 where the pull-up network is replaced with a PMOS
and the pull-down network is replaced with an NMOS device.
The charge transferred to or from the load capacitance during
switching is equal to
∆Q = (Vdd + V 0 )(Cload + Cpar + αSin )
0
(5)
where Cpar , V and α are all fitting parameters. Compared
with the simple Cload Vdd /Idsat metric, several effects have
been considered: (1) Cpar is introduced to account for parasitic
capacitance, such as those associated with junctions and interconnects, which are not included in Cload ; (2) V 0 is introduced
to compensate for the inaccuracy of the delay model at low Vdd ;
and (3) a linear coefficient α is introduced to account for Sin ’s
impact on delay. The estimates of fT and fS are then converted
to parameter extraction problems for {kd , Cpar , V 0 , α}.
A special feature of this simple delay model is that the
same format is used to describe not only delay but also output
slew Sout albeit with a different set of values for the fitting
parameters {kd , Cpar , V 0 , α}. To validate the proposed model,
Td · Ief f /(Vdd + V 0 ) and Sout · Ief f /(Vdd + V 0 ) versus different
Vdd values are shown in Fig. 2, where Td and Sout are simulated
through SPICE using a 14-nm industrial design kit and two
separate V 0 values are extracted for Td and Sout . For different
groups of Cload and Sin combinations, a constant value of
Td · Ief f /(Vdd + V 0 ) and Sout · Ief f /(Vdd + V 0 ) is observed
under different Vdd .
8
x 10
-15
x 10
Delay L-H
Delay H-L
-14
Rise slew rate
Fall slew rate
0.9
7
0.8
Sout*Ieff/(Vdd+V')
Td*Ieff/(Vdd+V')
6
5
4
3
0.7
0.6
0.5
0.4
0.3
2
0.2
1
0.1
0
0.65
0.7
0.75
0.8
0.85
0.9
0.95
0
0.65
1
0.7
0.75
Vdd (V)
0.8
0.85
0.9
0.95
1
Vdd (V)
Fig. 2. For a NOR2 cell designed in a commercial state-of-the-art 14-nm
technology, a constant value of Td ·Ief f /(Vdd +V 0 ) and Sout ·Ief f /(Vdd +V 0 )
is observed versus different Vdd and RISE/FALL combinations.
4
Fig. 3 shows Td /(Cload + Cpar + αSin ) and Sout /(Cload +
Cpar + αSin ) versus different Cload and Sin combinations. A
similar result is observed here that for different Vdd and transition
(RISE/FALL) combinations, Td /(Cload + Cpar + αSin ) and
Sout /(Cload + Cpar + αSin ) are approximately constant.
12000
Delay L-H, Vdd=0.85V
Delay L-H, Vdd=1V
Delay H-L, Vdd=0.7V
Delay H-L, Vdd=0.85V
8000
Delay H-L, Vdd=1V
6000
4000
Rise slew rate, Vdd=0.7V
Fall slew rate, Vdd=0.7V
Fall slew rate, Vdd=0.85V
12000
Fall slew rate, Vdd=1V
10000
8000
pdf (µPT ) =
6000
4
C
load
6
8
10
12
14
& Input slew combination
0
2
4
6
8
10
12
14
Cload & Input slew combination
Fig. 3. For a NOR2 cell designed in a commercial state-of-the-art 14-nm
technology, a constant value of Td /(Cload +Cpar +αSin ) and Sout /(Cload +
5
Cpar
+ αSin ) is observed versus different Cload , Sin and RISE/FALL
combinations.
TABLE I
E XTRACTED PARAMETERS FOR DELAY MODEL FROM INV, NAND2 AND
NOR2 IN THREE DIFFERENT TECHNOLOGIES WITH THEIR FITTING ERROR .
Tech
A
A
A
B
B
B
C
C
C
Cell
INV
NAND2
NOR2
INV
NAND2
NOR2
INV
NAND2
NOR2
kd
0.389
0.372
0.356
0.416
0.403
0.374
0.389
0.383
0.368
Cpar (f F )
0.951
1.328
1.186
1.046
1.471
1.276
0.978
1.12
1.225
4π 2
4000
2000
2
1
1
p
· exp[− (µPT − PT )T Σ−1
PT (µPT − PT )]
2
|ΣPT |
(6)
where µPT and ΣPT are the mean vector and covariance matrix
of the parameter subgroup PT , respectively. Next, we assume
that the µPT follows a conjugate Gaussian prior distribution
µPT ∼ N (µt0 , Σt0 ).
Rise slew rate, Vdd=1V
14000
2000
0
4π 2
Rise slew rate, Vdd=0.85V
16000
Sout/(Cload+Cpar+*Sin)
Td/(Cload+Cpar+*Sin)
10000
IV. BAYESIAN I NFERENCE WITH M AXIMUM A P OSTERIORI
(MAP) E STIMATION
In this section, we present a Bayesian inference approach
with maximum-a-posteriori (MAP) estimation where instead
of computing {Td , Sout } at each input condition separately,
we will estimate {kd , Cpar , V 0 , α} globally by maximizing the
(i)
(i)
joint probability of observing (ξ (i) , Td ) or (ξ (i) , Sout ), (i =
1, 2, ..., k).
The first step is to transfer observed training samples
(i)
(i)
(ξ (i) , Td ) or (ξ (i) , Sout ), (i = 1, 2, ..., k) to parameter subspace
0
{kd , Cpar , V , α} and use both to derive a probability distribution on the parameter space. The pdf ’s on {kd , Cpar , V 0 , α} for
delay and output slew can then be calculated and the parameter
extraction problem solved using maximum a posteriori (MAP)
estimation.
Without loss of generality, we describe the MAP estimation for
delay parameter subgroup PT = {kd , Cpar , V 0 , α}. Parameters
for output slew are estimated in a similar manner.
First, we assume that PT follows a Gaussian distribution
PT ∼ N (µPT , ΣPT ):
pdf (PT ) =
18000
Delay L-H, Vdd=0.7V
their fitting errors. Strong similarities in extracted parameters are
observed among different cells and technologies from different
nodes, which serves as a basis for minimizing the required input
combinations in statistical characterization in the next section.
Although our proposed model captures major physical effects,
for some technologies there might be an offset between the
proposed model and circuit simulations. In those cases, extra
fitting terms (e.g., Sin · Cload ) might be needed. The optimal
model complexity will be given by a trade-off between model
accuracy of degree of data compression.
0 (V
V
)
-0.266
-0.209
-0.241
-0.287
-0.228
-0.253
-0.272
-0.258
-0.264
α
0.092
0.034
0.102
0.103
0.034
0.104
0.107
0.050
0.117
% error
1.56%
1.98%
0.91%
1.50%
2.05%
1.12%
1.84%
1.94%
1.47%
Table I shows extracted parameters for delay model from
INV, NAND2 and NOR2 in three different technologies with
1
1
p
· exp[− (µPT − µt0 )T Σ−1
t0 (µPT − µt0 )]
2
|Σt0 |
(7)
where µt0 and Σt0 are the mean vector and covariance matrix
of µPT , respectively. We also define the delay model precision
as βfTd , which equals the inverse variance of modeling errors
across different technologies. Given µPT and βfTd , we calculate
(i)
the likelihood of observing delay at ith input condition Td
associated with subspace distribution pdf (PT ) as
r
(i)
pdf (Td |µPT
, βfTd (ξ
(i)
)) =
βTd (ξ (i) )
2π
1 (i)
·exp[− (Td − fT (ξ (i) , µPT ))2 βfTd (ξ (i) )]
2
(8)
The learning of precision βfTd is a key step in this method.
In practice, βfTd represents our “uncertainty” on proposed
delay model at different input conditions due to its inability
of capturing certain physical effects. While they depend on
the details of the technologies, these precisions show a strong
systematic trend across different input conditions ξ . In this work,
extracted parameters µPT from past technologies are used to
learn the systematic precision βfTd at different input conditions.
Characterizations from a variety of technology nodes enable us
to propagate our historical belief to a new technology node.
While generic or broad historical technologies can be used to
learn approximate precisions, in order to achieve the highest
applicable prior precision, the best historical technologies would
be those with the same design or process choices as the target
technology. For example, if we intend to fit a library in a low
power process, appropriate historical technologies would also be
technologies in low power processes. Therefore a bias-variance
tradeoff is needed in the selection of historical libraries.
The detailed learning process proceeds as follows. First, a
full set of standard cell libraries in Ntech fabrication processes
and technology nodes (Ntech = 6 in this paper, including
technologies from 14-nm to 45-nm, with both bulk-Silicon and
SOI technologies and non-FINFET and FINFET technologies)
are employed as “historical data” to improve our confidence
in predicting βfTd on an unknown library. This assumes that
although a new technology introduces different lithography,
structures and materials, parameters from our proposed delay
model do not change much, as is shown in Section 3. After
selection of a group of historical libraries, each cell is fitted
into the proposed delay model with different input conditions
ξ . βfTd is then calculated by the inverse variance of relative
difference between measurements and delay model predictions
using extracted parameters.
βfT =
tion. Hence the optimization problem in (15) is also a convex
programming problem and can be solved both efficiently and
robustly.
So far we have achieved individual library cell characterization (no statistical characterization included). The detailed efficient statistical library cell characterization proceeds as follows.
Nsample different seeds for each cell under process variation
are generated through Monte Carlo (MC) simulation or Design
of Experiments (DoE) [4]. For j th seed in each cell, {Td }
and {Sout } under k input conditions are simulated through a
(j)
(j)
SPICE simulation using .ALTER statement. PT and PS are
extracted through proposed Bayesian inference with maximuma-posteriori (MAP) estimation for j th seed. For a targeted input
condition ξ , the probability distribution of delay and output slew
are calculated as pdf (fT (ξ, PT )) and pdf (fS (ξ, PS )).
Historical
Learning
𝑁𝑠𝑎𝑚𝑝𝑙𝑒 seeds
under process
variation
LUT input
conditions
d
𝑗++ 𝑖++
1
PNtech
1
j=1
Ntech
(j)
(
Td
(j)
−fT (PT ) 2
)
(j)
Td
− (N
1
tech
PNtech Td(j) −fT (PT(j) ) 2
(j)
j=1
)
Td
(9)
After the estimation of likelihood and precision, we are able
(1)
(k)
to transfer delay characterization {Td , ..., Td } to parameter
subspace PT and obtain the conditional probability of observing
(i)
Td given µPT and βfTd (ξ (i) ). We then combine this conditional probability with the prior distribution pdf (µPT ) in (7)
to accurately estimate µPT . Assuming each delay simulation is
ideal, we can write the likelihood function pdf (Td |µPT , βfTd )
as:
pdf (Td |µPT , βfTd ) =
k
Y
(i)
pdf (Td |µPT , βfTd (ξ (i) ))
(10)
Yes
Electrical
simulation
(𝑇𝑑 , 𝑆𝑜𝑢𝑡 )
Netlist
Statistical
Fitting input learning for
conditions 𝜉𝐹 new cells
Electrical
simulation
(𝑇𝑑 , 𝑆𝑜𝑢𝑡 )
𝑖 < 𝑁𝑡𝑎𝑏𝑙𝑒
𝑖<𝑘
No
No
Parameters
(𝑘𝑑 , 𝐶𝑝𝑎𝑟 , 𝑉 ′ , 𝛼)
Yes
𝑗 < 𝑁𝑡𝑒𝑐ℎ
𝑖++
𝑗++
Yes
Prior Parameters 𝑃𝑇 𝑗 , 𝑃𝑆 𝑗
(𝑘𝑑 , 𝐶𝑝𝑎𝑟 , 𝑉 ′ , 𝛼)
𝑗 < 𝑁𝑠𝑎𝑚𝑝𝑙𝑒𝑠
No
Yes
No
Mean 𝜇𝑡0 , 𝜇𝑠0 Covariance Σ𝑡0 ,
Σ𝑠0 and precision 𝛽𝑇𝑑 and 𝛽𝑆𝑜𝑢𝑡
Compute Probability Density Function
pdf(𝑓𝑇 (𝜉, 𝑃𝑇 )), pdf(𝑓𝑠 (𝜉, 𝑃𝑆 ))
i=1
According to Bayes’ theory, the conditional distribution
pdf (µPT |Td ) is proportional to the product of the prior
pdf (µPT ) and the likelihood function pdf (Td |µPT ):
pdf (µPT |Td ) ∝ pdf (µPT ) · pdf (Td |µPT )
(11)
The precision βfTd is learned from historical cell delay characterization and is therefore independent of the observation Td .
Consequently,
pdf (Td |µPT , βfTd ) = pdf (Td |µPT )
(12)
Substituting (10) and (12) into (11) yields:
pdf (µPT |Td ) ∝ pdf (µPT ) ·
k
Y
(i)
pdf (Td |µPT , βfTd (ξ (i) ))
(13)
i=1
The last step is maximum-a-posteriori (MAP) estimation to find
optimal estimates of µPT that maximize the log likelihood of the
posterior distributions lnpdf (µPT |Td ). It can be mathematically
formulated as an optimization problem
maximize ln pdf (µPT ) +
µP
T
k
X
(i)
ln pdf (Td |µPT , βfTd (ξ (i) ))
(14)
i=1
Substituting (7) and (8) into (14) and removing the constant
items yield:
1
minimize (µPT − µt0 )T Σ−1
t0 (µPT − µt0 )
µP
2
T
+
k
1 X (i)
(T − fT (ξ (i) , µPT ))2 βfTd (ξ (i) )
2 i=1 d
(15)
where (15) is the summation of a concave quadratic func-
Fig. 4. Proposed flow for statistical characterization with both old and new
libraries interacting and priors being passed from an old library to a new one.
Fig. 4 summarizes the major steps of the proposed statistical
library cell characterization method with both old and new
libraries interacting and priors being passed from an old library
to a new one. If we assume that library cell characterizations
have been done in previous technologies, the total computation cost is O(k · Nsample ), which is at least one order of
magnitude smaller compared with O(NLU T · Nsample ) in prior
work and several order of magnitudes smaller compared with
O(NLU T · NM C ) in standard method. The total computation
cost is O(k · Nsample + NT ech · NLU T ) if we need to re-run
characterization for old technologies, which is still a moderate
speed up compared to the most cutting-edge techniques.
V. VALIDATION
In this section, two library cell characterization examples in
several cutting-edge CMOS technologies are used to demonstrate
the efficiency of our proposed method. All test cases as well
as the historical library cell characteristics are generated using
different BSIM based industrial design kits reflecting real measurements. To test and compare with the prior part, we have
also implemented both deterministic extraction and statistical
extraction using a look-up table (LUT) approach.
The baseline characterization is defined in this work by a
1000 points Monte Carlo simulation sampled randomly within
the whole input space ξ = {Sin , Cload , Vdd }. Note that these
points only represent different operating conditions for a target
cell while the effects of process variation are not included. Fig. 5
shows a scatter plot for 1000 points among the whole input space
Validation
 We use a randomly distributed input combinations on an unknown 14nm technology as a validation
 HSPICE
as aour
gold
standard
where wesimulation
will compare
characterization
result with standard distributions for delay and output slew with different input
combinations.
methods.
 We
computer INV, NAND and NOR with both their rising and falling
time
The error functions for statistical characterization of E(µTd ),
E(µSout ), E(σTd ) and E(σSout ) are defined as
E(µTd ) =
1
0.9
E(σTd ) =
0.8
0.7
1.5
6
-11
4
1
x 10
-15
2
0.5
Input Slew (s)
13
E(σSout ) =
0 0
x 10
Output Cap (F)
Fig. 5. A scatter plot of 1000 points among whole input space ξ =
{Sin , Cload , Vdd } used for comparing our characterization result with standard
methods.
The first example is to conduct a nominal delay and output
slew characterization for a library designed in a commercial
state-of-the-art 14-nm FINFET technology. Both fitting and
testing samples are generated through SPICE simulation using a
well calibrated compact transistor model. Fig. 6 shows average
prediction error compared with the baseline characterization
using proposed model with Bayesian inference, proposed model
with our least-square error function optimization, and look-up
table approach. To achieve the same characterization accuracy
on delay Td , our proposed method achieves up to 15X runtime
speedup compared to a traditional lookup table approach, where
6X speedup is contributed by our proposed timing model and an
extra speedup of 2.5X is contributed by the Bayesian inference.
Given the prior and two additional fitting input combinations, a
4.3% average error compared with the baseline characterization
is achieved for all combinations of Cload , Sin and Vdd . This
demonstrates the sparsity of effects across input vectors and the
validity of the proposed delay model.
n 1 X
(i) (i)
µ(fS (ξ , PS )) − µSout n i=1
n 1 X
(i) (i)
σ(fT (ξ , PT ))) − σTd n i=1
n 1 X
(i) (i)
σ(fS (ξ , PS )) − σSout n i=1
(16)
(17)
(18)
(19)
Fig. 7 and Fig. 8 show average prediction error for mean
and standard deviation of delay and output slew characterizing a
library designed in a commercial state-of-the-art 28-nm technology compared with the baseline characterization using proposed
method and look-up table approach. Up to 20X runtime speedup
is observed to achieve the same characterization accuracy in
mean value and standard deviation of Td and Sout .
Fig. 9 shows delay probability density simulated using baseline simulation, the proposed method with seven training input
combinations, and an interpolation of look-up tables with 60
training input combinations together with baseline distribution
using SPICE Monte Carlo simulation. The input combination
is Vdd = 0.734V , Sin = 5.09ps, Cload = 1.67f F . The
proposed method shows a much better prediction for delay distribution which correctly predicts the non-Gaussian distribution
for low Vdd .
15
𝜇(𝑇𝑑 )
50
Proposed Model+Bayesian
Inference
Proposed
Model+Baysian Inference
Lookup Table
Lookup
Table
𝜎(𝑇𝑑 )
Proposed Model+Bayesian
Inference
Proposed
Model+Baysian Inference
Lookup Table
Lookup
Table
40
Prediction Error(%)
Vdd (V)
E(µSout ) =
n 1 X
(i) (i)
µ(fT (ξ , PT ))) − µTd n i=1
Prediction Error(%)
Validation input spread
10
17X reduction
5
30
20
20X reduction
10
Proposed Model+Bayesian Inference
Proposed Model+LSE
0
1
Lookup Table
18
Fig.
23 6. Average testing error for delay Td characterizing a library designed in
a commerical state-of-the-art 14-nm technology. Error bars show one standard
deviation of testing error for different cells and RISE/FALL.
The second example is to conduct statistical delay and output
slew characterization for a library designed in a commercial
state-of-the-art 28-nm bulk-Silicon technology, which is different from the model used in the first example. The baseline
characterization is defined similar to previous example where
1000 input combinations are sampled randomly within the whole
space ξ = {Sin , Cload , Vdd }. In this case 1000 seeds under
process variation are generated for each cell to obtain statistical
2
3
5
10
20
Training Samples
50
100
0
1
2
3
5
10
20 30 50
Training Samples
100
Fig. 7. Average testing error for mean and standard deviation of delay
Td characterizing a library designed in a commerical state-of-the-art 28-nm
technology. Error bars show one standard deviation of testing error for different
cell types and RISE/FALL combinations.
VI. C ONCLUSION
In this paper we have presented an entirely different perspective on the acceleration of library characterizations. While previous authors have emphasized the use of statistical techniques
to address the efficient design of variation-aware standard cell
libraries by working in process space, in our work we use similar
techniques for the efficient design of these libraries by working
in the traditional library input space of input slew, output load,
and voltage supply. The main insight that has enabled this shift
in perspective is the contribution of a new ultra compact timing
model for standard cells that is a powerful and accurate generalization of the simple Cload Vdd /Idsat metric. This new analytical
timing model transfers the library characterization problem from
one of input parameter sweep to one of machine learning and
20
𝜇(𝑆𝑜𝑢𝑡 )
Prediction Error(%)
Prediction Error(%)
10
18X reduction
5
30
20
19X reduction
10
2
3
5
10
20 30 50
Training Samples
0
1
100
2
3
5
10
20 30 50
Training Samples
100
Pdf
statistical
model
Fig.
8. of
Average
testing error
for mean and standard deviation of output slew
Sout characterizing a library designed in a commerical state-of-the-art 28-nm
technology. ErrorLookup
bars show
standard
deviation of testing error for different
table:one
60 fitting
combinations
cell types and RISE/FALL
combinations.
Proposed method:
7 fitting combinations
Bias: Vdd=0.734V, slew=5.09e-12, Cap=1.67fF
11
3.5
x 10
Lookup
Table
Lookup Table
Proposed Model+Bayesian
Proposed
Model+Baysian Inference
Ideal
3
2.5
Frenquency
19
Proposed Model+Bayesian Inference
Proposed Model+LSE
40
15
0
1
𝜎(𝑆𝑜𝑢𝑡 )
50
Proposed Model+Bayesian Inference
Proposed Model+LSE
2
1.5
1
0.5
0
0.8
1
1.2
1.4
1.6
Delay (s)
1.8
2
2.2
2.4
-11
x 10
20 Fig.
9. Delay probability density simulated using baseline simulation, proposed method and an interpolation of look-up tables together with baseline
distribution using SPICE Monte Carlo simulation with an input combination
of Vdd = 0.734V , Sin = 5.09ps, Cload = 1.67f F .
sparse sampling. Machine learning is used to develop priors
of timing model coefficients using old libraries while sparse
sampling is used to provide the extra data points needed to
build the new library in the target technology. Our methods
have resulted in 15X reduction in simulation runs with respect
to baseline techniques that use random sampling methods.
ACKNOWLEDGMENT
This work was funded by the Cooperative Agreement between the Masdar Institute of Science and Technology (Masdar Institute), Abu Dhabi, UAE and the Massachusetts Institute of Technology (MIT), Cambridge, MA, USA, Reference
196F/002/707/102f/70/9374.
R EFERENCES
[1] L. Lavagno, L. Scheffer, and G. Martin. EDA for IC Implementation,
Circuit Design, and Process Technology. Addision-Wesley, 2006.
[2] L. Yu, L. Wei, D. Antoniadis, I. Elfadel, and D. Boning. Statistical
modeling with the virtual source MOSFET model. In Design, Automation
Test in Europe Conference Exhibition (DATE), pages 1454–1457, 2013.
[3] X. Li. Finding deterministic solution from underdetermined equation:
Large-scale performance modeling by least angle regression. In Design
Automation Conference, pages 364–369, 2009.
[4] L. Brusamarello, P. Wirth, G.and Roussel, and M. Miranda. Fast and accurate statistical characterization of standard cell libraries. Microelectronics
Reliability, 51(12):2341 – 2350, 2011.
[5] Z. Zhang, T.A. El-Moselhy, I.M. Elfadel, and L. Daniel. Stochastic testing
method for transistor-level uncertainty quantification based on generalized
polynomial chaos. IEEE Transactions on Computer-Aided Design of
Integrated Circuits and Systems, 32(10):1533–1545, Oct. 2013.
[6] Z. Zhang, I.A.M. Elfadel, and L. Daniel. Uncertainty quantification for integrated circuits: Stochastic spectral methods. In International Conference
on Computer-Aided Design (ICCAD), pages 803–810, Nov. 2013.
[7] Z. Zhang, T.A. El-Moselhy, I.M. Elfadel, and L. Daniel. Calculation of
generalized polynomial-chaos basis functions and gauss quadrature rules in
hierarchical uncertainty quantification. IEEE Transactions on ComputerAided Design of Integrated Circuits and Systems, 33(5):728–740, May
2014.
[8] Z. Zhang, X. Yang, I. V. Oseledets, G. E. Karniadakis, and L. Daniel. Enabling high-dimensional hierarchical uncertainty quantification by ANOVA
and tensor-train decomposition. IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems, PP(99):1–1, Nov. 2014.
[9] Z. Zhang, X Yang, G. Marucci, P. Maffezzoni, I.M. Elfadel, G. Karniadakis,
and L. Daniel. Stochastic testing simulator for integrated circuits and
mems: Hierarchical and sparse techniques. In Custom Integrated Circuits
Conference (CICC), pages 1–8, Sep. 2014.
[10] F. Wang, W. Zhang, S. Sun, X. Li, and C. Gu. Bayesian model fusion:
large-scale performance modeling of analog and mixed-signal circuits by
reusing early-stage data. In Design Automation Conference (DAC), pages
64:1–64:6, 2013.
[11] S. Reda and S.R. Nassif. Analyzing the impact of process variations
on parametric measurements: Novel models and applications. In Design,
Automation Test in Europe (DATE), pages 375–380, Apr. 2009.
[12] L. Yu, S. Saxena, C. Hess, I. Elfadel, D. Antoniadis, and D. Boning.
Remembrance of transistors past: Compact model parameter extraction
using bayesian inference and incomplete new measurements. In Design
Automation and Conference (DAC), 2014.
[13] L. Yu, S. Saxena, C. Hess, I. Elfadel, D. Antoniadis, and D. Boning.
Efficient performance estimation with very small sample size via physical
subspace projection and maximum a posteriori estimation. In Design,
Automation and Test in Europe (DATE), 2014.
[14] S. Sun, F. Wang, S. Yaldiz, X. Li, L. Pileggi, A. Natarajan, M. Ferriss,
J. Plouchart, B. Sadhu, B. Parker, A. Valdes-Garcia, M. Sanduleanu,
J. Tierno, and D. Friedman. Indirect performance sensing for on-chip
analog self-healing via bayesian model fusion. In Custom Integrated
Circuits Conference (CICC), pages 1–4, Sep. 2013.
[15] S. Sun, X. Li, H. Liu, K. Luo, and B. Gu. Fast statistical analysis of
rare circuit failure events via scaled-sigma sampling for high-dimensional
variation space. In International Conference on Computer-Aided Design
(ICCAD), pages 478–485, Nov. 2013.
[16] C. M. Bishop. Pattern Recognition and Machine Learning. Springer-Verlag
New York, Inc., Secaucus, NJ, USA, 2006.
[17] T. Sakurai and A.R. Newton. Alpha-power law MOSFET model and its
applications to CMOS inverter delay and other formulas. IEEE Journal of
Solid-State Circuits, 25(2):584–594, Apr. 1990.
[18] V. Stojanovic, D. Markovic, B. Nikolic, M.A. Horowitz, and R.W. Brodersen. Energy-delay tradeoffs in combinational logic using gate sizing and
supply voltage optimization. In European Solid-State Circuits Conference
(ESSCIRC), pages 211–214, Sep. 2002.
[19] A. Khakifirooz and D.A. Antoniadis. MOSFET performance scaling - part
I: Historical trends. IEEE Transactions on Electron Devices, 55(6):1391–
1400, June 2008.
[20] L. Yu, O. Mysore, L. Wei, L. Daniel, D. Antoniadis, I. Elfadel, and
D. Boning. An ultra-compact virtual source fet model for deeply-scaled
devices: Parameter extraction and validation for standard cell libraries and
digital circuits. In Asia and South Pacific Design Automation Conference(ASPDAC), pages 521–526, 2013.
[21] M.-H. Na, E.J. Nowak, W. Haensch, and J. Cai. The effective drive current
in cmos inverters. In International Electron Devices Meeting (IEDM), pages
121–124, Dec. 2002.
[22] J. Deng and H. Wong. Metrics for performance benchmarking of nanoscale
Si and carbon nanotube FETs including device nonidealities. IEEE
Transactions on Electron Devices, 53(6):1317–1322, June 2006.
[23] E. Yoshida, Y. Momiyama, M. Miyamoto, T. Saiki, M. Kojima, S. Satoh,
and T. Sugii. Performance boost using a new device design methodology
based on characteristic current for low-power CMOS. In International
Electron Devices Meeting (IEDM), Dec. 2006.
[24] S. Rakheja and D. Antoniadis. MVS 1.0.1 nanotransistor model (Silicon),
Nov. 2013.
[25] N. Weste and K. Eshraghian. Principles of CMOS VLSI Design: A Systems
Perspective. Addison-Wesley Longman Publishing Co., Inc., Boston, MA,
USA, 1985.
[26] X. Lin, Y. Wang, and M. Pedram. Joint sizing and adaptive independent
gate control for finfet circuits operating in multiple voltage regimes using
the logical effort method. In International Conference on Computer-Aided
Design (ICCAD), pages 444–449, Nov. 2013.
[27] X. Lin, Y. Wang, S. Nazarian, and M. Pedram. An improved logical effort
model and framework applied to optimal sizing of circuits operating in
multiple supply voltage regimes. In International Symposium on Quality
Electronic Design (ISQED), pages 249–256, Mar. 2014.
Download