Statistical Library Characterization Using Belief Propagation across Multiple Technology Nodes

Citation: Li Yu, Sharad Saxena, Christopher Hess, Ibrahim (Abe) M. Elfadel, Dimitri Antoniadis, and Duane Boning. 2015. Statistical library characterization using belief propagation across multiple technology nodes. In Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE '15). EDA Consortium, San Jose, CA, USA, 1383-1388.
As published: http://dl.acm.org/citation.cfm?id=2757012.2757134
Publisher: Association for Computing Machinery (ACM)
Version: Author's final manuscript
Citable link: http://hdl.handle.net/1721.1/96913
Terms of use: Creative Commons Attribution-Noncommercial-Share Alike, http://creativecommons.org/licenses/by-nc-sa/4.0/

Li Yu, Sharad Saxena¹, Christopher Hess¹, Ibrahim (Abe) M. Elfadel², Dimitri Antoniadis, Duane Boning
Massachusetts Institute of Technology, ¹PDF Solutions, ²Masdar Institute of Science and Technology
Email: yul09@mit.edu

Abstract: In this paper, we propose a novel flow to enable computationally efficient statistical characterization of delay and slew in standard cell libraries. The distinguishing feature of the proposed method is the use of a limited number of combinations of output capacitance, input slew rate and supply voltage for the extraction of statistical timing metrics of an individual logic gate. The efficiency of the proposed flow stems from the introduction of a novel, ultra-compact, nonlinear, analytical timing model, having only four universal regression parameters.
This novel model facilitates the use of maximum-a-posteriori belief propagation to learn the prior distribution of the target technology's parameters from past characterizations of library cells belonging to various other technologies, including older ones. The framework then utilises Bayesian inference to extract the new timing model parameters using an ultra-small set of additional timing measurements from the target technology. The proposed method is validated and benchmarked on several production-level cell libraries including a state-of-the-art 14-nm technology node and a variation-aware, compact transistor model. For the same accuracy as the conventional lookup-table approach, this new method achieves at least 15x reduction in simulation runs.

I. INTRODUCTION

A standard cell library capturing statistical information of delay and output slew variations is at the core of statistical static timing analysis (SSTA), and cost-efficient statistical characterization of such libraries has become essential. The most widely used statistical library cell characterization method is based on the look-up table (LUT) approach, where gate propagation delay ($t_d$), output transition time ($S_{out}$) and their variations are stored in a look-up table for different combinations of inputs such as cell type, input slew ($S_{in}$), load capacitance ($C_{load}$), supply voltage ($V_{dd}$), and other parameters [1]. The runtime complexity of such a statistical LUT-based approach is $O(N_{sample} \cdot N_{LUT})$, where $N_{sample}$ is the number of SPICE runs needed to obtain each mean and variance value and $N_{LUT}$ is the number of input vector combinations. This approach quickly becomes infeasible as either $N_{LUT}$ or $N_{sample}$ in a technology increases. Historically, circuit level Monte Carlo (MC) simulation has been employed to generate a number of samples in the process parameter probability space [2].
Such an approach allows variability-aware analysis to be implemented with minor changes on top of existing characterization tools but requires a large number of MC runs. To address this challenge, several approaches based on sensitivity analysis for library characterization have been proposed by EDA vendors. For instance, Composite Current Source (CCS) is adopted by the Synopsys PrimeTime SSTA tool and the sensitivity-based effective-current-source-model (S-ECSM) is adopted by the Cadence statistical tool. All of these approaches aim at modelling the statistical impact of process parameter variations as a linear superposition of the impact of each parameter in the response model of the affected metric.

Several Response Surface Methodologies (RSMs) have also been proposed to exploit the sparsity of the process regression coefficients. An example of such a strategy is Least-Angle Regression (LAR), which uses $L_1$-norm regularization [3]. One major benefit of regularizing with the $L_1$-norm is that it results in sample complexity that is logarithmic in the number of features (e.g., principal components). For statistical characterization of standard cells, an error propagation technique using linear sensitivity analysis and Response Surface Methodology (RSM) using Brussel Design of Experiments (DoE) was proposed in [4]. The Brussel DoE performs statistical feature selection, keeping only those features that are most relevant to the response under consideration. It then uses a model selection algorithm to build a suitable regression model for all the responses. More recently, several statistical circuit simulators based on uncertainty quantification have been successfully applied to avoid the huge number of repeated simulations in conventional Monte Carlo flows [5]–[9].
On the other hand, the expensive simulation cost of the statistical LUT-based approach is due not only to the high dimensionality of the process space, but also to the high dimensionality of the cell input space (e.g., cell type, input slew $S_{in}$, load capacitance, supply voltage $V_{dd}$, etc.). This problem is further exacerbated as more design options are provided in recent technologies (e.g., multi-$V_t$, multi-$V_{dd}$). While most of the existing work focuses on exploiting the sparsity of the regression coefficients of the process space with a reduced process sample size for each input space vector, correlations between different cells and between different input vectors within the same cell have not been considered in the open literature, to the best of our knowledge. This has been the main motivation of this work, which proposes a novel acceleration method that operates in the library input space rather than its process space and that can be added to any acceleration used in the process space. This is achieved through the systematic use of recent advances in statistics and semiconductor metrology that we apply to the development of computationally efficient statistical characterization algorithms for standard cell libraries.

We propose two key techniques to exploit correlations in library input space. The first is a novel ultra-compact, analytical model for gate timing characterisation, and the second is a Bayesian learning algorithm for the parameters of the aforementioned timing model using past library characterizations along with a very small set of additional simulations from the target technology. Bayesian approaches were initially introduced in the area of VLSI design for post-Silicon validation and parameter extraction [10]–[15]. The intrinsic simplicity of the proposed timing model combined with the Bayesian learning [16] framework is capable of building very accurate circuit response representations.

The rest of this paper is organized as follows.
Section 2 introduces basic notation and formulates the problem of statistical characterization in library input space. Section 3 describes prior work on gate delay modelling and presents our novel ultra-compact analytical model for gate delay and slew. Section 4 presents our Bayesian algorithm, which learns timing model parameters from past library characterizations and a very small set of additional simulation runs in the target technology. The foundation of this algorithm is the use of maximum-a-posteriori (MAP) estimation. In Section 5, our new methods are validated on library characterization in state-of-the-art 14-nm and 28-nm technologies and compared with the LUT method. Our conclusions are given in Section 6.

II. PROBLEM FORMULATION

In library characterization, an accurate model for cell delay ($T_d$) and output slew ($S_{out}$) is developed given the following input data: a cell type, input slew ($S_{in}$), output load capacitance ($C_{load}$), transition direction (RISE/FALL), and supply voltage ($V_{dd}$). To formalize the library characterization problem, we consider an individual logic gate with multiple inputs and one output, and for simplicity, we start from the standard assumption that only one timing arc is modelled at a time, which implies that we do not consider simultaneous input switching. For $p$ input variables $\xi = \{\xi_1, \xi_2, \ldots, \xi_p\}$, such as $S_{in}$, $V_{dd}$, $C_{load}$, etc., the cell response is modeled as the following two functions:

$$T_d = f_T(\xi_1, \xi_2, \ldots, \xi_p) \tag{1}$$

$$S_{out} = f_S(\xi_1, \xi_2, \ldots, \xi_p) \tag{2}$$

The problem of nominal library characterisation is to estimate $f_T$ and $f_S$ given $k$ input vectors $\{\xi\} = \{\xi^{(1)}, \xi^{(2)}, \ldots, \xi^{(k)}\}$ and $k$ output observations $\{T_d^{(1)}, T_d^{(2)}, \ldots, T_d^{(k)}\}$ and $\{S_{out}^{(1)}, S_{out}^{(2)}, \ldots, S_{out}^{(k)}\}$, such that the timing prediction error with respect to a baseline case is minimized under the condition that $k$ is very small.
The nominal baseline case is defined by SPICE simulations under $n$ different input vectors ($n \gg k$) sampled randomly within the input space $\xi$.

Let us now denote by $\{T_d\}$ an ensemble of delay observations. This ensemble has been generated for a given input vector but under varying process parameters. Now we formulate the problem of statistical library characterisation in input space as that of estimating $f_T$ and $f_S$ given $k$ input vectors $\{\xi\} = \{\xi^{(1)}, \xi^{(2)}, \ldots, \xi^{(k)}\}$ and $k$ ensembles of output observations $\{\{T_d^{(1)}\}, \{T_d^{(2)}\}, \ldots, \{T_d^{(k)}\}\}$ and $\{\{S_{out}^{(1)}\}, \{S_{out}^{(2)}\}, \ldots, \{S_{out}^{(k)}\}\}$, such that the prediction error for the statistical metrics with respect to a statistical baseline case is minimized under the condition that $k$ is very small. The statistical baseline case is defined by statistical SPICE simulations using the same $n$ different input vectors ($n \gg k$) as the nominal baseline case, where the SPICE simulations are now executed according to the Monte Carlo method in process space. The metrics of the statistical baseline case include the mean and standard deviation of delay and output slew at each input vector $i \in \{1, \ldots, n\}$. They are denoted as $\mu_{T_d}^{(i)}$, $\mu_{S_{out}}^{(i)}$ and $\sigma_{T_d}^{(i)}$, $\sigma_{S_{out}}^{(i)}$ ($i = 1, 2, \ldots, n$), respectively.

III. MODEL FOR DELAY AND OUTPUT SLEW

Accurate gate level modeling for delay and slew estimation has become a major challenge for nanometric technologies. Historically, the transistor delay has been simply approximated by $C_{load} V_{dd} / I_{dsat}$, where $I_{dsat}$ is the drain current at $V_{gs} = V_{ds} = V_{dd}$. A more accurate model, named the alpha-power law, was later proposed in the early 1990s [17], where a closed-form expression was derived for the delay of an inverter. A simplified version of the alpha-power law was proposed in [18].
More recently, a simple analytical expression for the intrinsic MOSFET delay, using physics-based models for the effective current and the total gate switching charge, was proposed to better describe nanometric technologies [19], [20]. Although these advanced delay models provide an accurate description of transition activity in the cell, they are still quite complex, and detailed process information is required to fit the entire model. Our first goal therefore is to contribute an ultra compact timing model that is at once a generalisation of older models and whose parameters allow a sparse representation of input space vectors.

Fig. 1(a) shows the key factors that affect the delay and output slew of an inverter. In this work, we consider the impact of input slew ($S_{in}$), output load capacitance ($C_{load}$), supply voltage ($V_{dd}$), and driving strength ($I_{eff}$).

Fig. 1. (a) Key factors that affect the delay and output slew of an inverter; (b) NAND2 equivalent inverter: the pull-up network is replaced with an "equivalent" PMOS while the pull-down network is replaced with an "equivalent" NMOS device.

To find our ultra compact model, we first study gate delay in a simple inverter and generalize it to any combinatorial logic cell. Recent studies [21]–[23] show that the simple $C_{load} V_{dd} / I_{dsat}$ metric follows the experimental inverter delay much better if the on-current in the denominator is replaced with an effective current $I_{eff}$ representing the average switching current. In line with the intrinsic transistor delay defined in [19], we model cell delay as

$$T_d = k_d \frac{\Delta Q}{I_{eff}} \tag{3}$$

where $k_d$ is a scaling factor used to obtain a good fit to the actual cell delays. $I_{eff}$ is defined as

$$I_{eff} = \frac{I_d(V_{gs} = V_{dd},\ V_{ds} = \frac{V_{dd}}{2}) + I_d(V_{gs} = \frac{V_{dd}}{2},\ V_{ds} = V_{dd})}{2} \tag{4}$$

and can be evaluated easily through performance modeling or through a circuit simulation that takes into account process variations [19], [24].
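As a rough illustration of (4), the sketch below averages two drain-current evaluations. It assumes the usual reading of (4) as the mean of the currents at ($V_{gs} = V_{dd}$, $V_{ds} = V_{dd}/2$) and ($V_{gs} = V_{dd}/2$, $V_{ds} = V_{dd}$). The names `effective_current` and `toy_drain_current`, together with the threshold, gain and channel-length-modulation values, are hypothetical stand-ins for a compact model or SPICE evaluation and are not taken from the paper.

```python
def toy_drain_current(vgs, vds, vth=0.3, gain=1e-3, lam=0.1):
    """Hypothetical square-law-like drain current in amperes (illustration only)."""
    overdrive = max(vgs - vth, 0.0)
    vds_eff = min(vds, overdrive)  # crude triode/saturation switch
    return gain * (overdrive * vds_eff - 0.5 * vds_eff ** 2) * (1.0 + lam * vds)


def effective_current(drain_current, vdd):
    """Average switching current: mean of I_d at the two bias points of (4)."""
    i_high = drain_current(vdd, vdd / 2.0)  # Vgs = Vdd,   Vds = Vdd/2
    i_low = drain_current(vdd / 2.0, vdd)   # Vgs = Vdd/2, Vds = Vdd
    return 0.5 * (i_high + i_low)


if __name__ == "__main__":
    for vdd in (0.7, 0.85, 1.0):
        print(vdd, effective_current(toy_drain_current, vdd))
```

Passing the drain-current evaluator as a callable keeps the averaging rule separate from the device model, so the same helper could wrap a table lookup or a simulator call.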
Since our focus is to model delay and output slew as functions of input variables, (1) and (2), we assume we know $I_{eff}$ for each input vector. Note that the direct link between process parameters and delay is still preserved in the $I_{eff}$ current. To generalize the above model to any combinatorial logic cell, we simply map each gate onto an "equivalent inverter" and use the inverter characterization to estimate delays and output slews [25]–[27]. Fig. 1(b) shows the equivalent inverter of a NAND2, where the pull-up network is replaced with a PMOS and the pull-down network is replaced with an NMOS device. The charge transferred to or from the load capacitance during switching is equal to

$$\Delta Q = (V_{dd} + V')(C_{load} + C_{par} + \alpha S_{in}) \tag{5}$$

where $C_{par}$, $V'$ and $\alpha$ are all fitting parameters. Compared with the simple $C_{load} V_{dd} / I_{dsat}$ metric, several effects have been considered: (1) $C_{par}$ is introduced to account for parasitic capacitances, such as those associated with junctions and interconnects, which are not included in $C_{load}$; (2) $V'$ is introduced to compensate for the inaccuracy of the delay model at low $V_{dd}$; and (3) a linear coefficient $\alpha$ is introduced to account for the impact of $S_{in}$ on delay. The estimation of $f_T$ and $f_S$ is then converted into a parameter extraction problem for $\{k_d, C_{par}, V', \alpha\}$. A special feature of this simple delay model is that the same form is used to describe not only delay but also output slew $S_{out}$, albeit with a different set of values for the fitting parameters $\{k_d, C_{par}, V', \alpha\}$.

To validate the proposed model, $T_d \cdot I_{eff} / (V_{dd} + V')$ and $S_{out} \cdot I_{eff} / (V_{dd} + V')$ versus different $V_{dd}$ values are shown in Fig. 2, where $T_d$ and $S_{out}$ are simulated through SPICE using a 14-nm industrial design kit and two separate $V'$ values are extracted for $T_d$ and $S_{out}$. For different groups of $C_{load}$ and $S_{in}$ combinations, a constant value of $T_d \cdot I_{eff} / (V_{dd} + V')$ and $S_{out} \cdot I_{eff} / (V_{dd} + V')$ is observed under different $V_{dd}$.
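The four-parameter model in (3) and (5) is small enough to write out directly. The following minimal sketch evaluates it for one input vector using the Tech A inverter delay parameters reported in Table I. The function names, the units (V, fF, ps, µA, which give a result in ns), the reading of $\alpha$ in fF/ps, and the $I_{eff}$ value of 50 µA are all assumptions made for illustration; the source does not state a unit for $\alpha$ or give $I_{eff}$ values.

```python
def switched_charge(vdd, c_load, s_in, c_par, v_prime, alpha):
    """Charge per (5): dQ = (Vdd + V') * (Cload + Cpar + alpha * Sin)."""
    return (vdd + v_prime) * (c_load + c_par + alpha * s_in)


def timing_metric(k, i_eff, vdd, c_load, s_in, c_par, v_prime, alpha):
    """Delay or output slew per (3): k * dQ / Ieff (same form, different parameters)."""
    return k * switched_charge(vdd, c_load, s_in, c_par, v_prime, alpha) / i_eff


# Delay parameters of the Tech A inverter from Table I.
INV_A = dict(k=0.389, c_par=0.951, v_prime=-0.266, alpha=0.092)

if __name__ == "__main__":
    # Assumed units: V, fF, ps, uA, so the result is in ns (fF * V / uA = ns).
    td = timing_metric(i_eff=50.0, vdd=0.8, c_load=2.0, s_in=10.0, **INV_A)
    print(f"Td = {td * 1e3:.2f} ps")
```

Because $V_{dd}$ and $I_{eff}$ enter only through the factor $(V_{dd} + V') / I_{eff}$, multiplying the result by $I_{eff} / (V_{dd} + V')$ removes the supply dependence, which is the invariance examined in Fig. 2.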
[Figure 2: two panels versus $V_{dd}$ (0.65 V to 1 V). Left: $T_d \cdot I_{eff} / (V_{dd} + V')$ for Delay L-H and Delay H-L. Right: $S_{out} \cdot I_{eff} / (V_{dd} + V')$ for rise and fall slew rate.]

Fig. 2. For a NOR2 cell designed in a commercial state-of-the-art 14-nm technology, a constant value of $T_d \cdot I_{eff} / (V_{dd} + V')$ and $S_{out} \cdot I_{eff} / (V_{dd} + V')$ is observed versus different $V_{dd}$ and RISE/FALL combinations.

Fig. 3 shows $T_d / (C_{load} + C_{par} + \alpha S_{in})$ and $S_{out} / (C_{load} + C_{par} + \alpha S_{in})$ versus different $C_{load}$ and $S_{in}$ combinations. A similar result is observed here: for different $V_{dd}$ and transition (RISE/FALL) combinations, $T_d / (C_{load} + C_{par} + \alpha S_{in})$ and $S_{out} / (C_{load} + C_{par} + \alpha S_{in})$ are approximately constant.

[Figure 3: two panels versus $C_{load}$ and input slew combination index. Left: $T_d / (C_{load} + C_{par} + \alpha S_{in})$ for Delay L-H and Delay H-L at $V_{dd}$ = 0.7 V, 0.85 V and 1 V. Right: $S_{out} / (C_{load} + C_{par} + \alpha S_{in})$ for rise and fall slew rate at $V_{dd}$ = 0.7 V, 0.85 V and 1 V.]

Fig. 3. For a NOR2 cell designed in a commercial state-of-the-art 14-nm technology, a constant value of $T_d / (C_{load} + C_{par} + \alpha S_{in})$ and $S_{out} / (C_{load} + C_{par} + \alpha S_{in})$ is observed versus different $C_{load}$, $S_{in}$ and RISE/FALL combinations.

TABLE I
EXTRACTED PARAMETERS FOR DELAY MODEL FROM INV, NAND2 AND NOR2 IN THREE DIFFERENT TECHNOLOGIES WITH THEIR FITTING ERROR.

Tech  Cell   k_d    C_par (fF)  V' (V)   α      Error (%)
A     INV    0.389  0.951       -0.266   0.092  1.56
A     NAND2  0.372  1.328       -0.209   0.034  1.98
A     NOR2   0.356  1.186       -0.241   0.102  0.91
B     INV    0.416  1.046       -0.287   0.103  1.50
B     NAND2  0.403  1.471       -0.228   0.034  2.05
B     NOR2   0.374  1.276       -0.253   0.104  1.12
C     INV    0.389  0.978       -0.272   0.107  1.84
C     NAND2  0.383  1.12        -0.258   0.050  1.94
C     NOR2   0.368  1.225       -0.264   0.117  1.47

Table I shows extracted parameters for the delay model from INV, NAND2 and NOR2 in three different technologies with their fitting errors. Strong similarities in extracted parameters are observed among different cells and technologies from different nodes, which serves as a basis for minimizing the required input combinations in statistical characterization in the next section. Although our proposed model captures major physical effects, for some technologies there might be an offset between the proposed model and circuit simulations. In those cases, extra fitting terms (e.g., $S_{in} \cdot C_{load}$) might be needed. The optimal model complexity will be given by a trade-off between model accuracy and degree of data compression.

IV. BAYESIAN INFERENCE WITH MAXIMUM A POSTERIORI (MAP) ESTIMATION

In this section, we present a Bayesian inference approach with maximum-a-posteriori (MAP) estimation where, instead of computing $\{T_d, S_{out}\}$ at each input condition separately, we estimate $\{k_d, C_{par}, V', \alpha\}$ globally by maximizing the joint probability of observing $(\xi^{(i)}, T_d^{(i)})$ or $(\xi^{(i)}, S_{out}^{(i)})$, $(i = 1, 2, \ldots, k)$. The first step is to transfer the observed training samples $(\xi^{(i)}, T_d^{(i)})$ or $(\xi^{(i)}, S_{out}^{(i)})$, $(i = 1, 2, \ldots, k)$, to the parameter subspace $\{k_d, C_{par}, V', \alpha\}$ and use both to derive a probability distribution on the parameter space. The pdfs on $\{k_d, C_{par}, V', \alpha\}$ for delay and output slew can then be calculated and the parameter extraction problem solved using maximum a posteriori (MAP) estimation.

Without loss of generality, we describe the MAP estimation for the delay parameter subgroup $P_T = \{k_d, C_{par}, V', \alpha\}$. Parameters for output slew are estimated in a similar manner. First, we assume that $P_T$ follows a Gaussian distribution $P_T \sim N(\mu_{P_T}, \Sigma_{P_T})$:

$$pdf(P_T) = \frac{1}{4\pi^2 \sqrt{|\Sigma_{P_T}|}} \cdot \exp\left[-\frac{1}{2} (\mu_{P_T} - P_T)^T \Sigma_{P_T}^{-1} (\mu_{P_T} - P_T)\right] \tag{6}$$

where $\mu_{P_T}$ and $\Sigma_{P_T}$ are the mean vector and covariance matrix of the parameter subgroup $P_T$, respectively. Next, we assume that $\mu_{P_T}$ follows a conjugate Gaussian prior distribution $\mu_{P_T} \sim N(\mu_{t0}, \Sigma_{t0})$:

$$pdf(\mu_{P_T}) = \frac{1}{4\pi^2 \sqrt{|\Sigma_{t0}|}} \cdot \exp\left[-\frac{1}{2} (\mu_{P_T} - \mu_{t0})^T \Sigma_{t0}^{-1} (\mu_{P_T} - \mu_{t0})\right] \tag{7}$$

where $\mu_{t0}$ and $\Sigma_{t0}$ are the mean vector and covariance matrix of $\mu_{P_T}$, respectively. We also define the delay model precision as $\beta_{f_{T_d}}$, which equals the inverse variance of modeling errors across different technologies. Given $\mu_{P_T}$ and $\beta_{f_{T_d}}$, we calculate the likelihood of observing the delay $T_d^{(i)}$ at the $i$th input condition, associated with the subspace distribution $pdf(P_T)$, as

$$pdf(T_d^{(i)} \mid \mu_{P_T}, \beta_{f_{T_d}}(\xi^{(i)})) = \sqrt{\frac{\beta_{f_{T_d}}(\xi^{(i)})}{2\pi}} \cdot \exp\left[-\frac{1}{2} \left(T_d^{(i)} - f_T(\xi^{(i)}, \mu_{P_T})\right)^2 \beta_{f_{T_d}}(\xi^{(i)})\right] \tag{8}$$

The learning of the precision $\beta_{f_{T_d}}$ is a key step in this method. In practice, $\beta_{f_{T_d}}$ represents our "uncertainty" about the proposed delay model at different input conditions due to its inability to capture certain physical effects. While they depend on the details of the technologies, these precisions show a strong systematic trend across different input conditions $\xi$. In this work, extracted parameters $\mu_{P_T}$ from past technologies are used to learn the systematic precision $\beta_{f_{T_d}}$ at different input conditions. Characterizations from a variety of technology nodes enable us to propagate our historical belief to a new technology node. While generic or broad historical technologies can be used to learn approximate precisions, in order to achieve the highest applicable prior precision, the best historical technologies would be those with the same design or process choices as the target technology. For example, if we intend to fit a library in a low power process, appropriate historical technologies would also be technologies in low power processes. Therefore a bias-variance tradeoff is needed in the selection of historical libraries.
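To make the roles of the prior (7) and the likelihood (8) concrete, the sketch below works a deliberately reduced case. Only the scaling factor $k_d$ is estimated, while $C_{par}$, $V'$ and $\alpha$ are held fixed, so the model is linear in the unknown and the estimate that maximizes prior times likelihood has a closed form. This one-parameter reduction, the function name `map_scale_factor`, and all numbers are assumptions for illustration; the paper estimates all four parameters jointly.

```python
def map_scale_factor(prior_mean, prior_var, x, t, beta):
    """MAP estimate of k for t_i ~ N(k * x_i, 1 / beta_i) with prior k ~ N(prior_mean, prior_var).

    x_i stands for dQ_i / Ieff_i at input condition i and t_i for the observed delay.
    Setting the derivative of
        0.5 * (k - prior_mean)**2 / prior_var + 0.5 * sum(beta_i * (t_i - k * x_i)**2)
    to zero gives the precision-weighted average computed below.
    """
    numerator = prior_mean / prior_var
    denominator = 1.0 / prior_var
    for xi, ti, bi in zip(x, t, beta):
        numerator += bi * xi * ti
        denominator += bi * xi * xi
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical prior from older libraries plus two new observations.
    k_hat = map_scale_factor(0.39, 0.02 ** 2, x=[40.0, 55.0], t=[15.2, 20.8], beta=[4.0, 4.0])
    print(k_hat)
```

With no observations the estimate falls back to the prior mean, and as the precisions grow it approaches the least-squares fit, which is the behaviour that lets a strong prior stand in for most of the simulation runs.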
The detailed learning process proceeds as follows. First, a full set of standard cell libraries in $N_{tech}$ fabrication processes and technology nodes ($N_{tech} = 6$ in this paper, including technologies from 14-nm to 45-nm, with both bulk-Silicon and SOI technologies and non-FINFET and FINFET technologies) are employed as "historical data" to improve our confidence in predicting $\beta_{f_{T_d}}$ on an unknown library. This assumes that although a new technology introduces different lithography, structures and materials, parameters from our proposed delay model do not change much, as is shown in Section 3. After selection of a group of historical libraries, each cell is fitted to the proposed delay model with different input conditions $\xi$. $\beta_{f_{T_d}}$ is then calculated as the inverse variance of the relative difference between measurements and delay model predictions using extracted parameters:

$$\beta_{f_{T_d}} = \frac{1}{\dfrac{1}{N_{tech}} \sum_{j=1}^{N_{tech}} \left(\dfrac{T_d^{(j)} - f_T(P_T^{(j)})}{T_d^{(j)}}\right)^2 - \left(\dfrac{1}{N_{tech}} \sum_{j=1}^{N_{tech}} \dfrac{T_d^{(j)} - f_T(P_T^{(j)})}{T_d^{(j)}}\right)^2} \tag{9}$$

After the estimation of likelihood and precision, we are able to transfer the delay characterization $\{T_d^{(1)}, \ldots, T_d^{(k)}\}$ to the parameter subspace $P_T$ and obtain the conditional probability of observing $T_d^{(i)}$ given $\mu_{P_T}$ and $\beta_{f_{T_d}}(\xi^{(i)})$. We then combine this conditional probability with the prior distribution $pdf(\mu_{P_T})$ in (7) to accurately estimate $\mu_{P_T}$. Assuming each delay simulation is ideal, we can write the likelihood function $pdf(T_d \mid \mu_{P_T}, \beta_{f_{T_d}})$ as:

$$pdf(T_d \mid \mu_{P_T}, \beta_{f_{T_d}}) = \prod_{i=1}^{k} pdf(T_d^{(i)} \mid \mu_{P_T}, \beta_{f_{T_d}}(\xi^{(i)})) \tag{10}$$

According to Bayes' theorem, the conditional distribution $pdf(\mu_{P_T} \mid T_d)$ is proportional to the product of the prior $pdf(\mu_{P_T})$ and the likelihood function $pdf(T_d \mid \mu_{P_T})$:

$$pdf(\mu_{P_T} \mid T_d) \propto pdf(\mu_{P_T}) \cdot pdf(T_d \mid \mu_{P_T}) \tag{11}$$

The precision $\beta_{f_{T_d}}$ is learned from historical cell delay characterization and is therefore independent of the observation $T_d$. Consequently,

$$pdf(T_d \mid \mu_{P_T}, \beta_{f_{T_d}}) = pdf(T_d \mid \mu_{P_T}) \tag{12}$$

Substituting (10) and (12) into (11) yields:

$$pdf(\mu_{P_T} \mid T_d) \propto pdf(\mu_{P_T}) \cdot \prod_{i=1}^{k} pdf(T_d^{(i)} \mid \mu_{P_T}, \beta_{f_{T_d}}(\xi^{(i)})) \tag{13}$$

The last step is maximum-a-posteriori (MAP) estimation to find the optimal estimate of $\mu_{P_T}$ that maximizes the logarithm of the posterior distribution, $\ln pdf(\mu_{P_T} \mid T_d)$. It can be mathematically formulated as the optimization problem

$$\underset{\mu_{P_T}}{\text{maximize}} \quad \ln pdf(\mu_{P_T}) + \sum_{i=1}^{k} \ln pdf(T_d^{(i)} \mid \mu_{P_T}, \beta_{f_{T_d}}(\xi^{(i)})) \tag{14}$$

Substituting (7) and (8) into (14) and removing the constant terms yields:

$$\underset{\mu_{P_T}}{\text{minimize}} \quad \frac{1}{2} (\mu_{P_T} - \mu_{t0})^T \Sigma_{t0}^{-1} (\mu_{P_T} - \mu_{t0}) + \frac{1}{2} \sum_{i=1}^{k} \left(T_d^{(i)} - f_T(\xi^{(i)}, \mu_{P_T})\right)^2 \beta_{f_{T_d}}(\xi^{(i)}) \tag{15}$$

where the objective in (15) is a sum of convex quadratic terms. Hence the optimization problem in (15) is also a convex programming problem and can be solved both efficiently and robustly.

So far we have achieved individual library cell characterization (no statistical characterization included). The detailed efficient statistical library cell characterization proceeds as follows. $N_{sample}$ different seeds for each cell under process variation are generated through Monte Carlo (MC) simulation or Design of Experiments (DoE) [4]. For the $j$th seed in each cell, $\{T_d\}$ and $\{S_{out}\}$ under $k$ input conditions are simulated through a SPICE simulation using the .ALTER statement. $P_T^{(j)}$ and $P_S^{(j)}$ are extracted for the $j$th seed through the proposed Bayesian inference with maximum-a-posteriori (MAP) estimation. For a targeted input condition $\xi$, the probability distributions of delay and output slew are calculated as $pdf(f_T(\xi, P_T))$ and $pdf(f_S(\xi, P_S))$.

[Figure 4: flowchart. Historical learning branch: for each of the $N_{tech}$ historical technologies, electrical simulation of $(T_d, S_{out})$ over the LUT input conditions and extraction of parameters $(k_d, C_{par}, V', \alpha)$, yielding the prior means $\mu_{t0}$, $\mu_{s0}$, covariances $\Sigma_{t0}$, $\Sigma_{s0}$ and precisions $\beta_{T_d}$, $\beta_{S_{out}}$. Statistical learning branch for new cells: from the netlist, $N_{sample}$ seeds under process variation, electrical simulation of $(T_d, S_{out})$ at the $k$ fitting input conditions $\xi_F$, extraction of $P_T^{(j)}$, $P_S^{(j)}$ per seed, then computation of $pdf(f_T(\xi, P_T))$ and $pdf(f_S(\xi, P_S))$.]

Fig. 4. Proposed flow for statistical characterization with both old and new libraries interacting and priors being passed from an old library to a new one.

Fig. 4 summarizes the major steps of the proposed statistical library cell characterization method. If we assume that library cell characterizations have been done in previous technologies, the total computation cost is $O(k \cdot N_{sample})$, which is at least one order of magnitude smaller than the $O(N_{LUT} \cdot N_{sample})$ of prior work and several orders of magnitude smaller than the $O(N_{LUT} \cdot N_{MC})$ of the standard method. The total computation cost is $O(k \cdot N_{sample} + N_{Tech} \cdot N_{LUT})$ if we need to re-run characterization for old technologies, which is still a moderate speed-up compared to the most cutting-edge techniques.

V. VALIDATION

In this section, two library cell characterization examples in several cutting-edge CMOS technologies are used to demonstrate the efficiency of our proposed method. All test cases as well as the historical library cell characteristics are generated using different BSIM based industrial design kits reflecting real measurements. To test and compare with the prior art, we have also implemented both deterministic extraction and statistical extraction using a look-up table (LUT) approach. The baseline characterization is defined in this work by a 1000-point Monte Carlo simulation sampled randomly within the whole input space $\xi = \{S_{in}, C_{load}, V_{dd}\}$.
Note that these points only represent different operating conditions for a target cell, while the effects of process variation are not included. Fig. 5 shows a scatter plot of the 1000 points within the whole input space.

[Figure 5: three-dimensional scatter plot titled "Validation input spread" with axes Input Slew (s), Output Cap (F) and Vdd (V).]

Fig. 5. A scatter plot of 1000 points among the whole input space $\xi = \{S_{in}, C_{load}, V_{dd}\}$ used for comparing our characterization result with standard methods.

The first example is to conduct a nominal delay and output slew characterization for a library designed in a commercial state-of-the-art 14-nm FINFET technology. Both fitting and testing samples are generated through SPICE simulation using a well calibrated compact transistor model. Fig. 6 shows the average prediction error compared with the baseline characterization using the proposed model with Bayesian inference, the proposed model with least-square error (LSE) optimization, and the look-up table approach. To achieve the same characterization accuracy on delay $T_d$, our proposed method achieves up to 15X runtime speedup compared to a traditional look-up table approach, where 6X speedup is contributed by our proposed timing model and an extra speedup of 2.5X is contributed by the Bayesian inference. Given the prior and two additional fitting input combinations, a 4.3% average error compared with the baseline characterization is achieved for all combinations of $C_{load}$, $S_{in}$ and $V_{dd}$. This demonstrates the sparsity of effects across input vectors and the validity of the proposed delay model.

[Figure 6: prediction error (%) versus number of training samples for Proposed Model+Bayesian Inference, Proposed Model+LSE, and Lookup Table.]

Fig. 6. Average testing error for delay $T_d$ characterizing a library designed in a commercial state-of-the-art 14-nm technology. Error bars show one standard deviation of testing error for different cells and RISE/FALL.

The second example is to conduct statistical delay and output slew characterization for a library designed in a commercial state-of-the-art 28-nm bulk-Silicon technology, which is different from the model used in the first example. The baseline characterization is defined similarly to the previous example, where 1000 input combinations are sampled randomly within the whole space $\xi = \{S_{in}, C_{load}, V_{dd}\}$. In this case 1000 seeds under process variation are generated for each cell to obtain statistical distributions for delay and output slew with different input combinations. The error functions for statistical characterization, $E(\mu_{T_d})$, $E(\mu_{S_{out}})$, $E(\sigma_{T_d})$ and $E(\sigma_{S_{out}})$, are defined as

$$E(\mu_{T_d}) = \frac{1}{n} \sum_{i=1}^{n} \left[\mu(f_T(\xi^{(i)}, P_T)) - \mu_{T_d}^{(i)}\right] \tag{16}$$

$$E(\mu_{S_{out}}) = \frac{1}{n} \sum_{i=1}^{n} \left[\mu(f_S(\xi^{(i)}, P_S)) - \mu_{S_{out}}^{(i)}\right] \tag{17}$$

$$E(\sigma_{T_d}) = \frac{1}{n} \sum_{i=1}^{n} \left[\sigma(f_T(\xi^{(i)}, P_T)) - \sigma_{T_d}^{(i)}\right] \tag{18}$$

$$E(\sigma_{S_{out}}) = \frac{1}{n} \sum_{i=1}^{n} \left[\sigma(f_S(\xi^{(i)}, P_S)) - \sigma_{S_{out}}^{(i)}\right] \tag{19}$$

Fig. 7 and Fig. 8 show the average prediction error for the mean and standard deviation of delay and output slew when characterizing a library designed in a commercial state-of-the-art 28-nm technology, compared with the baseline characterization, using the proposed method and the look-up table approach. Up to 20X runtime speedup is observed to achieve the same characterization accuracy in the mean value and standard deviation of $T_d$ and $S_{out}$. Fig. 9 shows the delay probability density obtained with the proposed method using seven training input combinations and with an interpolation of look-up tables using 60 training input combinations, together with the baseline distribution from SPICE Monte Carlo simulation. The input combination is $V_{dd} = 0.734$ V, $S_{in} = 5.09$ ps, $C_{load} = 1.67$ fF. The proposed method shows a much better prediction of the delay distribution and correctly predicts the non-Gaussian distribution at low $V_{dd}$.

[Figure 7: two panels, $\mu(T_d)$ and $\sigma(T_d)$, showing prediction error (%) versus number of training samples for Proposed Model+Bayesian Inference and Lookup Table, annotated "17X reduction" and "20X reduction".]

Fig. 7. Average testing error for mean and standard deviation of delay $T_d$ characterizing a library designed in a commercial state-of-the-art 28-nm technology. Error bars show one standard deviation of testing error for different cell types and RISE/FALL combinations.

[Figure 8: two panels, $\mu(S_{out})$ and $\sigma(S_{out})$, showing prediction error (%) versus number of training samples, annotated "18X reduction" and "19X reduction".]

Fig. 8. Average testing error for mean and standard deviation of output slew $S_{out}$ characterizing a library designed in a commercial state-of-the-art 28-nm technology. Error bars show one standard deviation of testing error for different cell types and RISE/FALL combinations.

[Figure 9: frequency versus delay (s) for Lookup Table (60 fitting combinations), Proposed Model+Bayesian Inference (7 fitting combinations), and Ideal.]

Fig. 9. Delay probability density simulated using baseline simulation, proposed method and an interpolation of look-up tables together with baseline distribution using SPICE Monte Carlo simulation with an input combination of $V_{dd} = 0.734$ V, $S_{in} = 5.09$ ps, $C_{load} = 1.67$ fF.

VI. CONCLUSION

In this paper we have presented an entirely different perspective on the acceleration of library characterizations. While previous authors have emphasized the use of statistical techniques to address the efficient design of variation-aware standard cell libraries by working in process space, in our work we use similar techniques for the efficient design of these libraries by working in the traditional library input space of input slew, output load, and voltage supply. The main insight that has enabled this shift in perspective is the contribution of a new ultra compact timing model for standard cells that is a powerful and accurate generalization of the simple $C_{load} V_{dd} / I_{dsat}$ metric. This new analytical timing model transforms the library characterization problem from one of input parameter sweep to one of machine learning and sparse sampling. Machine learning is used to develop priors of timing model coefficients using old libraries, while sparse sampling is used to provide the extra data points needed to build the new library in the target technology. Our methods have resulted in a 15X reduction in simulation runs with respect to baseline techniques that use random sampling methods.

ACKNOWLEDGMENT

This work was funded by the Cooperative Agreement between the Masdar Institute of Science and Technology (Masdar Institute), Abu Dhabi, UAE and the Massachusetts Institute of Technology (MIT), Cambridge, MA, USA, Reference 196F/002/707/102f/70/9374.

REFERENCES

[1] L. Lavagno, L. Scheffer, and G. Martin. EDA for IC Implementation, Circuit Design, and Process Technology. Addison-Wesley, 2006.
[2] L. Yu, L. Wei, D. Antoniadis, I. Elfadel, and D. Boning. Statistical modeling with the virtual source MOSFET model. In Design, Automation Test in Europe Conference Exhibition (DATE), pages 1454–1457, 2013.
[3] X. Li.
Finding deterministic solution from underdetermined equation: Large-scale performance modeling by least angle regression. In Design Automation Conference, pages 364–369, 2009.
[4] L. Brusamarello, G. Wirth, P. Roussel, and M. Miranda. Fast and accurate statistical characterization of standard cell libraries. Microelectronics Reliability, 51(12):2341–2350, 2011.
[5] Z. Zhang, T.A. El-Moselhy, I.M. Elfadel, and L. Daniel. Stochastic testing method for transistor-level uncertainty quantification based on generalized polynomial chaos. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 32(10):1533–1545, Oct. 2013.
[6] Z. Zhang, I.A.M. Elfadel, and L. Daniel. Uncertainty quantification for integrated circuits: Stochastic spectral methods. In International Conference on Computer-Aided Design (ICCAD), pages 803–810, Nov. 2013.
[7] Z. Zhang, T.A. El-Moselhy, I.M. Elfadel, and L. Daniel. Calculation of generalized polynomial-chaos basis functions and Gauss quadrature rules in hierarchical uncertainty quantification. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 33(5):728–740, May 2014.
[8] Z. Zhang, X. Yang, I. V. Oseledets, G. E. Karniadakis, and L. Daniel. Enabling high-dimensional hierarchical uncertainty quantification by ANOVA and tensor-train decomposition. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, PP(99):1–1, Nov. 2014.
[9] Z. Zhang, X. Yang, G. Marucci, P. Maffezzoni, I.M. Elfadel, G. Karniadakis, and L. Daniel. Stochastic testing simulator for integrated circuits and MEMS: Hierarchical and sparse techniques. In Custom Integrated Circuits Conference (CICC), pages 1–8, Sep. 2014.
[10] F. Wang, W. Zhang, S. Sun, X. Li, and C. Gu. Bayesian model fusion: Large-scale performance modeling of analog and mixed-signal circuits by reusing early-stage data. In Design Automation Conference (DAC), pages 64:1–64:6, 2013.
[11] S. Reda and S.R. Nassif.
Analyzing the impact of process variations on parametric measurements: Novel models and applications. In Design, Automation & Test in Europe (DATE), pages 375–380, Apr. 2009.
[12] L. Yu, S. Saxena, C. Hess, I. Elfadel, D. Antoniadis, and D. Boning. Remembrance of transistors past: Compact model parameter extraction using Bayesian inference and incomplete new measurements. In Design Automation Conference (DAC), 2014.
[13] L. Yu, S. Saxena, C. Hess, I. Elfadel, D. Antoniadis, and D. Boning. Efficient performance estimation with very small sample size via physical subspace projection and maximum a posteriori estimation. In Design, Automation & Test in Europe (DATE), 2014.
[14] S. Sun, F. Wang, S. Yaldiz, X. Li, L. Pileggi, A. Natarajan, M. Ferriss, J. Plouchart, B. Sadhu, B. Parker, A. Valdes-Garcia, M. Sanduleanu, J. Tierno, and D. Friedman. Indirect performance sensing for on-chip analog self-healing via Bayesian model fusion. In Custom Integrated Circuits Conference (CICC), pages 1–4, Sep. 2013.
[15] S. Sun, X. Li, H. Liu, K. Luo, and B. Gu. Fast statistical analysis of rare circuit failure events via scaled-sigma sampling for high-dimensional variation space. In International Conference on Computer-Aided Design (ICCAD), pages 478–485, Nov. 2013.
[16] C. M. Bishop. Pattern Recognition and Machine Learning. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.
[17] T. Sakurai and A.R. Newton. Alpha-power law MOSFET model and its applications to CMOS inverter delay and other formulas. IEEE Journal of Solid-State Circuits, 25(2):584–594, Apr. 1990.
[18] V. Stojanovic, D. Markovic, B. Nikolic, M.A. Horowitz, and R.W. Brodersen. Energy-delay tradeoffs in combinational logic using gate sizing and supply voltage optimization. In European Solid-State Circuits Conference (ESSCIRC), pages 211–214, Sep. 2002.
[19] A. Khakifirooz and D.A. Antoniadis. MOSFET performance scaling - part I: Historical trends.
IEEE Transactions on Electron Devices, 55(6):1391–1400, June 2008.
[20] L. Yu, O. Mysore, L. Wei, L. Daniel, D. Antoniadis, I. Elfadel, and D. Boning. An ultra-compact virtual source FET model for deeply-scaled devices: Parameter extraction and validation for standard cell libraries and digital circuits. In Asia and South Pacific Design Automation Conference (ASPDAC), pages 521–526, 2013.
[21] M.-H. Na, E.J. Nowak, W. Haensch, and J. Cai. The effective drive current in CMOS inverters. In International Electron Devices Meeting (IEDM), pages 121–124, Dec. 2002.
[22] J. Deng and H. Wong. Metrics for performance benchmarking of nanoscale Si and carbon nanotube FETs including device nonidealities. IEEE Transactions on Electron Devices, 53(6):1317–1322, June 2006.
[23] E. Yoshida, Y. Momiyama, M. Miyamoto, T. Saiki, M. Kojima, S. Satoh, and T. Sugii. Performance boost using a new device design methodology based on characteristic current for low-power CMOS. In International Electron Devices Meeting (IEDM), Dec. 2006.
[24] S. Rakheja and D. Antoniadis. MVS 1.0.1 nanotransistor model (Silicon), Nov. 2013.
[25] N. Weste and K. Eshraghian. Principles of CMOS VLSI Design: A Systems Perspective. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1985.
[26] X. Lin, Y. Wang, and M. Pedram. Joint sizing and adaptive independent gate control for FinFET circuits operating in multiple voltage regimes using the logical effort method. In International Conference on Computer-Aided Design (ICCAD), pages 444–449, Nov. 2013.
[27] X. Lin, Y. Wang, S. Nazarian, and M. Pedram. An improved logical effort model and framework applied to optimal sizing of circuits operating in multiple supply voltage regimes. In International Symposium on Quality Electronic Design (ISQED), pages 249–256, Mar. 2014.