Efficient IC Statistical Modeling and Extraction Using a Bayesian Inference Framework

Li Yu, Ibrahim (Abe) Elfadel¹, and Duane Boning
Electrical Engineering and Computer Science, Microsystems Technology Laboratories, MIT
¹Masdar Institute of Science and Technology
Device Variation & Statistical Compact Modeling Issues

• How to build valid statistical models enabling robust circuit design?
• How to efficiently extract transistor parameters?
• How to predict circuit performance using mixtures of on-chip test measurements?
• What is the lower bound on the number of I-V measurements needed to fit a model, and how should those measurements be selected?
Traditional Methods: System Identification Approaches

• System identification approaches try to determine a mathematical relation between input and output without going into the physical details of the system
• Parameter extraction through least-squares optimization:
  $\hat{\mathbf{P}} = \arg\min_{\mathbf{P}} \| \mathbf{F} - f(\mathbf{V}, \mathbf{P}) \|^2$
• Statistical modeling / moment estimation through Backward Propagation of Variance (BPV):
  $\sigma_{F_i}^2 = \sum_{j=1}^{N} \left( \frac{\partial F_i}{\partial p_j} \right)^2 \sigma_{p_j}^2$
• Performance modeling through response surface modeling (RSM):
  $\hat{\boldsymbol{\alpha}} = \arg\min_{\boldsymbol{\alpha}} \| \mathbf{F} - \boldsymbol{\alpha} \cdot \mathbf{b} \|^2$
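As a concrete illustration of the BPV variance relation above, here is a minimal sketch that propagates assumed parameter variances through numerically estimated sensitivities; the model function and all numbers are placeholders, and full BPV additionally inverts this relation to back out parameter variances from observed performance variances.

```python
import numpy as np

def propagate_variance(f, p_nom, sigma_p, h=1e-4):
    """Evaluate sigma_F^2 = sum_j (dF/dp_j)^2 * sigma_pj^2 for a scalar performance F = f(p)."""
    p_nom = np.asarray(p_nom, dtype=float)
    sigma_p = np.asarray(sigma_p, dtype=float)
    grad = np.zeros_like(p_nom)
    for j in range(p_nom.size):
        dp = np.zeros_like(p_nom)
        dp[j] = h * max(abs(p_nom[j]), 1.0)              # central-difference step size
        grad[j] = (f(p_nom + dp) - f(p_nom - dp)) / (2.0 * dp[j])
    return np.sum((grad * sigma_p) ** 2)

# Hypothetical performance: F ~ vx0 * (Vdd - Vt), with p = [vx0, Vt] and Vdd = 0.9 V
f = lambda p: p[0] * (0.9 - p[1])
var_F = propagate_variance(f, p_nom=[1.2, 0.35], sigma_p=[0.05, 0.02])
print("sigma_F =", float(np.sqrt(var_F)))
```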
Limitations in System Identification Approaches

• Challenges with solution uniqueness/expressiveness
  - We wish to extract parameters with physical meaning
  - We need to preserve physical correlation among parameters
  - We need to limit the parameter values to a specific domain
• Challenges from over-fitting
  - We wish to model the system with limited measurements/samples
  - Some methods require uncorrelated measurements/variables
  - Need to perform feature selection
• Challenges with noisy data
  - Equal weight is assigned to different measurements
  - We wish to identify data with systematic offset or measurement noise
Our Approach to IC Statistical Modeling and Extraction

• MIT Virtual Source (MVS) Compact MOSFET Model (Antoniadis et al.); Bayesian extraction method with very limited data
  - Ultra-compact model: small number of physically based parameters
  - Applicable to nano-scale MOS devices
  - Fit model from a small number of early device and monitor measurements
• Statistical formulations
  - Variation model and extraction using Backward Propagation of Variance (BPV)
  - Projection onto the physical subspace spanned by the MVS model (in contrast to PCA projections)
• Applications
  - Early technology evaluation and trends
  - Efficient circuit performance evaluation
  - Efficient statistical library cell timing characterization

Shaloo Rakheja and Dimitri Antoniadis (2013), "MVS 1.0.1 Nanotransistor Model (Silicon)," http://nanohub.org/resources/19684
Outline

• Introduction
• Foundation
  - MIT Virtual Source (MVS) Transistor Model: A Physical Solution
• Performance Estimation with Mixed & Small Size Samples
  - Physical Subspace Projection
  - Maximum a Posteriori (MAP) Estimation
• Compact Model Parameter Extraction
  - Bayesian Inference: Learning Precision and Prior at Different Biases
  - Optimal Sampling of Transistor Measurements
• Statistical Library Characterization
  - Novel Delay/Slew Model
  - Exploring Sparsity in Library Input Space
• Summary/Acknowledgement
MIT Virtual Source (MVS) Model

• Virtual source velocity $v_{x0}$
• Continuity function $F_s$ links the linear and saturation regions:
  $F_s = \dfrac{V_{ds}/V_{dsat}}{\left(1 + (V_{ds}/V_{dsat})^{\beta}\right)^{1/\beta}}$
• Inversion charge density $Q_{ix0}$:
  $Q_{ix0} = C_{inv}\, n\phi_t \ln\!\left(1 + \exp\dfrac{V_{GS} - (V_T - \alpha\phi_t F_f)}{n\phi_t}\right)$
• Drain current per unit width:
  $I_D / W = Q_{ix0}\, v_{x0}\, F_s$
• Function $F_f$ is a smoothing function
Parameters and descriptions:
  V_T0 (V)       Strong-inversion threshold voltage
  n_0            Sub-threshold swing factor
  δ (mV/V)       Drain-induced barrier lowering
  v_x0 (cm/s)    Virtual source carrier velocity
  μ (cm²/V·s)    Low-field mobility
  R_s0 (ohm·μm)  Series resistance per side

Shaloo Rakheja and Dimitri Antoniadis (2013), "MVS 1.0.1 Nanotransistor Model (Silicon)," http://nanohub.org/resources/19684
• Dynamic model partitions channel charge and includes parasitic capacitances $C_{if}$, $C_{of}$, $C_{ov}$
Outline

• Introduction
• Foundation
  - MIT Virtual Source (MVS) Transistor Model: A Physical Solution
• Performance Estimation with Mixed & Small Size Samples
  - Physical Subspace Projection
  - Maximum a Posteriori (MAP) Estimation
• Compact Model Parameter Extraction
  - Bayesian Inference: Learning Precision and Prior at Different Biases
  - Optimal Sampling of Transistor Measurements
• Statistical Library Characterization
  - Novel Delay/Slew Model
  - Exploring Sparsity in Library Input Space
• Summary/Acknowledgement
Challenge: Statistical Performance Modeling with Mixed and Limited Data

• Goal: approximate circuit performance as a function of process variations (a mapping from the process domain to the performance domain)
• Response surface modeling (RSM) is widely used:
  $g(\Delta X) = \sum_{k=1}^{M} \alpha_{gk}\, b_k(\Delta X)$
  - $g(\Delta X)$: target performance of interest (e.g., frequency of a digital circuit)
  - $\Delta X$: vector of random variables modeling process variations
  - $b_k(\Delta X)$: basis functions (e.g., linear or quadratic polynomials)
  - $\alpha_{gk}$: model coefficients
• Requires a large sample size to solve (i.e., sample size K > M model parameters)
• The high dimensionality of $\Delta X$ poses challenges in applying RSM
  - Principal component analysis (PCA) or related approaches are often required

X. Li et al., "Projection-based performance modeling for inter/intra-die variations," ICCAD, 2005.
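For reference, the baseline RSM fit amounts to ordinary least squares over the chosen basis functions, as in the minimal sketch below; the synthetic data, dimensions, and coefficients are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 200, 4                                    # K samples of a d-dimensional process vector dX
dX = rng.normal(size=(K, d))
g = (1.0 + dX @ np.array([0.5, -0.2, 0.1, 0.05])
     + 0.3 * dX[:, 0] ** 2 + 0.01 * rng.normal(size=K))   # synthetic "performance" data

def quad_basis(dX):
    """Basis b(dX) = [1, linear terms, squared terms]."""
    return np.hstack([np.ones((dX.shape[0], 1)), dX, dX ** 2])

B = quad_basis(dX)                               # K x M design matrix (needs K > M)
alpha, *_ = np.linalg.lstsq(B, g, rcond=None)    # model coefficients alpha_gk
print("fitted coefficients:", np.round(alpha, 3))
```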
Proposed Method – Physical Subspace Projection and Maximum a Posteriori Estimation

• Traditional flow: performance measurements → PCA → principal components → RSM → target performance
• Proposed flow: device-array and RO measurements (performance domain: measurements from test structures) → physical subspace projection → probability map of the physical subspace {V_tn, V_tp, v_xn, v_xp} → MAP → process shift calibration → target performance of interest
Physical Subspace Projection: Intuition

• The MVS model simplifies the complex process of capturing a target transistor model from measurements:
  - Limited number of parameters
  - Most parameters are directly inferable from the I-V curve
• A gap still exists relative to traditional RSM approaches:
  - How do we translate backwards from measurements in the performance domain (e.g., RO frequency) to the MVS domain?
  - Typically only a very small number of measurements are monitored for post-silicon validation, so a low-dimensional subspace is preferable
  - It is hard to weight mixed measurements from different types of measurement structures, i.e., device and circuit structures (ROs)
• Idea: find the best (most likely) physical MVS model parameters that explain past and new measurement data
(1) Physical Subspace Projection

• The purpose of physical subspace projection is to transfer a mixture of measurements into a unique probability space spanned by the MVS parameter space $X$
• Assumption: the subspace variables satisfy a multivariate Gaussian distribution $X \sim \mathcal{N}(\mu_X, \theta)$
  - The initial value of $\theta$ equals the covariance of the subspace variables under within-die variation
  - Key advantage: the subspace variables can be correlated
• We choose a conjugate Gaussian prior $\mu_X \sim \mathcal{N}(\mu_0, \Sigma_0)$
  - $\mu_0$ is the vector of nominal values of the subspace variables
  - $\Sigma_0$ is the covariance of the subspace variables under die-to-die variation
• The probability of observing data point $F_i(n_i)$ in the $i$th group is given by the subspace distribution $\mathrm{pdf}(F_i(n_i) \mid \mu_X, \theta)$
(2) Maximum a Posteriori Estimation

• The posterior distribution after observing all data is
  $\mathrm{pdf}(F, \mu_X \mid \theta) = \mathrm{pdf}(\mu_X) \cdot \mathrm{pdf}(F_1 \mid \mu_X, \theta) \cdots \mathrm{pdf}(F_m \mid \mu_X, \theta)$
• Our goal is to find the $\mu_X$ that maximizes this posterior; the corresponding marginal likelihood is
  $\ln \mathrm{pdf}(F \mid \theta) = \ln \int_X \mathrm{pdf}(F, \mu_X \mid \theta)\, d\mu_X$
• However, $\theta$ is an unknown parameter, since it is local to the particular die
• The initial value of $\theta$ is the covariance of the subspace variables under intra-die variation only
• An expectation-maximization (EM) algorithm is used to find the covariance $\theta$:
  1. Initialize $\theta$ and set $\theta^{old} = \theta$
  2. E-step: compute $p(X \mid F, \theta^{old}) = p(X, F \mid \theta^{old}) / p(F \mid \theta^{old})$
  3. Form $\mathcal{L}(\theta, \theta^{old}) = \int p(X \mid F, \theta^{old}) \ln p(X, F \mid \theta)\, dX$
  4. M-step: $\theta^{new} = \arg\max_{\theta} \mathcal{L}(\theta, \theta^{old})$
  5. Stop when $\|\theta^{old} - \theta^{new}\| < \varepsilon$; otherwise set $\theta^{old} = \theta^{new}$ and repeat
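The EM loop above can be sketched as follows for the simplified case where the group measurements are observed directly in the subspace X (in the actual method the MVS model links X to the measured quantities); all numerical values are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 4, 30                                        # subspace dimension, measurements on one die
mu0 = np.zeros(d)                                   # prior mean of mu_X (die-to-die nominal)
Sigma0 = 0.25 * np.eye(d)                           # prior covariance (die-to-die variation)
F = rng.normal(loc=0.3, scale=0.1, size=(n, d))     # placeholder measurements, already in X space

theta = 0.04 * np.eye(d)                            # initial within-die covariance guess
for _ in range(100):
    # E-step: Gaussian posterior of the latent mean mu_X given F and the current theta
    theta_inv = np.linalg.inv(theta)
    S = np.linalg.inv(np.linalg.inv(Sigma0) + n * theta_inv)           # posterior covariance
    m = S @ (np.linalg.inv(Sigma0) @ mu0 + theta_inv @ F.sum(axis=0))  # posterior mean
    # M-step: theta that maximizes the expected complete-data log-likelihood
    R = F - m
    theta_new = (R.T @ R) / n + S
    if np.linalg.norm(theta_new - theta) < 1e-9:    # |theta_old - theta_new| < eps
        break
    theta = theta_new

print("posterior mean mu_X:", np.round(m, 3))
print("estimated within-die covariance (diagonal):", np.round(np.diag(theta), 4))
```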
(3) Prediction of Performance Using the MVS Model

• Once $\theta$ and $\mu_X$ are obtained, we can estimate the mean and standard deviation of the target performance, $\mu_g$ and $\sigma_g$, using the MVS model and SPICE simulations
• However, there may be a mismatch between the nominal performance values of post-layout simulations and the measurements: a typical shift of 15% or less
  - Shifts in the corresponding performance distributions are due to modeling and extraction inaccuracy
• Therefore a very small sample size (~5) can be used to calibrate the difference between measurements and the MVS model prediction
• This step is referred to as process shift calibration
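One possible realization of the process-shift calibration step, assuming a simple multiplicative shift estimated from roughly five measured/predicted pairs; the calibration form and all numbers are assumptions, since the slide only states that a small calibration sample is used.

```python
import numpy as np

# Hypothetical calibration data: ~5 dies with both a measured RO frequency and the
# corresponding MVS/SPICE prediction at the extracted parameters (GHz, placeholders).
f_measured  = np.array([1.02, 0.98, 1.05, 1.00, 0.97])
f_predicted = np.array([0.90, 0.88, 0.93, 0.89, 0.86])

shift = np.mean(f_measured / f_predicted)       # multiplicative process-shift factor

mu_g_model, sigma_g_model = 0.89, 0.04          # model-predicted mean/std of the performance
mu_g_cal = shift * mu_g_model                   # calibrated mean
sigma_g_cal = shift * sigma_g_model             # calibrated spread (same scale factor assumed)
print(f"shift = {shift:.3f}, calibrated mean = {mu_g_cal:.3f} GHz")
```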
Experimental Results

• Measurements from on-chip test structures
• Designed in a 28-nm bulk CMOS process
• Measured from 3186 dies on 27 wafers
• Test sites and structures:
  - Device-array
  - RO-array
Validation (I): Compared with Naïve Approach

• Naïve approach: model the output (RO frequency) as a function of mean values from measurements of a single type of device DUT (e.g., individual NMOS or PMOS devices)
• Our approach: capture correlations across multiple DUT types (use a mixture group of measurements)

Figure: relative prediction error for DUT 6 (NOR RO) versus replicate samples per die (sample size), using different DUT group means to fit models (5 groups #1-5, 3 groups #1-3, and single groups #1, #2, #3). Relative to prediction from a single transistor DUT group, predicting the NOR RO frequency from the mean of a mixture group of transistor DUT measurements gives a 3.25x reduction in samples, and using a mixture of both transistor and RO DUT measurements (the proposed method) gives roughly a 10x reduction.
Validation (II): Compared with PCA+RSM Method

• Two baseline methods are compared with our proposed method:
  - Standard least-squares regression (LSR) after PCA (blue circles)
  - Advanced least-angle regression (LAR) exploiting sparsity after PCA (black triangles)
  - Proposed method: physical subspace projection with MAP (red squares)
• 27-fold cross-validation on 27 wafers (3186 dies)

Figure: relative prediction error for the DUT 6 group (NOR RO) versus the number of training dies.

Comparison for Priors with Different Prediction Error
Outline

• Introduction
• Foundation
  - MIT Virtual Source (MVS) Transistor Model: A Physical Solution
• Performance Estimation with Mixed & Small Size Samples
  - Physical Subspace Projection
  - Maximum a Posteriori (MAP) Estimation
• Compact Model Parameter Extraction
  - Bayesian Inference: Learning Precision and Prior at Different Biases
  - Optimal Sampling of Transistor Measurements
• Statistical Library Characterization
  - Novel Delay/Slew Model
  - Exploring Sparsity in Library Input Space
• Summary/Acknowledgement
Previous Optimization Flow for I-V Parameter Extraction in the MVS Model

• Parameters are divided into two groups:
  - P_sub for the sub-threshold region: V_t0, n_0, δ
  - P_above for the above-threshold region: μ, R_s0, v_x0
• Flow: automatically adjust V_th0 to achieve self-consistency in Q_ix0; run least-squares optimization of the sub-threshold parameter set (SS, δ, ...) against sub-threshold I-V measurements, then least-squares optimization of the above-threshold parameter set (v_x0, μ, ...) against all I-V measurements; iterate until convergence
• Optimization uses the least-squares error functions:
  $\varepsilon(P_{sub}) = \tfrac{1}{2} \sum_{i=1}^{n} \{\ln F_i - \ln f(V_i, P_{above}, P_{sub})\}^2$
  $\varepsilon(P_{above}) = \tfrac{1}{2} \sum_{i=1}^{n} \{F_i - f(V_i, P_{above}, P_{sub})\}^2$
• A substantial number of measurements per device (~20) is needed in the traditional extraction of the MVS model

L. Yu et al., "An ultra-compact virtual source FET model for deeply-scaled devices: Parameter extraction and validation for standard cell libraries and digital circuits," ASPDAC 2013.
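The alternating two-group optimization can be sketched with SciPy as below. `toy_current` is a stand-in for the MVS I-V evaluation f(V, P_above, P_sub), and the bias points, bounds, starting values, and noise level are all placeholders.

```python
import numpy as np
from scipy.optimize import least_squares

def toy_current(V, P_sub, P_above):
    """Placeholder for the MVS drain-current evaluation f(V, P_above, P_sub); not the real model."""
    Vg, Vd = V[:, 0], V[:, 1]
    Vt0, n0, delta = P_sub
    mu, Rs0, vx0 = P_above
    Vt = Vt0 - delta * Vd
    return (vx0 * 1e-7 * np.log1p(np.exp((Vg - Vt) / (n0 * 0.0259)))
            * np.tanh(mu * Vd / 100.0) / (1.0 + Rs0 * 1e-3 * Vg))

rng = np.random.default_rng(0)
V = np.column_stack([np.tile(np.linspace(0.1, 0.9, 10), 2),
                     np.repeat([0.05, 0.9], 10)])                 # (Vg, Vd) bias points
I_meas = toy_current(V, [0.40, 1.3, 0.10], [250.0, 80.0, 1.2]) * (1 + 0.02 * rng.normal(size=20))

P_sub, P_above = np.array([0.35, 1.5, 0.08]), np.array([200.0, 100.0, 1.0])
for _ in range(5):                                                # alternate the two LSE steps
    # Sub-threshold group: residuals in log(I), cf. eps(P_sub) above
    P_sub = least_squares(lambda p: np.log(I_meas) - np.log(toy_current(V, p, P_above)),
                          P_sub, bounds=([0.2, 0.8, 0.0], [0.6, 2.0, 0.3])).x
    # Above-threshold group: residuals in linear current, cf. eps(P_above) above
    P_above = least_squares(lambda p: I_meas - toy_current(V, P_sub, p),
                            P_above, bounds=([50.0, 10.0, 0.5], [500.0, 200.0, 3.0])).x

print("P_sub   =", np.round(P_sub, 3))
print("P_above =", np.round(P_above, 2))
```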
New Approach: Bayesian Inference and Past/Prior Information to Enable the MVS Model with Limited New Data

• Problems with the traditional least-squares estimation (LSE) objective function:
  - Equal weights: (1) systematic offset, (2) noisy region
  - Numerical solution: (1) initial value, (2) search boundary
• Novel objective function utilizing Bayesian inference:
  - Good consistency of MVS model parameters across various nodes
  - Different weights according to data "uncertainty"
  - A proper prior distribution learned from historical data

Figure: Id versus Vg curves illustrating modeling error and measurement error.

L. Yu et al., "Remembrance of Transistors Past: Compact Model Parameter Extraction Using Incomplete New Measurements and a Bayesian Framework," DAC 2014.
Learning Precision $\beta_{F_i}^{-1}$ at Different Biases

• Modeling errors consist of MVS model error plus measurement error
  - MVS model error: inability of the MVS model to capture certain physical effects, e.g., gate tunneling effects
  - Measurement error: noise or other inaccuracies in the current measurements; important in the low-Vg region
• Learning the precisions $\beta_{\ln F_i}^{-1}$ and $\beta_{F_i}^{-1}$ is a key step in this work

Figures: σ(log10 Id) versus Vg for Idlin and Idsat, and the extracted average uncertainty $\beta_{F_i}^{-1}$ as a surface over (Vg, Vd).
Learning a Prior Distribution

• Assuming each current measurement is independent, the likelihood function is
  $\mathrm{pdf}(F \mid \mu_{P_{sub}}, \beta_{\ln F_i}) = \prod_{i=1}^{n} \mathrm{pdf}(F_i \mid \mu_{P_{sub}}, \beta_{\ln F_i})$
• According to Bayes' rule,
  $\mathrm{pdf}(\mu_{P_{sub}} \mid F) \propto \mathrm{pdf}(\mu_{P_{sub}}) \cdot \prod_{i=1}^{n} \mathrm{pdf}(F_i \mid \mu_{P_{sub}}, \beta_{\ln F_i})$
• The prior $(\mu_{s0}, \Sigma_{s0})$ provides information about the expected transistor I-V curves before any new measurements are taken

Figures: mean and standard deviation of the transistor I-V curve learned from historical transistor data, without and with particular technology information.
Sequential Bayesian Learning from Prior and I-V Measurements (sub-threshold parameters: DIBL & SS)

Figure: Id versus Vg curves showing the sub-threshold fit being updated sequentially as measurements are added at (V_d, V_g) = (0.05 V, 0.05 V), (0.9 V, 0.05 V), (0.05 V, 0.3 V), and (0.9 V, 0.3 V).
Maximum a Posteriori Estimation

• Goal: $\arg\max_{P_{sub}} \mathrm{pdf}(\mu_{P_{sub}} \mid F)$ and $\arg\max_{P_{above}} \ln \mathrm{pdf}(\mu_{P_{above}} \mid F)$
• Equivalent to minimization with a new error function:
  $\varepsilon(P_{sub}) = \tfrac{1}{2} (\mu_{P_{sub}} - \mu_{s0})^T \Sigma_{s0}^{-1} (\mu_{P_{sub}} - \mu_{s0}) + \tfrac{1}{2} \sum_{i=1}^{n} \beta_{\ln F_i} \{\ln F_i - \ln f(V_i, P_{above}, P_{sub})\}^2$
Example I: Early Technology Evaluation (14 nm - 28 nm)

Figure: I-V characteristics over Vg and Vd for four technologies (Technology 1 through Technology 4).
Parameter Consistency with Number of Measurements

• Average extraction error for 50 transistors (variation & noise)
• Baseline: parameters extracted from measurements at 20 mV Vgs intervals (~100 measurements total)

Figure: percent error versus the baseline as a function of the number of measurements (0 to 40) for v_x0, μ, δ, V_t0, and n_0, comparing LSE and Bayesian extraction.
Statistical Extraction using Measurement Results (28nm)

Extracted correlation matrix of the MVS model parameters:

          V_t      δ     n_0    v_x0      μ
  V_t   1.000  -0.104   0.363   0.339  0.328
  δ    -0.104   1.000  -0.001  -0.311  0.075
  n_0   0.363  -0.001   1.000   0.616  0.529
  v_x0  0.339  -0.311   0.616   1.000  0.819
  μ     0.328   0.075   0.529   0.819  1.000

• The approach allows and captures correlation between MVS model parameters
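With the extracted correlation matrix and per-parameter spreads, correlated MVS parameter samples for Monte Carlo can be drawn as sketched below; the nominal values, standard deviations, and the parameter ordering used here are assumptions for illustration.

```python
import numpy as np

# Correlation matrix from the table above (assumed order: Vt, delta, n0, vx0, mu)
R = np.array([[ 1.000, -0.104,  0.363,  0.339,  0.328],
              [-0.104,  1.000, -0.001, -0.311,  0.075],
              [ 0.363, -0.001,  1.000,  0.616,  0.529],
              [ 0.339, -0.311,  0.616,  1.000,  0.819],
              [ 0.328,  0.075,  0.529,  0.819,  1.000]])

nominal = np.array([0.40, 100.0, 1.30, 1.2e7, 250.0])   # placeholder nominal values
sigma   = np.array([0.02,   8.0, 0.05, 5.0e5,  10.0])   # placeholder standard deviations

cov = np.outer(sigma, sigma) * R                        # covariance from correlation + sigmas
samples = np.random.default_rng(4).multivariate_normal(nominal, cov, size=1000)
print("sample correlation (Vt, mu):",
      np.round(np.corrcoef(samples[:, 0], samples[:, 4])[0, 1], 3))
```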
Outline

• Introduction
• Foundation
  - MIT Virtual Source (MVS) Transistor Model: A Physical Solution
  - Statistical MVS Model through Backward Propagation of Variance
• Performance Estimation with Very Small Sample Size
  - Physical Subspace Projection
  - Maximum a Posteriori (MAP) Estimation
• Compact Model Parameter Extraction
  - Bayesian Inference: Learning Precision and Prior at Different Biases
  - Optimal Sampling of Transistor Measurements
• Statistical Library Characterization
  - Novel Delay/Slew Model
  - Exploring Sparsity in Library Input Space
• Summary/Acknowledgement
Problem Definition: Statistical Library Timing Characterization

• Cell timing depends on supply voltage V_dd, driving strength I_eff, input slew S_in, and output load C_load
• Traditional approach: a lookup table with both μ and σ generated from Monte Carlo simulation
  - Interpolation is needed for a given combination of inputs
  - Recent work explores sparsity in the MC sampling space (process space)
• We propose a novel delay/slew model with four parameters, combined with Bayesian inference exploiting sparsity in the input vector space:
  $t_{d,l\to h} = \dfrac{k_d (V_{dd} + V')}{I_{eff,pmos}} \left( C_{par} + C_{load} + \alpha \cdot slew \right)$
  $t_{d,h\to l} = \dfrac{k_d (V_{dd} + V')}{I_{eff,nmos}} \left( C_{par} + C_{load} + \alpha \cdot slew \right)$
  $I_{eff,nmos} = \dfrac{I(V_{gs}=V_{dd},\ V_{ds}=V_{dd}/2) + I(V_{gs}=V_{dd}/2,\ V_{ds}=V_{dd})}{2} = f(V_{th}, \delta, SS, v_{x0}, \mu, \ldots)$
Model Validation

  Tech  Cell   k_d    C_par (fF)  V' (V)  α       Error
  A     INV    0.389  0.951       -0.266  0.0922  1.56%
  A     NAND2  0.372  1.328       -0.209  0.0342  1.98%
  A     NOR2   0.356  1.186       -0.241  0.102   0.91%
  B     INV    0.416  1.046       -0.287  0.1029  1.50%
  B     NAND2  0.403  1.471       -0.228  0.0339  2.05%
  B     NOR2   0.374  1.276       -0.253  0.1041  1.12%
  C     INV    0.389  0.978       -0.272  0.1069  1.84%
  C     NAND2  0.383  1.120       -0.258  0.050   1.94%
  C     NOR2   0.368  1.225       -0.264  0.1170  1.47%

Figure panels: delay, slew.
Bayesian Inference Characterization Flow
Validation

• Validation inputs are spread across the nominal characterization space: V_dd from 0.7 V to 1.0 V, input slew up to ~1.5e-11 s, and output capacitance up to ~6 fF
• Prediction error of μ(T_d) and σ(T_d) versus the number of training samples (1 to 100): the proposed model with Bayesian inference reaches lookup-table accuracy with roughly 17x to 20x fewer training samples
• A delay histogram compares the lookup table, the proposed model with Bayesian inference, and the ideal distribution
Acknowledgments

• Funding and support provided in part through the MIT/Masdar Institute Cooperative Program, and in part through collaboration with PDF Solutions, Inc.