piecewise linear models for switch-level simulation

advertisement
PIECEWISE LINEAR MODELS FOR
SWITCH-LEVEL SIMULATION
Russell Kao
Technical Report: CSL-TR-92-532
June 1992
Computer Systems Laboratory
Departments of Electrical Engineering and Computer Science
Stanford University
Stanford, California 943054055
Rsim is an efficient logic plus timing simulator that employs the switched resistor
transistor model and RC tree analysis to simulate efficiently MOS digital circuits at the
transistor level. We investigate the incorporation of piecewise linear transistor models
and generalized moments matching into this simulation framework. General piecewise
linear models allow more accurate MOS models to be used to simulate circuits that are
hard for Rsim. Additionally, they enable the simulator to handle circuits containing
bipolar transistors such as ECL and BiCMOS. Nonetheless, the switched resistor
model has proved to be efficient and accurate for a large class of MOS digital circuits.
Therefore, it is retained as just one particular model available for use in this framework.
The use of piecewise linear models requires the generalization of RC tree analysis.
Unlike switched resistors, more general models may incorporate gain and floating
capacitance. Additionally, we extend the analysis to handle non-tree topologies and
feedback. Despite the increased generality, for many common MOS and ECL circuits
the complexity remains linear. Thus, this timing analysis can be used to simulate,
efficiently, those portions of the circuit that are well described by traditional switch
level models, while simultaneously simulating, more accurately, those portions that are
not.
We present preliminary results from a prototype simulator, Mom. We demonstrate its
use on a number of MOS, ECL, and BiCMOS circuits.
v Words and AWE, moments matching, Pade approximation, switchlevel simulation, circuit simulation, piecewise linear
i
Copyright 8 1992
bY
Russell Kao
Acknowledgements
This thesis would not have been possible were it not for the companionship, encouragement,
support, and guidance from numerous friends and associates. First of all I would like to
thank Neil Wilhelm and Helen Davis for believing in me, inspiring me, and encouraging
me and for serving as role models. I don’t think I would have attempted it without their
support. I would also like to thank Forest Baskett and Sam Fuller for providing me the
opportunity to attend Stanford.
My advisor, Mark Horowitz, deserves special mention for his patience and guidance
throughout my stay at Stanford. As a instructor he has contributed more to my education
here than any other person. As an advisor his insight into circuits and his unerring ability
to ask the right (often terribly embarrassing) question have greatly increased the quality of
this work.
My understanding of circuits, simulation, and the programming environment has benefited from discussions with many people around CIS; including my second reader: Bruce
Wooley, as well as Bob Alverson, Don Stark, John Vlissides, Steve Tjiang, Doug Pan,
Teresa Meng, Tom Char&, and Drew Wingard. I am especially appreciative of the help
from and discussions with Arturo Salz, another aficionado of switch-level simulation.
Charlie Orgish and Laura Schrager are gratefully acknowledged for the dependable
computing environment at CIS. We are fortunate to have such good natured, talented,
and dedicated people performing the difficult task of managing our diverse and dynamic
computing resources.
I would also like to thank Richard Swan, the Western Research Laboratory, and Digital
Equipment Corporation for generously provided the computing equipment and financial
support for the duration of my stay at Stanford.
Finally, I am especially grateful to my new wife, Anna. Her unending love and support
have added new dimensions to my life.
...
111
L
To Mom and Dad
and Babes
iv
Contents
...
111
Acknowledgements
1 Introduction
1.1 Verification of Large Digital Designs . . . . . . . . . . .
1.2 Organization . . . . . . . . . . . . . . . . . . . . . . .
2
.
. . . .
.
7
Previous Work in Transient Estimation
Circuit and Timing Simulation . . . . . . . . .
. . . .
.......
7
Circuit Simulation . . . . . . . . . . .
. . . .
.......
8
2.1.2 Acceleration of Circuit Simulation . .
. . . .
.......
9
2.2
Moment Based Timing Analysis and Simulation
. . . .
.......
13
2.3
Improving Rsim . . . . . . . . . . . .
. . . .
.......
19
2.4
Comparison and Summary . , . . . . . . . . .
. . . .
.......
20
2.1
2.1.1
22
3 Piecewise Linear Models
3.1
Piecewise Linear Representation . . . . . . .
..............
22
3.2
Model Restrictions . . . . . . . . . . . . . .
..............
24
3.3
MOS Level-O Model . . . . . . . . . . . . .
..............
3.4
MOS Level-l Model . . . . . . . . . . . . .
3.4.1 Rationale . . . . . . . . . . . . . .
25
..............
27
..............
27
3.4.2 Model . . . . . . . . . . . . . . . .
..............
29
. . . . . . . .
..............
31
3.4.3 Choosing Parameters
3.5
Bipolar Model . . . . . . . . . . . . . . . .
..............
37
3.6
Second Order Phenomena . . . . . . . . . .
..............
38
V
3.7 Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4 Waveform Approximation
43
45
4.1 Waveform Approximation . . . . . .
4.2 Padt Approximation . . . . . . . . .
4.3 Practical Considerations . . . . . . .
4 . 3 . 1 OrderTooLow . . . . . . .
4 . 3 . 2 OrderTooHigh . . . . . .
4.3.3 Unstable Poles . . . . . . .
4.3.4 Efficiency . . . . . . . . .
4.3.5 Error Control . . . . . . .
. . . . . . . . . . . . . . . . . . .
46
. . . . . . . . . . . . . . . . . . .
46
. . . . . . . . . . . . . . . . . . .
49
. . . . . . . . . . . . . . . . . . .
49
. . . . . . . . . . . . . . . . . . . .
50
. . . . . . . . . . . . . . . . . . . .
52
. . . . . . . . . . . . . . . . . . .
54
. . . . . . . . . . . . . . . . . . . .
54
4.3.6 Frequency Scaling . . . . . .
4.4 Demonstration . . . . . . . . . . . .
4.5 Limitations . . . . . . . . . . . . . .
4.5.1 Unstable Responses . . . . .
. . . . . . . . . . . . . . . . . . .
56
. . . . . . . . . . . . . . . . . . .
58
. . . . . . . . . . . . . . . . . . .
62
. . . . . . . . . . . . . . . . . . .
62
4.5.2 Circuits with High Gain . . . . . . . . . . . . . . . . . . . . . .
4.6 Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
63
5 Moment Computation
66
68
5.1
Background . . . . . . . . . . . . . . . . . . . . . . . . . . .
.
.
69
5.2
Moment Computation for Leaky Resistor Trees . . . . . . . . .
. . .
70
5.3
Leaky Trees of Three Terminal Networks . . . . . . . . . . . .
. . .
73
5.4
Series-Parallel Combination . . . . . . . . . . . . . . . . . . .
. . . .
75
5.5
Coupled Clusters . . . . . . . . . . . . . . . . . . . . . . . . .
. . . .
76
5.6
Relaxing Network Restrictions . . . . . . . . . . . . . . . . . .
. . . .
80
Node and Branch Tearing . . . . . . . . . . . . . . . .
. . . .
82
5.6.2 Non-Leaky Tree Topologies . . . . . . . . . . . . . . .
. . .
84
5.7
Partial Evaluation of Floating Capacitive Coupling . . . . . . .
. . . .
86
5.8
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . .
90
5.6.1
6 Detecting Region Changes
91
6.1 MomvsRsim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
vi
91
.
6.2 Overview of Root Finding . . . . . . . . . . . . . . . . . . . . . . . . .
6.3 Order Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
93
6.4 Finding Roots . . . . . . . . . . . . . . . . . . . . . . . . . . . .
6.4.1 One Pole . . . . . . . . . . . . . . . . . . . . . . . . . . .
94
.
.
94
.
.
95
. . . .
95
. . . .
96
6.5 Root Polishing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
6.6 Measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
100
6.7 Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
103
6.4.2 Two Poles . . . . . . . . . . . . . . . . . . . . . . . . .
6.4.3 Three Poles . . . . . . . . . . . . . . . . . . . . . . . .
102
105
7 Evaluation
7.1 Extending Switch-Level Simulation . . . . . . . . . . . . . . . . . . . .
7.1.1 CMOS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
106
7.1.2 ECL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7 . 1 . 3 BiCMOS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
110
. . . . . . . . . . .
120
7.2 Performance Compared to Switch-Level Simulation.
7.3 Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . .
106
115
126
8 Conclusion
127
A MOS Level-l Threshold
132
B MOS Level-l Polytopes
134
C Linearization of Bipolar Transistor Capacitances
136
D Optimal Frequency Scaling
139
D.l Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
D. 1.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . .
139
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
142
D.2 Efficiency
140
E Unstable Waveforms
143
BibliograDhv
144
vii
List of Tables
4.1
Execution Time Required to Generate Waveform Approximations. . . . .
54
4.2
Benchmark Statistics.
. . . . . . . . . . . . . . . . . . . . . . . . . . .
58
5.1
Decrease in Efficiency with Increasing Capacitor Levels. . . . . . . . . .
88
6.1
Voltage Error at Estimated Root of Boundary Waveform. . . . . . . . . .
101
6.2
‘lime Error of Boundary Waveform Root Estimate.
. . . . . . . . . . . .
101
6.3
Number of Iterations for Root Polishing.
. . . . . . . . . . . . . . . . .
102
6.4
Average Number of Cycles for Root Finding. . . . . . . . . . . . . . . .
102
6.5
Pole Configurations for Root Finding. . . . . . . . . . . . . . . . . . . .
103
7.1
Execution Time of Example Circuits (seconds). . . . . . . . . . . . . . .
108
7.2
Simulator Performance on Ring Oscillators. . . . . . . . . . . . . . . . .
12 1
7.3
Mom Execution Profiles for Various Benchmark Circuits. . . . . . . . . .
123
7.4
Mom’s Simulation Statistics for Ring Oscillators. . . . . . . . . . . . . .
124
...
Vlll
List of Figures
1.1
1.2
1.3
1.4
Simulation Hierarchy . . . . . . . . . . . . . . . . . . . . . . , . . . .
Circuit Level Representation . . . . . . . . . . . . . . . . . . . . . . . .
Gate Level Representation . . . . . . . . . . . . . . . . . . . . . . . . .
Resistive Switch Representation . . . . . . . . . . . . . . . . . . . . . .
1
2
3
4
12
2.2
Node Decoupling . . . . . . . . . . . . . . . . . . . . . . . . . . .
Voltage Ranges . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.3
RCTree
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Switched Resistor Model . . . . . . . . . . . . . . . . . . . . . . .
14
16
2.6
Decomposition of Circuit into Clusters. . . . . . . . . . . . . . . . .
Rsim’s Event Driven Simulation Algorithm. . . . . . . . . . . . . . .
2.7
Multiple Segments per Logical Transition. . . . . . . . . . . . . . . .
20
3.1
23
3.2
Switched Resistor Model. . . . . . . . . . . . . . . . . . . . . . . .
Hyperplane Subdivides Space into Regions of Operation. . . . . . . .
3.3
Response of Level-O Inverter Compared to SPICE.
. . . . . . . . . .
26
3.4
MOS I-V Characteristics With and Without Velocity Saturation . . . .
Linearization of Transconductance for Large C/g3 . . . . . . . . . . .
28
29
3.7
Piecewise Linear MOS Model: Saturated Region . . . . . . . . . . .
Piecewise Linear MOS Model: Linear Region. . . . . . . . . . . . .
3.8
Piecewise Linear MOS Model: Off Region. . . . . . . . . . . . . .
30
3.9
Piecewise Linear vs SPICE I-V Characteristics: Q, = 3,4, and 5 volts,
SPICE Level-2 models for MOSIS 211 Process. . . . . . . . . . . . . . .
31
2.1
2.4
2.5
3.5
3.6
ix
12
15
16
23
29
30
3.10 Piecewise Linear vs SPICE I-V Characteristics: SPICE Level-3 models for
MOSIS 1.2p Process.
3.11 CMOS Inverter
. . . . . . . . . . . . . . . . . . . . . . . . . . .
32
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
32
3.12 Inverters Using Piecewise Linear vs SPICE Transistors.
. . . . . . . . .
33
3.13 Mismatch for Small V;, . . . . . . . . . . . . . . . . . . . . . . . . . . .
33
3.14 Slow Input to CMOS Gate Accentuates Modeling Errors. . . . . . . . . .
34
3.15 NMOS Stacks Using SPICE, Piecewise Linear, and Pseudo-Linear models.
35
3.16 SRAM Sense Amplifier . . . . . . . . . . . . . . . . . . . . . . . . . .
35
3.17 Simulated Response of SRAM Sense Amplifier. . . . . . . . . . . . . . .
36
3.18 Piecewise Linear Bipolar Model.
. . . . . . . . . . . . . . . . . . . . .
37
3.19 ECL Inverter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
38
3.20 ECLANDGate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
39
3.2 1 CMOS and ECL Test.Circuits . . . . . . . . . . . . . . . . . . . . . . .
39
3.22 Grounded Capacitor Approximations. . . . . . . . . . . . . . . . . . . .
40
3.23 Level- 1 Bipolar Model has Floating Capacitors. . . . . . . . . . . . . . .
41
3.24 Level-2 Bipolar Model has Floating Capacitors and Parasitic Resistors . .
42
4.1
Replacing Capacitors and Inductors to Compute (k+l)st Moments. . . . .
48
4.2
Circuit with Low Order Moments Equal to Zero.
. . . . . . . . . . . . .
50
4.3
Schmitt Trigger
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
53
4.4
2 Input CMOS NAND Gate. . . . . . . . . . . . . . . . . . . . . . . . .
55
4.5
Large Signal Swing for CMOS NAND Gate.
. . . . . . . . . . . . . . .
56
4.6
CMOS Level-O Ring: SPICE vs PWL vs Mom. . . . . . . . . . . . . . .
59
4.7
CMOS Level-l Ring:‘SPICE vs PWL vs Mom. . . . . . . . . . . . . . .
60
4.8
ECL Level-O Ring: SPICE vs PWL vs Mom. . . . . . . . . . . . . . . .
61
4.9
ECL Level-l & -2 Rings: SPICE vs PWL vs Mom. . . . . . . . . . . . .
62
4.10 Simulation of Schmitt Trigger . . . . . . . . . . . . . . . . . . . . . . .
4.11 Schmitt Trigger using 3 Pole Response for emir . . . . . . . . . . . . . .
64
4.12 Cascade of ECL Inverters . . . . . . . . . . . . . . . . . . . . . . . . .
65
4.13 Initial Segment in the Response of ECL Cascade. . . . . . . . . . . . . .
65
Leaky resistor tree. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
70
5.1
X
64
5.2
Computing moments for leaky tree. . . . . . . . . . . . . . . . . . .
.
71
5.3
Norton calculations. . . . . . . . . . . . . . . . . . . . . . . . . . .
.
72
5.4
Transistor-capacitor tree. . . . . . . . . . . . . . . . . . . . . . . . .
.
74
5.5
Three terminal network model.
. . . . . . . . . . . . . . . . . . . .
.
74
5.6
Circuit interpretation of admittance parameters. . . . . . . . . . . . .
.
74
5.7
Floating capacitor connecting same cluster. . . . . . . . . . . . . . .
.
76
5.8
Capacitive Coupling Between Clusters. . . . . . . . . . . . . . . . .
.
77
5.9
Gate to Channel Coupling. . . . . . . . . . . . . . . . . . . . . . . .
.
77
5.10 Multiple Couplings for Bipolar Transistor. . . . . . . . . . . . . . .
78
5.11 Topological Sort of ECL Gate. . . . . . . . . . . . . . . . . . . . .
79
5.12 Resistor Model for Bipolar Transistor Results in Loops in ECL Gates
80
5.13 Schmitt Trigger
. . . . . . . . . . . . . . . . . . . . . . . . . . .
81
5.14 Diode Decoder.
. . . . . . . . . . . . . . . . . . . . . . . . . . .
81
5.15 Circuit Partitioning via Branch Tearing. . . . . . . . . . . . . . . .
82
5.16 Circuit Partitioning via Node Tearing. . . . . . . . . . . . . . . . .
83
5.17 CircuitThatisNearlyaLeakyTree. . . . . . . . . . . . . . . . . .
84
5.18 Branch Tearing of Nearly Leaky Tree. . . . . . . . . . . . . . . . .
84
5.19 NodeTearingofNearlyLeakyTree.
. . . . . . . . . . . . . . . . .
85
5.20 Tearing of feedback. . . . . . . . . . . . . . . . . . . . . . . . . .
86
5.2 1 Clusters Coupled by Floating Capacitors.
. . . . . . . . . . . . . .
87
. . . . . . . . . . . . . . .
89
6.1 Stationary Points . . . . . . . . . . . . . . . . . . . . . . . . . . .
96
6.2 Rootfinding using Piecewise Quadratic Segments . . . . . . . . . .
99
5.22 Limited Levels of Floating Capacitors.
7.1 Sense Amplifier for Static RAM.
. . . . . . . . . . . . . . . . . . .
107
7.2 Dynamic RAM Cell and Sense Amplifier . . . . . . . . . . . . . . .
109
7.3 DRAM Bit Lines: Read followed by Precharge . . . . . . . . . . . .
109
7.4 ECL Current Steering Networks. . . . . . . . . . . . . . . . . . . . .
110
7.5 Resistor Tee as ECL Load. . . . . . . . . . . . . . . . . . . . . . . .
111
7.6 Diode Decoder.
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
111
7.7 Diode Decoder Response. . . . . . . . . . . . . . . . . . . . . . . .
112
xi
7.8
Emitter Degeneration Resistors to Cause Current Sharing. . . . . . . . . .
112
7.9
Schottky Clamped ECL RAM. . . . . . . . . . . . . . . . . . . . . . . .
113
7.10 ECL RAM Cell Internal Nodes. . . . . . . . . . . . . . . . . . . . . . .
114
7.11 ECLRAMRead . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
115
7.12 BiCMOS Buffer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
116
7.13 BiNMOS Buffer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
117
7.14 BiCMOS RAM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
118
7.15 BiCMOS RAM Response. . . . . . . . . . . . . . . . . . . . . . . . . .
119
8.1
Simulation Space
. . . . . . . . . . . . . . . . . . . . . . . . . . . . .
128
8.2
Numerical Integration Approximation . . . . . . . . . . . . . . . . . . .
129
8.3
Moments Matching Approximation . . . . . . . . . . . . . . . . . . . .
130
A.1
Piecewise Linear MOS Model: Saturated Region . . . . . . . . . . . . .
132
A.2 Staircase Response from Source Follower . . . . . . . . . . . . . . . . .
133
B.l
Polytopes of MOS Level-l model . . . . . . . . . . . . . . . . . . . . .
134
c.1
Nonlinear Capacitances in SPICE bjt model . . . . . . . . . . . . . . . .
136
. . . . . . . . . . . . . . . . . . . . .
138
. . . . . . . . . . . . . . . . . . . . . . . . . . .
140
C.2 Linearized Base Charge Model
D.l
Difference Functions
xii
Chapter 1
Introduction
1.1
Verification of Large Digital Designs
Over the past few decades the number of transistors it is possible to incorporate into a
single digital integrated circuit (IC) has risen at a breathtaking pace. Unfortunately, as the
complexity of integrated circuits increases, so does the likelihood of design errors and the
difficulty of detecting and identifying those errors. Consequently, designers have become
dependent upon simulation programs to predict the behavior of ICs before they are actually
fabricated. These programs make it possible to verify that an IC conforms to logical and
timing specifications before committing the vast resources necessary to build it.
In order to deal with the staggering complexity of designs consisting of hundreds
of thousands of transistors, a hierarchy of simulation tools and techniques has evolved
(Figure 1.1).
In general, lower levels of simulation utilize more detailed descriptions of
Increasing
accuracy-
Functional level
Gate level
l-l
e
Switch level
Circuit level
Device level
Figure 1.1: Simulation Hierarchy
1
increasing
speed
C’HAP7ER 1. INlXODUcTION
2
the design and provide greater accuracy and flexibility. However, this increased accuracy
is usually achieved at the expense of decreased efficiency, limiting the size of designs it
is possible to simulate. In contrast, higher level simulators utilize simplified higher level
models to represent the behavior of collections of lower level objects. Because higher level
models abstract away many lower level details, higher level simulators can simulate larger
designs more efficiently. However, the assumptions made in formulating the higher level
models often compromise their accuracy and flexibility. Thus, tradeoffs exist for every
level of simulation. The result is that the design process usually includes simulation at
multiple different levels.
Because the focus of this thesis is switch-level simulation we will narrow our discussion
to just the three middle levels of Figure 1.1 . Immediately below the switch-level is circuit
level simulation. Circuit simulators[Nag75, WIM+73] represent the IC as a network of
lumped, possibly nonlinear, transistors, resistors, inductors, capacitors, and current and
voltage sources (Figure 1.2). Kirchoff’s voltage and current laws are used to formulate a
Figure 1.2: Circuit Level Representation
set of coupled nonlinear differential equations describing the behavior of the network, and
numerical integration is used to solve these equations. The advantages of circuit simulators
are their flexibility and accuracy. They can deal with arbitrary circuit topologies, they
employ general non-linear transistor models, and they can generate detailed descriptions
of the time behavior of any voltage or current in the network. Their disadvantage is that
they are slow. Circuit simulation programs running on contemporary workstations can
only simulate approximately 1 logic transition of a logic gate in a second. This speed is
inadequate considering that large digital designs can consist of ten to a hundred thousand
CHAPTER 1. ZN7-RODCJC77ON
3
logic gates.
In contrast, gate level simulators represent the IC as a network of gates (Figure 1.3). A
D=E
D
Figure 1.3: Gate Level Representation
gate is a higher level object that represents a collection of transistors. Each gate interacts
with the rest of the circuit via a set of unidirectional input and output terminals. A gate
observes the state of wires attached to its inputs, and sets the state of wires attached to its
output. Wire states are represented by Boolean values, and a gate’s behavior is modeled by
a Boolean function that determines the output value as a function of the input values. The
primary advantage of gate level simulation is efficiency. Logic simulation programs running
on contemporary workstations can simulate up to a million logic transitions of a logic gate
in a second. However, there are disadvantages. First of all, the circuit must be partitioned
into a number of pre-characterized gates that exhibit unidirectional behavior. While this is
readily done for gate arrays and standard cell designs (after all, these designs are composed
from gates selected from libraries) custom designs often contain structures (for example,
the MOS pass transistor structure) whose behavior is bidirectional and consequently is not
readily modeled by the gate abstraction. Secondly, the characterization of the logical and
timing behavior of gates is usually performed manually and can be time consuming and
error prone.
Switch-level simulation[Bry80, Ter83, RT85b, DvGdG85, Sch85] is a relatively recent
innovation which attempts to strike a balance between gate and circuit level simulation.
The circuit is described as a network of transistors that are simply modeled by voltage
controlled switches. Depending upon the particular approach, each switch has associated
with it either a srrengrh[Bry80] or a resistunce[Ter83, RT85b] representing the current
driving capabilities of the transistor (Figure 1.4). Because the circuit isn’t partitioned into
unidirectional gates, switch level simulators eliminate the pre-characterization step’ and can
simulate a wider variety of circuits than gate level simulators (including those exhibiting
‘Instead of pre-characterizing every different logic gate the user only needs to pre-characterize the two
different kinds of transistors: NMOS and PMOS.
4
CZZAZ’7’ER 1. ZKZ’ROZXJC’TZON
\T
Figure 1.4: Resistive Switch Representation
bidirectional behavior). Simultaneously, the simplified circuit models allow switch-level
simulators to be more than three orders of magnitude faster than circuit simulators.
Of course, limitations can arise from the use of overly simplistic transistor models.
The switch model was initially developed for the simulation of MOS digital circuits, and
works well for the static MOS logic which makes up most of a typical digital MOS
IC. Occasionally, however, there are small portions of an IC whose behavior is not well
modeled by a network of switches. Typically, these portions must be simulated at the
circuit level, thus complicating the verification of the design. Furthermore, resistive or
multi-strength switches are poor models for the behavior of bipolar transistors in ECL
circuits. Although switch-level models have been extended to allow the simulation of
bipolar transistors[HS87, SYH88, KAHS88], real ECL and BiCMOS designs occasionally
include circuit techniques (for example, diode decoders, leaker resistors, current source
sharing) that foil approaches based upon classical ECL current steering trees.
To address these shortcomings, this thesis attempts to extend the capabilities of switchlevel simulation. Noting that the switched resistor model is just a particularpiecewise linear
model, we investigate the incorporation of more general piecewise linear transistor models
into the switch-level framework.2 Several considerations motivate the use of piecewise
*Pillage[Pil89] suggested the incorporation of piecewise linear models and moment analysis into a circuit
simulator as a promising future application of his work. Our work differs in emphasis. We neglect the
general nonlinear models and circuit topologies necessary to achieve the accuracy of circuit simulation and
study instead simple models and restricted topologies in an attempt to match the efficiency of switch-level
simulation.
CHAZ’TER 1. ZN’TRODUCTZON
5
linear models. First, we would like to incorporate more sophisticated MOS models in order
to simulate the behavior of MOS circuits that cannot be simulated by the use of switched
resistors (for example RAM sense amplifiers). Second, we would like to be able to simulate
bipolar transistors which are strongly nonlinear but which seem to be adequately described
by fairly simple piecewise linear models[KAHS88]. Meanwhile, we would like to give
up as little efficiency as possible. Switch-level simulation has proven itself useful for
simulating the large majority of MOS digital circuits and it would be best if we could pay
for additional generality for only those portions of the circuit where it was needed. To
this end the simulator provides the user with a selection of transistor models of which the
switched resistor model is one choice. It turns out that the RC tree analysis techniques can
be generalized to efficiently handle trees of our more general piecewise linear devices with
only a modest degradation of efficiency. The resulting simulator, Mom, is a mixed mode
simulator that extends the capabilities of switch-level simulation in the direction of circuit
simulation.
1.2
Organization
The next chapter describes previous work in estimating the transient response of digital
circuits. In particular the approach taken by circuit simulators (e.g. SPICE) is compared
with that taken by switch-level simulators (e.g. Rsim). Although much work has gone into
trying to speed up circuit simulation, we approach the problem from a different perspective.
That is, rather than starting with a simulator that is accurate and trying to improve its
efficiency, we start with a simulator that is efficient and try to improve its accuracy.
The efficiency and flexibility of Mom are strongly dependent upon the choice of piecewise linear models. Chapter 3 discusses restrictions that are placed upon the models to
preserve efficiency. It also explores the utility of simple piecewise linear models and
demonstrates that even simple models can greatly extend the capabilities of switch-level
simulation.
Networks containing piecewise linear models may not have responses that are well approximated by a single exponential. Therefore a more general moments matching procedure
is used in place of Rsim’s single time constant delay estimation. Chapter 4 summarizes our
CZ%WZER I. ZNTRODUCTZON
6
experience with moments matching waveform approximation. Although the procedure has
been extensively explored by others, there are a number of practical considerations unique
to our application.
Chapter 5 describes extensions to RC tree analysis that allow it to handle piecewise
linear models. It is demonstrated that as long as the topology of the transistors is a tree, and
as long as there is no feedback, the complexity of the analysis remains O(n). It turns out
that most MOS and ECL circuits meet these restrictions. However, the analysis can also
be extended to handle non-tree topologies and feedback. Although the extensions are not
as efficient they only need to be used for those portions of the circuit that don’t meet the
restrictions.
The introduction of piecewise linear models greatly complicates the task of detecting
when devices switch. Ultimately, the problem boils down to finding the smallest, positive
root of a multiple-pole exponential waveform. Chapter 6 discusses the techniques used to
solve this problem efficiently.
Finally, Chapter 7 demonstrates the utility of Mom on a number of small CMOS,
ECL, and BiCMOS circuits. Additionally, its efficiency is compared with that of existing
switch-level simulators.
Chapter 2
Previous Work in Transient Estimation
Many different approaches have been proposed for estimating the transient response (and
hence delay) of digital circuits. Here, we will review and contrast two prevalent approaches.
One approach is exemplified by circuit and timing simulators. This approach is characterized by the use of nonlinear device models and incremental time numerical integration. A
second approach has been taken by some MOS timing analyzers and switch-level simulators. This approach employs linear device models and moment analysis.
2.1
Circuit and Timing Simulation
Circuit and timing simulation have evolved continuously over the past few decades. The
“second generation”’simulators[ WJM+73, Nag751 reached maturity during the mid 1970’s.
These simulators have since achieved widespread acceptance and are now regarded as the
classic “circuit simulators”. However, as ICs increased in size, the circuit simulators were
found to require excessive amounts of computer memory and time. Consequently, a “third
generation” of simulators emerged which attempted to accelerate the transient simulation
of large digital ICs.
* Hachtel and Sangiovanni-Vincentelli[HSVS 11 have found it to be convenient to distinguish between three
generations of simulators.
7
CHAPTER 2. PZ?EVZOUS WORK IN TRANSZENT ESTIMATION
8
2.1.1 Circuit Simulation
Circuit simulators represent the IC as a network of lumped, possibly nonlinear transistors,
resistors, inductors, capacitors, and current and voltage sources. The equations relating the
terminal voltages and currents of the network elements are combined with Kirchoff’s voltage
and current laws to produce a set of coupled nonlinear differential equations describing the
electrical behavior of the network. These equations can be written:
f(x(t). x’(f), t) = 0
xd?;
(2.1)
f() : !Rn x R” x 32 + !Rn
where x(t) is a (time varying) vector of network variables (voltages and currents), x’(t)
is its time derivative, and t is time. The transient response of the network is simply the
solution of these equations for t > 0 subject to the initial conditions: x(t = 0) = ~0.
Incremental time numerical integration is used to solve the equations. The procedure
involves advancing time in steps:
tk+l =
tk +
to = 0
hk,
(2.2)
(hk is the size of the Pth step) and computing the response at each step. That is at each time
step a linear multistep integration formula of the general form2
P
P
‘Cfk+l) = C aix(tk-t) +
i=O
hk C bjr’(tk-j)
(2.3
3=-l
is used to eliminate x’(t) from Equation (2.1). This yields a system of coupled nonlinear
algebraic equations,
&$k+l)) = 0
(2.4)
which is solved using Newton-Ruphson iteration. That is, starting from an initial guess:
X0( tk+l) = x(tk), successively improved estimates are computed:
f o r i=0,1,2,3...begin
(2.5)
end
(2.6)
*The coefficients, a, and b,, are chosen to ensure that the formula is satisfied exactly if .c( t) is a low order
polynomial.
CZ-ZAZ?ZER 2. PREVIOUS WORK IN TRANSIENT ESTIMATION
9
(g’() is the Jacobian of g( )) until some convergence criterion is met:
l~x’+l(tk+l) - xi(tk+l)li < 6
(2.7)
(c is some error tolerance). The time step, hk is carefully chosen to control the local
truncation error of the integration formula (2.3). Time step selection involves a compromise
because smaller step sizes decrease both the error and simulation efficiency.
Flexibility and accuracy are the primary advantages of circuit simulation. The equation
formulation places few restrictions on the network’s topology, and the ability to handle
nonlinear network elements allows accurate transistor models. Furthermore, the numerical
integration procedure allows the computation of the detailed time step by time step behavior
of any electrical variable in the circuit. The accuracy of the integration algorithms is limited
only by numerical considerations which are almost always insignificant compared to the
precision with which components can be fabricated on an IC.
The disadvantage of circuit simulation is the inefficiency that results from processing the
entire circuit simultaneously. At each time step circuit simulators compute multiple NewtonRaphson iterations, each of which requires the formulation and inversion of the Jacobian.
However, the inversion of the Jacobian of an entire IC can be prohibitively expensive. Even
using sparse techniques, the inversion of circuit matrices has been empirically observed
to grow superlinearly (for example, O(n’.‘)[Kun86]) with the circuit size. Furthermore,
because a single time step is chosen for the entire circuit, the step size is necessarily limited
by the accuracy requirements of the fastest moving subcircuit. Thus tiny time steps must
be taken for the entire IC if a single gate is switching rapidly, even if nothing else in the
rest of the IC is switching at all!
2.1.2
Acceleration of Circuit Simulation
To address these deficiencies a third generation of simulators emerged which attempted
to accelerate the transient simulation of large digital ICs. The almost universal theme in
these simulators was the use of decomposition techniques to partition the IC into smaller
pieces that could be analyzed independently[DMHH87]. Partitioning achieved several
things. First, it allowed the formulation and analysis of much more moderately sized
systems of equations. Second, it open up the possibility of multirate simulation, that is the
-R 2. PREVIOUS WORK IN TRANSIENT ESTIMATION
10
selection of different time steps for different portions of the IC. Third, it became possible to
bypass completely the analysis of latent subcircuits, that is subcircuits that weren’t actively
switching. We will describe three common techniques that were used to decompose the
circuit equations (2.4): circuit tearing, relaxation, and forward Euler integration.
Circuit Tearing
Macro[RSVH79] and Slate[YI-ITSO] employed circuit tearing techniques to reduce the
Jacobian to bordered block diagonal form. Once in this form the system of equations could
be solved in two steps. First, each of the blocks was solved independently. Second, the
overall solution was assembled from the individual solutions. However, only the non-latent
blocks needed to be processed at any particular step. If the state of a block changed little
between the last two time steps or Newton Raphson iterations the block was declared latent
and the solution from the previous time step or Newton Raphson iteration was simply
reused. Thus, the needless reevaluation of subcircuits that were not changing was bypassed
much in the same way that SPICE bypassed devices[Nag75].
Relaxation
MOTIS[CGK75] was the first of the so called “timing simulators” that utilized restricted circuit models, nonlinear relaxation, and time advancement integration. When certain restrictions were placed on the circuit (including no inductors, no floating capacitors, a grounded
capacitor at each node3, unidirectional coupling from an MOS transistor’s gate to its source
and drain, and appropriate ordering of the nodes) the Jacobian became nearfy lower block
tiangular and, for sufficiently small step sizes, diagonally dominant. These characteristics
make it efficient to invert the Jacobian using a form of Gauss-Jacobi relaxation. Nonlinear
Gauss-Jacobi relaxation essentially decomposes the system of equations (2.4) into a set of
scalar equations by treating all non-diagonal entries of the Jacobian as if they were zero.
That is, starting with an initial guess, x’(tk+i ) = z(tk), each succeeding relaxation iterate,
xi+\ (tk+l ), is assembled by solving the jth equation, gj, for the jth component of x, xj,
‘A “floating” capacitor is a capacitor with neither terminal connected to ground or a power supply. A
“grounded” capacitor is a capacitor with at least one terminal connected to ground or a power supply.
CHAPTER 2. PREVIOUS WORK IN TRANSIENT ESTIMATION
11
assuming that all other components, (~1 : 1 # j} are fixed at their values from the previous
iteration.
for i = 0, 1,2,3, . . . begin
forj = 1,2,3,...nbegin
solve for ~i+‘(tk+i):
gj(zi(tk+l), zi(tk+*) ,...Zj-l(tk+l),rj+*(tk+*),2~+l(tk+l),...~~(tk+l)) = o
end
end
In principle, the innermost loop uses Newton-Raphson iteration to solve each of the
scalar nonlinear equations, and the outermost loop computes successive relaxation iterations
until the zi converge (Equation (2.7)). However, the time advancement algorithms utilized
only one relaxation step per time step and only one Newton-Raphson step per relaxation
step because that was shown to be sufficient to guarantee convergence. MOTIS pioneered
the use of time advancement integration algorithms and, in doing so, avoided both sparse
Gaussian elimination and Newton-Raphson iteration.
MOTIS was followed by a number of simulators that explored variations of the relaxation procedure. Event-driven techniques from logic simulation were used by SPLICE1
[New791 to 1) dynamically order the equations thereby achieving faster convergence and 2)
bypass the evaluation of latent nodes. Problems with reliability motivated the investigation
of alternatives to Gauss-Jacobi time advancement, including Guurs-Seidel[New79], and
Modified Symmetric Gauss-Seidei[DMNSV83] algorithms. Additionally, it was realized
that relaxation could be applied at different levels, including at the linear equation level
(MOTIS2[CS84]), the nonlinear equation level (SPLICE 1.6[Sal83]) and the waveform
level (Relax[LSV82]).
Forward Euler Integration
A different approach to decoupling Equation (2.4) was explored by Elogic[KKSN84] and
SPECS[dG84]. When certain restrictions were placed on the network (including no inductors, no floating capacitors, and a grounded capacitance at each node) Nodal Analysis yields
CHAPTER 2. PREVIOUS WORK IN TRANSIENT ESTIMATION
12
the formulation:
Cv’(t) = i(v(t))
vdRn;
(2.8)
CdR” x zR*;
i() : ZRi” + R”
where v(t) is a (time varying) vector of node voltages (measured with respect to ground),
v’(t) is its time &rivative, C is a diagonal matrix of node to ground capacitances, and
i() gives the currents injected into each node by the non-capacitive elements. Since C is
diagonal, it is trivially inverted, and Forward Euler integration is used to predict the value
of LJ at some time step, h,, in the future:
v(t,+,) = $c-‘;(r(t,))
n
(2.9)
Note that this formulation decouples the nodes. To predict the future voltage of a node, N,
(see Figure 2.1) it is necessary to compute the currents through only those devices directly
attached to LY
(~2
and r3) No matrix formulation or inversion is required. Furthermore,
Figure 2.1: Node Decoupling
time steps can be selected independently.
The approach taken by Elogic, SPECS, and WATSWITCH[RVB88] was to partition
voltage into a small number of disjoint ranges (Figure 2.2). Then the time step for a node
-
range 4
range 3
range 2
range 1
range 0
v
4V
3v
2V
Iv
ov
Figure 2.2: Voltage Ranges
was set equal to the amount of time it took for the node’s voltage to move from its present
CHAPTER 2. PREVIOUS WORK IN TRANSIENT ESTIMATION
13
value to just outside its present range. An event was scheduled for a node for the time
when its voltage crossed into an adjacent range. When this eventfired, the current through
all devices attached to the node were updated and new waveforms were computed for all
nodes connected to those devices. Thus, simulation proceeded on an event-driven basis
and the evaluation of latent nodes was bypassed.
Later versions of SPECS associated the voltage ranges with the devices rather than the
nodes and formalized the simulation in terms of piecewise linear voltages and piecewise
constant current device models[RV87]. Furthermore, extensions were made to include
floating capacitors and inductors[VFR90]. ADAPTS[SNGRF)l] further generalized the
approach by dynamically selecting each device’s voltage ranges (and, hence, the step size)
based upon an analytical model for the device and the accuracy requirements of the overail
simulation.
In general the third generation simulators achieved speed-ups of up to two orders of
magnitude over the classic circuit simulators. However, their restricted circuit models
and problems with reliability have impeded their widespread acceptance. Furthermore
their speedups, although impressive, are fundamentally limited by the use of numerical
integration. Time advancement numerical integration requires that time be advanced in
steps whose size is limited by the need to maintain accuracy and, in some cases, stability.
2.2
Moment Based Timing Analysis and Simulation
In spite of the large amount of work invested in trying to speed up circuit simulation a large
gap remained between the timing simulators and the gate level simulators. Consequently
in the early 1980’s a new form of simulator was devised to fill this gap, the switch-level
simulators. One of these, Rsim[Ter83, Hor83], took a fundamentally different approach to
transient estimation from the circuit and timing simulators. Instead of modeling the behavior
of devices using nonlinear models and computing the response of the networks using
numerical integration, Rsim modeled the behavior of devices using simple linear models
and computed the response of networks using moment analysis. Moment analysis has the
advantage over time advancement numerical integration that the response is computed once
for all time rather than at numerous points in time.
CHAPTER 2. PREVIOUS WORK IN TRANSZENT ESTIMATION
14
Moment analysis originated in the late 1940’s when Elmore[Elm48] utilized the first and
second moments of the impulse response of a linear amplifier to estimate its step response.
In general, the Icth moment of the (presumed causal) impulse response, h(t), is defined:
i?ik = J c0 t%(t)&
0
(2.10)
Elmore found that the quantities (6
A - Ei;i:) and fiil were good estimates of the step
A r722
response’s rise time and the delay to its 50% point, respectively. The delay estimate became
known as the Elmore delay.
Interest in the application of moment analysis to the modeling of delays in MOS digital
integrated circuits was sparked by Penfield and Rubinstein[PR81] who modeled the delay
of polysilicon interconnect by the step response of RC trees. An RC tree was defined to
be a tree of resistors with one grounded node and grounded capacitors at the other nodes
(Figure 2.3). RC trees were particularly interesting because interconnect usually took the
Figure 2.3: RC Tree
form of trees and because trees were easy to analyze. Penfield and Rubinstein described a
computationally efficient algorithm for computing the first moment of RC trees and derived
waveform bounds for the step response based upon the single time constant approximation.
Although this work was initially intended to model the delays of fineur interconnect, it
was soon used by a number of MOS timing analyzers[Put82, Jou83] to model the delays
of networks of nonlinear MOS transistors. Horowitz[Hor83] more carefully justified this
approach by deriving nonlinear one and two time constant waveform estimates and bounds.
He then retrofitted his nonlinear timing models into an existing switch-level simulator4,
4Although, this work has not been widely reported in the literature, it was performed as pm of his PhD
CHAPTER 2. PREVIOUS WORK IN TRANSIENT ESTIMATION
15
Rsim[Ter83]. Because this thesis is essentially an extension of Rsim we next describe in
greater detail the algorithms used by Rsim.
As mentioned in the introduction, Rsim models transistors with the simple resistive
switch consisting of the series combination of a resistor and a voltage controlled switch
(Figure 2.4).
The resistor models the current driving capabilities of the MOS transistor
“d
Figure 2.4: Switched Resistor Model
and is sized according to the width and length of the transistor. The switch is either on or
offand is controlled by the gate voltage measured with respect to ground. If (for an NMOS
transistor) the gate voltage is greater than half the power supply voltage then the switch is
closed (current flows). Otherwise the transistor is an open circuit. It is assumed that no
DC current flows into the gate and, aside from the gate’s control of the switch, there is no
coupling between the gate and the channel. As with the early third generation simulators,
floating capacitors are disallowed and modeled by “equivalent” capacitances to ground.
Replacing on transistors by resistors and ofltransistors by open circuits usually results
in a partition of the original circuit into a large number of small, mutually independent
subcircuits known as stages or clusters (Figure 2.5). Because clusters are decoupled, their
responses can be computed independently. Thus in order to compute the transient response
of the overall circuit it is only necessary to analyze those clusters that are actively switching
at each point in time. Rsim uses an event-driven simulation algorithm to schedule the
evaluation of active clusters thereby avoiding the analysis of latent clusters (Figure 2.6).
thesis work on MOS timing models. This improved version of Rsim became part of the standard Berkeley
CAD distribution.
CHAPTER 2. PREVIOUS WORK IN TRANSIENT ESTIMATION
16
(IJ0
cluster 1
I
cluster 2
D
C
I:
y
- I
-
Figure 2.5: Decomposition of Circuit into Clusters.
To compute the response of a cluster Rsim uses moment analysis instead of numerical
integration. Moment analysis is a two step process that involves, first, the computation of
moments from the network followed by the generation of waveform estimates from those
moments. Moments were computed using a procedure equivalent to finding successive DC
solutions for the network. Particular advantage was taken of the tree structure possessed by
most circuits. Sparse matrix formulation and sparse Gaussian elimination were bypassed in
1. Through the use of an event queue, select the next device to switch. Advance the
simulation time to the time of this event.
2. Construct the cluster. That is, collect all nodes affected by this switching event.
3. Compute the response of the cluster resulting from the switching event.
4. Reschedule all MOS transistors affected by this cluster. That is, examine every
transistor with a gate terminal attached to the cluster and schedule an event for it if
the new response causes ir to switch some time in the future.
Figure 2.6: Rsim’s Event Driven Simulation Algorithm.
favor of simpler tree analysis techniques whose complexity was guaranteed to be 0( tz) in
the size of the cluster. In the rare case that non-tree topologies were encountered, heuristics’
were used to simply delete resistors closing loops in order obtain an approximate solution.
% was noted that the most commonly occurring case of loops was created by CMOS transmission gates.
These were simply handled by parallel resistor combination.
CHAPTER 2. PREVIOUS WORK IN TRANSIENT ESTIMATION
17
In his thesis Horowitz showed that waveform estimates for linear RC networks, could
be obtained from the moments by assuming that the system function was of the general
form6
(2.11)
W) = (1 :,s,
for single time constant estimates, or:
(1 + ST,)
w = (1 + sq)(l +
(2.12)
s72)
for two time constant estimates. Then, the parameters, ~1, ~2, and r,, were obtained by
matching (among other things) the low order moments of the system function with those
computed from the circuit. In the time domain, these one and two time constant estimates
took the forms:
e-t/rl
v(t) =
v(t) =
(
and
7*-q ) e -1/T1+(T2--Tz ,-mz)
72-1
(2.13)
(2.14)
respectively.
However, MOS transistors are nonlinear. Horowitz showed that the single time constant
estimate of the response of a nonlinear NMOS network was:
v(t) falling
v(t) rising
(2.15)
where ri was the first moment of the network obtained by replacing each transistor by a
resistor of resistance:
R
2
eff = k(&d - vg
(2.16)
(where k = PC,, W/L is the device transconductance parameter and L, W, ~1, Co,, and&
are parameters of the quadratic MOS model[MK77, HJ83]).
However, switch-level simulators do not require the computation of the waveform but
rather only the delay to the 50% point. For linear networks, the delay can be computed
6For simplicity Horowitz normaliztxi the logic swing (usually 5 voh..s) to 1 volt.
CHAPTER 2. PREVIOUS WORK IN TRANSIENT ESTIMATION
18
7
by multiplying the first moment by - ln( l/2). For NMOS networks the delay can be
computed by multiplying the first moment by tanh-t (l/2) for falling transitions and 1 for
rising transitions. Thus although the step response of MOS-capacitor trees is not identical
to the step response of resistor-capacitor trees, the computation of single time constant delay
estimates of MOS trees could be made identical in form to that of resistor trees by choosing
the resistor values appropriately. It was precisely this observation that justified the use of
the switched resistor model. Unfortunately, Horowitz found no corresponding relationship
between linear and nonlinear two time constant delay estimates.
Numerous extensions to the approach of Penfield, Rubinstein, and Horowitz have
been suggested. A number of researchers have investigated the extension of linear moment analysis to circuits more general than RC trees, including RC trees with multiple
sources[Chu88, l?T85a], RC meshes[Wya85, LM84], and floating capacitors and controlled sources[SZ87]. Also explored was the use of higher order estimates to model
the non-monotonic waveforms arising from linear[RT85a] and nonlinear[Chu88] charge
sharing.
However, many of the extensions to linear moment analysis were superceded by the
recent discovery[Hua90, Cha91 J that single time constant delay estimation was just a
special case of the more general muments matching procedure developed to solve the
model order reduction problem of linear control theory. In 1956 Paynter[Pay56] applied
the Pade approximation to the approximation of system functions. That approach constructs
approximations of arbitrary order by matching low order moments. Pillage, Rohrer, and
Huang[PR90, Hua90] combined general Pade approximation with standard circuit equation
formulation and analysis techniques from circuit simulation to generate arbitrarily high
order estimates of the responses of general lumped linear networks. They demonstrated the
application of those techniques to the estimation of the responses of linear interconnect and
the estimation of the poles and zeros of linearized models of operational amplifiers.
7For the linear single time constant approximation the result of matching the first moment of Equation (2.11)
is q = 6,.
CHAP’IER 2. PREVIOUS WORK IN TRANSIENT ESTIMATION
19
2.3 Improving Rsim
Switch-level simulation has its limitations. A common scenario is that the simulator can
simulate 99.9% of a large circuit, although for small portions the simulator fails to produce
even a correct logical result. Sometimes those failures don’t interfere with the verification of
the circuit. For example, the output of a voltage reference generator can be manually fixed
in a switch-level simulation after being verified using SPICE. Unfortunately the simulator’s
failure sometimes hinders the verification of the design. For example, Rsim’s inability to
deal with sense amplifiers makes it difficult to check the logical correctness of RAMS.*
However, most of such a circuit can be simulated at the switch-level. If it were possible to
increase the generality of the simulator just for certain small portions of the circuit it would
be possible to validate the entire design.
Piecewise linear models promise to give the user the ability to select different accuracies
for different parts of the circuit.’ For the RAM described above only small portions need to
be simulated with more accurate models while the majority of the circuit can be simulated
using switch-level models. In principle, if a simulator were designed such that the additional
complexity was paid for only when it was used, it would be possible to simulate those circuits
with only a moderate impact on the overall efficiency. Since the switched resistor model is a
piecewise linear model, it appears promising to simply extend Rsim’s simulation framework
to allow more general piecewise linear models.
Although Rsim’s basic simulation framework can be retained, extensive changes are
required. One change involves the representation of node state. Rsim takes advantage of
the fact that most digital MOS gates have logic swings from one power supply rail to the
other and switching thresholds at the midpoint. Therefore Rsim represents the state of nodes
using the Boolean values 0 and I and describes state transitions using just the delay and
slope at the 50% point. However, because Mom must simulate a wider variety of circuits,
it can make fewer assumptions about their properties. For example, ECL circuits have
multiple nonoverlapping voltage swings which make it impossible to establish a one to one
‘Although each of the individual pieces of a RAM is usually verified using a circuit simulator, it is still
useful to verify the logical functionality of the entire RAM using a switch-level simulator in order to contirm
that the decoders have been hooked up properly, that the data hasn’t been inadvertently inverted, etc.
‘The present version of the simulator depends upon the user to manually choose transistor models.
Although it may be possible to automate the selection of models, this wasn’t explored.
CHAZ’lER 2. PREVIOUS WORK IN TRANSIENT ESTIMATION
20
correspondence between a logical value and a voltage. To avoid building in assumptions
about particular logic families, Mom utilizes real voltages to represent the state of nodes
and waveforms to describe the shape of transitions.
Although the switched-resistor model is a piecewise linear model, more general piecewise linear models have attributes which necessitate extending the switch-level framework.
First, while the switched-resistor model only has two regions of linearity, a more general
piecewise linear model can have any number of regions. Devices with more regions are
more difficult to schedule. They also generate additional events which cause logic transitions to be made up of multiple segments instead of just a single segment (Figure 2.7).
Second, while the linear circuit describing the behavior of the switched-resistor model
Figure 2.7: Multiple Segments per Logical Transition.
(when it is on) is just a resistor, more general piecewise linear models can have more complex circuit models which include voltage, current, and dependent sources. More complex
circuit models complicate the estimation of a circuit’s response. Not only are the waveform estimates more complex, but the procedures for computing moments require more
sophistication.
Unfortunately these changes degrade the efficiency of the simulator. Simpler models
yield more efficient simulations, and there are strong incentives to use models that are
as simple as possible. The next chapter considers the constraints that must be placed
upon piecewise linear models in order to preserve efficiency. Additionally it explores the
capabilities of simple piecewise linear models.
2.4 Comparison and Summary
Circuit simulation has proven to be the most general and reliable technique for estimating the
transient response of digital circuits. Circuit simulation places few restrictions on the circuit
CHAPTER 2. PREVIOUS WORK IN TRANSIENT ESTIMATION
21
topology, utilizes general nonlinear transistor models, and employs time advancement
numerical integration to solve the circuit equations. The advantages of circuit simulation
are its accuracy and generality. The disadvantage of circuit simulation is its inefficiency. The
general models and topologies require algorithms whose execution time grows superlinearly
with the circuit size, making their use impractical for large ICs.
The classic circuit simulators were followed by a third generation of simulators which
focused on the acceleration of the transient simulation of large ICs. These approaches
were based upon the use of simplified circuit models and the decomposition of the circuit
into pieces that could be analyzed independently. Decomposition accelerates simulation
by reducing the size of the systems to be solved, by allowing the independent selection
of time steps (multirate) and by bypassing sections of the circuit that are not actively
switching (latency). Speedups of up to two orders of magnitude were achieved over classic
circuit simulation. However, these speedups are limited because the numerical integration
algorithms advance time in steps limited in size by the need to maintain accuracy.
MOS switch-level simulators such as Rsim also use decomposition techniques. However, instead of accurate nonlinear device models and numerical integration, simple linear
device models and moment analysis are used to predict the response of circuits. Moment
I
analysis has the fundamental advantage that it eliminates the need to take time steps; the
response is computed once for all time. The primary advantage of switch-level simulators
is their efficiency. Speedups over circuit simulation of more than three orders of magnitude
have been observed. Furthermore, because they restrict the topology of networks to trees,
switch-level algorithms have complexities which grow linearly (O(n)) with the size of the
circuit, making them suitable for the simulation of large ICs. The disadvantage of switchlevel simulators is their inflexibility. Simple switched resistor models are unsuitable for
some MOS digital circuits, and for most ECL and BiCMOS circuits.
The conjecture explored by this thesis is that the limitations of switch level simulation
can be overcome by allowing more general piecewise linear models. The incorporation of
piecewise linear models requires many changes to Rsim’s simulation framework and these
are explored in the following chapters.
Chapter 3
Piecewise Linear Models
Mom is an extension of Rsim that allows more general piecewise linear transistor models.
In principle, a simulator that utilizes piecewise linear models should be able to achieve
simulations of arbitrary accuracy because piecewise linear models can be made to conform
to nonlinear device characteristics with arbitrary precision by simply adding regions of linearity. However, as pointed out in the preceding chapter, efficiency concerns provide strong
incentives to use models that are as simple as possible. Therefore, after a brief discussion
of the representation of piecewise linear models, this chapter describes restrictions placed
on the models in order to preserve the efficiency of simulation. Such restrictions do not
appear to be a problem. A number of simple MOS and bipolar models are proposed and
simulations are used to demonstrate their capabilities. Even models that are just slightly
more complex than the switched resistor model can significantly increase the capabilities
of the simulator.
3.1 Piecewise Linear Representation
We represent a piecewise linear device by a collection of linear circuits, each of which
represents the linearized behavior of the device for a particular region of operation. Each
region of operation is represented by a polytope[vB87] in the multi-dimensional space
defined by the device’s terminal voltages. For example, the piecewise linear description
of the switched resistor model is depicted in Figures 3.1 and 3.2. The electrical behavior
22
CHAPTER 3. PIECEWISE LINEAR MODELS
(a)on:
Vg
23
>
Vt
(b)off:
Vg
<
Vt
Figure 3.1: Switched Resistor Model.
of the device in each of its two regions is modeled by the circuits in Figures 3.1 (a) and
(b). The regions themselves are described by polytopes in the three dimensional space
defined by the source, drain, and gate voltages (Figure 3.2). The region to the right of the
Figure 3.2: Hyperplane Subdivides Space into Regions of Operation.
cross-hatched plane labeled “V, = K” is the polytope corresponding to the on region. The
region to the left of the plane is the polytope corresponding to the offregion.
More general models may have circuits consisting of interconnections of linear circuit
elements. Additionally, they may have more than two regions of linearity. Examples of
more general circuits will be given in the following sections. The remainder of this section
discusses how regions of linearity are specified.
In general a device may have n terminals. Consider the n dimensional space defined by
the voltages at those terminals: { ~1, y, . . . v~}. Then the set of points whose coordinates
satisfy a given linear equation in those voltages:
a0 + Ulzll +
u2u2 + . . . + a,v, = 0
(3.1)
CHAPTER 3. PIECEWISE LINEAR MODELS
24
defines a hyperplune in the n dimensional space. The hyperplane is simply the multidimensional generalization of the familiar three dimensional plane. Like planes, hyperplanes partition space. Points that lie on one side of the hyperplane have coordinates that
satisfy the inequality:
uo+alul +ap2 t .a* +a,v, co
(3.2)
while points on the opposite side satisfy:
a0 + UlUl t a2l.9 t . . * t unun > 0.
(3.3)
The pofytope is the multi-dimensional generalization of the polyhedron. While a polyhedron is a region in three dimensional space bounded by planes, a polytope is a region
in n dimensional space bounded by hyperplanes. Equations (3.2) and (3.3) suggest that a
polytope can be specified by a conjunction of linear inequalities:
uotalul+u2u2t...ta,e', > 0
b0-t
b1qt b2v2t.
c4)tclz'l
-t&v,
tc2u2t...tc,u,
.
..
> 0
>
0
.
..
.
..
(3.4)
Each inequality bounds the region by a hyperplane.
3.2 Model Restrictions
Much of the speed of MOS switch-level simulation results from the use of transistor models
that have been simplified to allow their efficient analysis. Although we intend to generalize
those models, we retain certain constraints on the models in order to facilitate analysis.
Because moments matching can only be used to estimate the responses of linear circuits,
the first constraint is that nonlinear capacitors must be approximated by linear (i.e. fixed
value) capacitors. Although it may be possible to approximate nonlinear capacitors with
piecewise linear capacitors, this was not explored.
Second, we restrict the DC coupling from the gate (base) to the source and dram
(emitter and collector) to be unidirectional. This facilitates decomposition because clusters
CHAPTER 3. PIECEWISE LINEAR MODELS
25
can be analyzed independently once they have been properly ordered. The unidirectional
assumption is reasonable for MOS transistors because the DC current into the gate is
negligible. It is also acceptable for nonsaturating bipolar circuits such as ECL. Because
a bipolar transistor’s current gain, p - Ic/Ie, is typically on the order of 100, the DC
base current is typically two orders of magnitude smaller than the tree current and hence
contributes minimally to the switching delay of the preceding gate.’ There are digital bipolar
circuits such as IIL and TIT, which saturate the transistor and hence draw significant base
currents. This restriction is not likely to be acceptable for those circuits.2
Lastly, we focus our attention on piecewise linear models with small numbers of regions.
One of the advantages of the switched resistor model is that as long as a cluster’s inputs
don’t change, the cluster’s response can be computed once for all time. However, when
transistor models acquire greater numbers of regions, transistors may pass through multiple
regions during the course of a single logic transition. The response of a cluster must be
recomputed whenever any of its transistors changes its region of operation. In the limit, as
piecewise linear models become more detailed, the intervals between recomputation shrink
until they become comparable to the time steps taken by simulators employing numerical
integration. This would nullify the principle advantage of moment based techniques, that
is the ability to take large time steps. Fortunately, fairly simple piecewise linear models
often suffice provided that the operating point about which the device is linearized is chosen
judiciously.
3.3 MOS Level-O Model
Our simplest MOS Level-0 model is the switched resistor model described above. The
advantages of this model are that it can be analyzed extremely efficiently (the simulation
can be very fast) and that it provides good first order estimates of the switching delay
of most digital MOS circuits. Figure 3.3 compares the responses of inverters using the
‘Base current does affect noise margins in ECL circuits. However, DC noise margins are more efficiently
checked through the use of programs that perform a static analysis of the circuit. A dynamic logical simulation
is generally unnecessary and much more expensive.
21n principle where base current is sufficiently important it can be modeled using a piecewise constant
current source. However this was not explored.
CHAPTER 3. PIECEWZSE LINEAR MODELS
-0.2
6.4
0.6
26
-
0.8
1.0
nS time
Figure 3.3: Response of Level-O Inverter Compared to SPICE.
switched resistor model and SPICE’s nonlinear MOS model. For this simulation nonlinear
and/or floating capacitors have been approximated by linear grounded capacitors. Also the
model’s resistance has been selected assuming that a gate has approximately equal input and
output slopes. As expected (see Chapter 2), although the waveform shapes aren’t identical,
the switched resistor model does provides a good estimate of the delay to the 50% point.
The model has some limitations. Perhaps the most problematic is that it adequately
models the behavior of only certain kinds of circuits. Horowitz showed that single time
constant estimates can be produced for networks of nonlinear resistors as long as all
the resistors possesses identical pseudo-linear I-V characteristics[Hor83]. However, this
restriction excludes circuits with MOS transistors with different gate voltages, circuits with
linear resistors in addition to MOS transistors, and even circuits with both NMOS and
PMOS transistors in which transistors of both types are simultaneously on.3 While this
assumption is rarely a problem with MOS digital gates, there are circuits, such as the
sense amplifiers of dynamic and static MOS RAMS, for which the switched resistor model
produces poor predictions of the DC operating points and transient behavior.
Another problem is that while the switched resistor model can be used to predict the
3Apparently this excludes many CMOS gates. However, in practice the delay of a CMOS gate is usually
dominated by a tree of transistors of a single type driving the output node to Vcc or ground. A tree of the
opposite transistor type may also be attached to the output node. However because it usually is small, it
contributes little error to the switching delay even if it is ‘incorrectly” modeled.
CHAPTER 3. PIECEWISE LINEAR MODELS
27
step response of MOS networks, because it theoretically switches between its two states
instantaneously, it does not directly account for the affect of finite input slopes upon the
delay. Instead, an additional “nonlinear gate” model must be used to account for the affects
of finite input slope[Hor83]. However, for our piecewise linear simulator this input slope
dependency is most conveniently handled by formally incorporating characteristics of the
“nonlinear gate” model directly into a more sophisticated transistor model.
3.4 MOS Level-l Model
3.4.1
Rationale
The MOS Level-l model was originally motivated by the observation that velocity suturution, which has become prevalent in modem MOS transistors, tends to linearize the
behavior of the device. Velocity saturation occurs because lattice scattering limits the
maximum velocity of carriers drifting through a semiconductor. It appears in modem short
channel devices because as channel lengths decrease, the electric fields in the channel increase thereby increasing the velocity of carriers[GD85, pages 105-1071. From the circuit
designer’s point of view, velocity saturation is undesirable because it reduces the current
driving capabilities of the device. Ironically, velocity saturation simplifies the modeling of
MOS transistors using piecewise linear functions.
One effect of velocity saturation is that it tends to induce saturation at drain-source
voltages lower than those predicted by the quadratic model. To illustrate, Figure 3.4 plots
the drain cwent, Id vs the drabsource voltage, vds, for two transistors for a single gatesource voltage, V,, = 5. The lower curve is for a MOSIS 2~ transistor while the upper
curve is for the same transistor with the effects of velocity saturation eliminated.
From the plot it is apparent that the I-V characteristic of the device which isn’t velocity
saturated is largely quadratic. As such it consists of two segments. For low drain-source
voltages the transistor operates in the lineur region and the characteristic is parabolic.
Above a certain drain-source voltage4 (I& rv 4V) the transistor operates in the saturation
4For any given gate-source voltage, Vgs, the quadratic model enters the saturation region when I/& >
vg5 - vt
CHAPTER 3. PlECEWISE LlNEAR MODELS
2.
_
-
_
_,
28
,_
-
_,
_
Id
quadratic _ _,
_---I---
mA
x
1.
_
-
,-
/
I
-,A’ -
.-
,-
-
-I
-
-
I
’ velocity’saturatcd
’
I
Figure 3.4: MOS I-V Characteristics With and Without Velocity Saturation
region and the characteristic is a (nearly) horizontal line.
From the plot it is also evident that the characteristic of the velocity saturated transistor
is identical to that of the quadratic transistor except that it saturates at a much lower voltage
O&a - 1.51/‘). This increases the size of the saturation region such that it comprises most
of the I-V characteristic. Because this segment is nearly horizontal, the transistor in the
saturation region can be approximated by a current source. Simultaneously, the linear
region has been truncated so that what remains of the original parabolic segment can be
approximated by a straight line. This is equivalent to modeling the transistor in the linear
region by a resistor.
Another effect of velocity saturation is that it tends to linearize the dependence of the
channel current upon the gate-source voltage in the saturation region. For a quadratic
transistor, the transconductance in the saturation region increases linearly with the gate
voltage:
(3.5)
while for a velocity saturated transistor the transconductance asymptotically approaches
gm = (2Jwnzc
(3.6)
where V, is the maximum velocity of carriers drifting through the semiconductor. This
CHAPTER 3. PIECEWISE LINEAR MODELS
29
effect can be observed in Figure 3.5 which plots Id vs V,, for v& = 5. While the
1.0,
-
Id
0.8 - mA
0.6 - 0.4
0
-
.
’ ’ 4
2
Figure 3.5: Linearization of Transconductance for Large Vgs
current through the quadratic transistor (the upper curve) continues to increase in slope
with increasing V,, (the curve is a parabola), the slope of the velocity saturated transistor
(the lower curve) becomes constant for large Vgs. Thus the velocity saturated part of the
curve can be approximated by a straight line. This is equivalent to modeling the transistor
in the saturation region by a linear voltage controlled current source.
3.4.2
Model
The resulting MOS Level-Y model is depicted in Figures 3.6 3.8. In the saturation region
(Figure 3.6) the transistor is modeled by a voltage controlled current source shunted by a
“d
i= rt,(Vss-y 1
“g1
go
+
“s
&s-(&fibs) > 0
glvds - gm(Vgs - vt) > 0
(3.7)
(3.8)
Figure 3.6: Piecewise Linear MOS Model: Saturated Region
resistor. The transconductance is set by the parameter gm, while the gate-source voltage
CHAPTER 3. PIECEWISE LIiVEAR MODELS
30
for which the current source delivers zero current is V,. In addition, the upward slope of
the saturated segment of the I-V curve’ is modeled by g,,.
The inequalities in the figure define the saturation region polytope. Equation (3.7)
ensures that transistor has sufficient gate-source voltage to turn it on. Here, the quantity
(V, - (gO/gm)&) is the effective threshold of the device.6 Equation (3.8) ensures that the
drain-source voltage is sufficient to saturate the transistor.
In the linear region the transistor is modeled by a single conductance, gl (Figure 3.7).
Equation (3.9) ensures that current flows from the drain to the source. If that is not the case
“d
“g-
91
i
vds
>
0
glvds -sm(Vgs - 6) < 0
(3.9)
(3.10)
“s
Figure 3.7: Piecewise Linear MOS Model: Linear Region.
the source and drain are simply interchanged.
Lastly, the oflregion is the obvious open circuit in Figure 3.8.
“d
I
“g-
vds
>
0
&3-(Qp < 0
(3.11)
(3.12)
Js
Figure 3.8: Piecewise Linear MOS Model: Off Region.
The regions are plotted in Appendix B. Note that although there are a total of three
hyperplanes, any given region is bounded by only two. This is important because the effort
required to compute if and when a piecewise linear device changes regions is proportional
to the number of hyperplanes bounding the current region.
%e variation of drain current with the drain-source voltage in the saturation region is caused by channel
lengthmodulation[MK77l
6The dependence of the model’s threshold upon the drain voltage is explained in Appendix A.
CHAPTER 3. PIECEWZSE LINEAR MODELS
31
3.4.3 Choosing Parameters
In exchange for its simplicity, the Level-l piecewise linear MOS model loses the ability
to duplicate the behavior of a SPICE model over any arbitrary operating range. However,
digital circuits often operate their transistors in particular regions. If the parameters of a
piecewise linear transistor are chosen based upon the circuit in which it is used, good results
can be obtained.
CMOS gates
For static CMOS gates, the parameters should be chosen to match the I-V characteristics of
the SPICE model in a region of greatest current, that is, 2.5 < V,, < 5 and 2.5 < I& < 5.
The rationale is that typically most of the change in voltage at the output of a CMOS gate
occurs with the output transistors biased into their high current range. Because the rate of
change of voltage is proportional to the current, modeling errors in regions of low current
usually produce smaller timing errors than errors in regions of high current. Figures 3.9 and
3.10 illustrate the ability of the piecewise linear models to match the I-V characteristics
(a) NMOS
(b) PMOS
Figure 3.9: Piecewise Linear vs SPICE I-V Characteristics: V,, = 3,4, and 5 volts, SPICE
Level-2 models for MOSIS 2~ Process.
of MOS transistors from MOSIS 2~ and 1.2~ processes, respectively.
32
cHAP7ER 3. PIECEWISE LINEAR MODELS
2
Id
- - - - ., . _ _ _ _
I i-1 I:*’
I 0.4
b mP
-
2
vg!s=sv
t 0.3
, - - -, _ _ (
(
“Iv
7 0.2
s
vgs=3v
(
I 0.1
3
4
5 0.0
1
2
3
4
“iIs
(a) NMOS
5
“hs
(b) PMOS
Figure 3.10: Piecewise Linear vs SPICE I-V Characteristics: SPICE Level-3 models for
MOSIS 1.2~ Process.
In these figures, the I-V curves of the piecewise linear models are superimposed over
those of the SPICE models that they were tailored to match. The figures reveal a very good
match in the saturation region for large values of V,, and vds.
The quality of the match is also born out by a comparison of the transient responses of
CMOS inverters (Figure 3.11) using SPICE vs piecewise linear transistors.7 Figure 3.12
T
-tiF
in
out
Figure 3.11: CMOS Inverter
shows the responses for fast and slow rising exponential inputs. However, the match of I-V
characteristics is not as good for lower values of V,,, . Figure 3.13 plots Id vs VI5 and shows
71n this section only the modeling of transistor I-V characteristics is considered. The modeling of nonlinear
and/or floating capacitors using linear grounded capacitors is consider& in a following section. Thus, in this
section when circuits using SPICE vs piecewise linear transistors are compared, errors due to the modeling
of capacitors are eliminated by swamping all device capacitances with large linear grounded capacitors.
33
CHAPTER 3. PIECEWISE LINEAR MODELS
5
_
-i\
_ \
V
4
-
-,
-
-
-
-
-
-
*I
__I
.----I
___
3
-
-
-
-
-
I
-
-j_]:
-I
_
;.
in
.
out
-,
_
_
-
.
-
-
-
a
\
1.
-
-
-
-
-’
/
v
OO
Figure 3.12: Inverters Using Piecewise Linear vs SPICE Transistors.
mA
Figure 3.13: Mismatch for Small V,,.
20
nS time
34
CHAPTER 3. PIECEWISE LINEAR MODELS
that the piecewise linear model (lower straight line) underestimates the current for small V,,
where the SPICE model (upper curve) is no longer velocity saturated. This modeling error
is most evident for slow inputs. When the input to a logic gate switches slowly relative
to the output, the transistors are never biased into their high current regions (Figure 3.14).
However, as pointed out by Horowitz(Hor831, timing errors in this case are less important
\ --
40
nS time
Figure 3.14: Slow Input to CMOS Gate Accentuates Modeling Errors.
because they are small relative to the delay (and hence, error) of the previous stage.
It is interesting to note that the Level-l piecewise linear model predicts the behavior of
stacks of transistors more accurately than the switched resistor model. As mentioned above,
it is necessary to assume quadratic behavior in order to model MOS transistors as pseudolinear resistors. Unfortunately, velocity saturated transistors are not pseudo-linear. To see
why, consider the series connection of a pair of identical MOS transistors. If the devices
were pseudo-linear, then it would be possible to perform series combination as if they were
resistors. That is the pair could be replaced by a single transistor which supplies half the
current (for example, one with half the channel width). However, the series connection of
two identical transistors is actually equivalent to a single transistor with twice the channel
length. Because the channel length has been doubled, velocity saturation (a short channel
effect) becomes less pronounced, and the pair will deliver more than half the current of
a single transistor. This is confirmed by a comparison of the I-V characteristics and step
response of series stacks of pairs of MOS transistors using SPICE, piecewise linear, and
35
CHAPTER 3. PIECEWISE LINEAR MODELS
pseudo-linear models (Figure 3.15). Both plots show that the piecewise linear model gives
0.61
-
-
Id
0.5
-
-
4
_
_
_
_
_
_
_
_
1
____-----
0.4'
-
-
0.3
-
-
‘\
_
‘\- - -I- - - - - - *
‘1,
_ _ ,-\_I’ - - - - - I
_ \
\
_‘_
1
_
_’
2
- SPICE
1\
“\
\
\ ‘\
1
_-- Piecewise Linear
-_. PseudO~ Linear
5 00
3
4
!
“ds
(a) I-V Curves
20
nS time
(b) Step Response
Figure 3.15: NMOS Stacks Using SPICE, Piecewise Linear, and Pseudo-Linear models.
a better fit to the SPICE model than the pseudo-linear model.
Other Circuit Forms
b i t out
P?
Of course, different circuit forms bias their transistors into different regions of operation.
A piecewise linear model formulated for static CMOS gates does not necessarily perform
well for other kinds of circuits. To illustrate, consider the differential amplifier circuit in
Figure 3.16 which has been used to sense the small voltage swings on the bit lines of
Ei
a
tail
VCS
I
I---
-I
=
Figure 3.16: SRAM Sense Amplifier
C’HAP’l-ER 3. PIECEWZSE LINEAR MODELS
36
CMOS static RAMs[MMS+80]. When the Level-l models formulated for static CMOS
logic are used in the sense amplifier (Figure 3.17 (a)) (The response of ouc and u are shown
1
,
_
/
.
,
-
I
;_
OoI
I
10
(a) Poor Choice of Level- 1 Parameters.
-
20
30
40
50
60
-
I
1
n!? 7
timi
(b) Better Choice of Level- 1 Parameters.
Figure 3.17: Simulated Response of SRAM Sense Amplifier.
for rising and falling transitions on hr.) the match with SPICE is not as good as with MOS
static gates. However this result is a vast improvement over Rsim. Mom’s output generally
does the right thing although there are errors in the timing. In contrast, Rsim reports that
all nodes remain in the undefined state.
It is interesting to investigate the cause of the mismatch particularly in light of the
excellent matches obtained for static gates. The reason is that the sense amplifier is
very different from a static gate. Not only is the output swing not from 0 to 5 volts
(it is from 2 to 5 volts) but the input swing is only 1 volt peak to peak centered about
3.5 volts. Thus, the sense amplifier operates its transistors in different regions than do
MOS static gates. If the user takes care to linearize the transistors in the actual operating
regions of the circuit better results can be obtained. In this case excellent results are
obtained when the transistors are linearized with the circuit biased at its switching threshold
(l&t = V= = 3SV)(Figure 3.17 (b)). This match is surprisingly good considering that the
transistors operate in regions where they are not strongly velocity saturated and, in fact,
exhibit substantial quadratic behavior.
37
C’P’ZER 3. PIECEWISE LINEAR MODELS
To conclude, the MOS Level- 1 model consistently yields better results than the switchedresistor model. Furthermore, if care is taken in choosing the point about which the &vices
are linearized, the matches with SPICE can be surprisingly good considering the simplicity
of the model.
3.5 Bipolar Model
Bipolar transistors are more difficult to model because their exponential characteristics are
more strongly nonlinear than MOS transistors’ quadratic characteristics. For the purpose
of ECL switch-level simulation satisfactory results have been obtained using the simple
For this model, g,,, represents the
Level-0 model shown in Figure 3.18[KAHS88].
“c
vb$
+)
“b-
(+c
forward
active:
“c
I
=
%(&I-
“on)
“b-
&, - Van > 0
Figure 3.18: Piecewise Linear Bipolar Model.
transconductance of the transistor, and V,, the nominal base-emitter voltage drop in the
forward active region.
The model incorporates several simplifications. In the forward active region the output
conductance caused by base width modulation[MK77] was omitted. The output conductance does not appear to significantly affect the delay, and its omission allows current
steering trees to be analyzed in linear time (See Section 5.5). Furthermore, the saturution region was omitted altogether because the additional complexity would degrade the
performance of the simulation and ECL gates don’t normally operate transistors in that
region.
The parameters can be obtained by linearizing an ECL inverter (Figure 3.19) about its
switching threshold: V&, = V,,j. At the threshold each of the transistors receives half of
the tail current. Since the current for an ideal bipolar transistor in the forward active region
CHAPTER 3. PIECEWISE LINEAR MODELS
38
Figure 3.19: ECL Inverter
can be approximated by[MK77]:
Vbe
I, = I,em,
(3.13)
the parameters are given by
gm =
Itd
2KT/q
KT ln Itail
von = 21,
9
(3.14)
(3.15)
In addition, if the parasitic resistance in series with the emitter terminal is not modeled
separately its affect can be incorporated into g,:
s:, =
1
& + re
(3.16)
Figure 3.20 depicts an ECL AND gate and compares the response of the circuit using
piecewise linear and SPICE models. Again the match is quite good especially considering
the simplicity of the model.
3.6 Second Order Phenomena
The previous section only considered the modeling of I-V characteristics of devices. For the
sake of clarity examination of various second order effects were postponed. This section
will address the modeling of nonlinear capacitance, floating capacitance, and parasitic
resistance. MOS switch-level simulators model the effect of floating capacitors using
“equivalent” grounded capacitors. This approximation has been effective for most MOS
CHAPTER 3. PIECEWISE LINEAR MODELS
39
Figure 3.20: ECL AND Gate.
gates because the majority of the capacitance is to ground. However, for circuits with a
larger fraction of floating capacitance this approximation can result in significant errors.
To illustrate, we will consider the responses of CMOS and ECL inverters with and without
floating capacitors (Figure 3.21). To simulate more realistic loading, the output of the
(a) CMOS immrter
(b) ECL inveder
Figure 3.21: CMOS and ECL Test Circuits
CMOS inverter and the inverting output of the ECL inverter drive other inverters. The
inputs are rising exponentials.
In Figure 3.22 the inverters using SPICE models include the devices’ nonlinear floating capacitors, while the inverters using piecewise linear models use “equivalent” linear
CHAP7ER 3. PZECEWISE LINEAR MODELS
40
6
V
5
4
3
2
1
I
,O
C 1.
6.1-
012.
b’35-L.L
.
0.4.
0.3
nS ti!i2
(a) CMOS
ii
LO I
/
0.2
6:4
- 0.6
6.8
- 10
nS timi:
(b) ECL
Figure 3.22: Grounded Capacitor Approximations.
grounded capacitors. From the plots it can be seen that the grounded capacitor model
performs much better for the CMOS inverter than for the ECL inverter. For the CMOS
inverter, the most apparent error comes from the floating capacitance between the gate and
drain of the MOS transistors. This causes the response of the SPICE inverter to bump up
slightly before falling to ground. However, the responses rapidly converge after the bump.
The error in switching delay is only 4%.
In contrast, numerous sources of mismatch arise when one attempts to approximate the
floating capacitances of ECL gates. Several of these can be observed in the plots. The first
has already been observed for the CMOS inverter. The floating base-collector capacitance
of TI causes the inverting output of the SPICE circuit to bump up slightly before falling.
Second, note that the non-inverting output of the SPICE circuit begins to rise when the
input begins to rise rather than when the input crosses the switching threshold. When
the input rises, current is injected into rail via the floating base-emitter capacitance of TZ.
This current cancels some of the current extracted from rail by the current source thus
causing the non-inverting output to rise. Third, after the initial bump the inverting outputs
appear to converge up until t=5OOps when they again diverge. This occurs because the
“equivalent” capacitance seen looking into the input of the second stage changes with time.
Initially, this load consists only of the base-collector capacitance of T3. However, when
CHAPTER 3. PIECEWISE LINEAR MODELS
41
the second stage switches the effective capacitance increases suddenly. When the collector
of T3 falls, its effective floating base-collector capacitance increases because of the Miller
effect. When the base-emitter voltage of T3 collapses the base-emitter floating capacitance
becomes visible. As T3 turns off, its stored base charge must be discharged by the first
stage. ‘lhe result is that the switching delay error for the inverting output is 8.3% while the
error for the noninverting output is 24%. In general, we have observed that the modeling
of floating capacitors using grounded capacitors in ECL leads to errors of up to 30%.
If greater accuracy is required it is necessary to include linearized floating capacitors
in the piecewise linear model. Figure 3.23 shows the Level-Z bipolar model and its
0.1
V
0.0
5
Cje+Cqb
Figure 3.23: Level-l Bipolar Model has Floating Capacitors.
response compared with the SPICE model. (Appendix C describes the generation of
linear approximations of the nonlinear capacitances of the bipolar transistor.) It is apparent
that the waveform accuracy has increased substantially. The switching delay error of the
inverting output has dropped from 8.3% to 4% and that of the non-inverting output from
24% to 19%. These results confirm that much of the error was due to the inadequacies of
the grounded capacitance approximations.
Finally, if still greater accuracy is required the parasitic resistances of the bipolar
transistor can be included in the piecewise linear model. Figure 3.24 shows the Level-2
bipolar model and its response compared with the SPICE model. The matching is very
good indeed. The switching delay error for the inverting output has dropped from 4% to
CHAmR 3. PIECEWZSE LINEAR MODELS
-7
42
Re
0.0
-0.i
64
-
-0%
i):8- 1 0
nS time
Figure 3.24: Level-2 Bipolar Model has Floating Capacitors and Parasitic Resistors
1.5%, while that for the non-inve.rting output has dropped from 19% to 4%. The remaining
mismatch between the SPICE and piecewise linear models appears to be due mostly to
the modeling of non-linear capacitors using linear capacitors. For example, the previously
described bump on the inverting output caused by the base-collector floating capacitance of
TI is consistently smaller for the SPICE models than for the piecewise linear model. The
reason is that the non-linear base-collector capacitance varies by more than a factor of two
from .77Cjd when the base is low and the collector is high, to 1.65C,d when the base is
high and the collector is low. Thus when we are forced to choose an average capacitance,
that capacitance will be too large at the beginning of the transition, and too small at the end.
The initial over-estimate leads to an over-estimate of the size of the bump.
Before we conclude this section, it should be admitted that these experiments are,
perhaps, overly conservative. First, note that the error in the switching delay for the noninverting output appears to be uniformly larger than the switching &lay for the inverting
output. The reason is that the non-inverting output is unloaded and hence has fewer
capacitors attached. When larger numbers of capacitors are modeled, individual modeling
errors tend to cancel. When this output is loaded the switching delay error decreases. This
tendency for unloaded outputs to have larger modeling errors has also been observed for
MOS switch-level simulators and timing verifiers. However if the output is truly unloaded
then nothing is affected by it and we probably don’t care about its error.
CHAPTER 3. PIECEWISE LINEAR MODELS
43
Second, only device parasitic capacitances have been included. Wire capacitance has
been neglected because it is very layout dependent. However, because wire capacitance
is usually modeled as linear and grounded, it tends to swamp out the errors caused by the
modeling of floating and nonlinear capacitors. Thus, actual circuits, which include wire
capacitance, can be modeled more accurately than might be indicated by these experiments
and will be more tolerant of simpler models.
3.7 Summary
A piecewise linear device is represented by collection of linear circuits, each of which
represents the device for a particular region of operation. Each region of operation is a
polyrope described by a conjunction of linear inequalities. Each inequality represents a
hyperpluneboundary.
Several restrictions are placed upon the piecewise linear models in order to preserve
the efficiency of timing analysis. First, the DC coupling from the gate to the source and
drain for a MOS transistor, and from the base to the emitter and collector for a bipolar
transistor is modeled as unidirectional. This facilitates circuit partitioning. Secondly, only
linear capacitors are allowed in the models. Nonlinear capacitors must be modeled using
equivalent linear capacitors. Finally, the number of piecewise linear regions is limited. This
is necessary to preserve the principle advantage of moment-based techniques: the ability
to take large time steps. Simple models can be constructed by utilizing a priori knowledge
about the operating regions of transistors in particular forms of digital circuits. Experiments
with a number of MOS and bipolar models indicates that these restrictions are acceptable.
In a number of instances predictions of switching delays to within 4% were possible.
Two MOS models were explored. The simplest model, the MOS Level-0 model, is the
switched resistor model. This model was included to allow maximally efficient simulation.
Experience with Rsim (a MOS switch level simulator) has shown that the model is useful
for predicting the delays of most forms of MOS static logic. For circuits where greater
accuracy and flexibility are needed, a MOS Level-Z model was explored. This model differs
from the Level-O model in that it models the linear and saturation regions separately. The
Level-l model was justified on the grounds that velocity sutururion tends to linearize the
CHAPTER 3. PIECEWISE LINEAR MODELS
44
behavior of modem MOS transistors. Because of its increased sophistication the Level-l
model supplies greater waveform accuracy when simulating static CMOS circuits. The
switching error of an inverter employing Level-l models can be as low as 4% relative to
SPICE. Additionally, the Level- 1 model can be used to correctly simulate circuits for which
the switched resistor model fails to yield even a correct logical simulation.
Three bipolar models were also explored. The bipolar model includes only two regions
of linearity, on and off. When it is on, the exponential I-V characteristic of the bipolar
transistor is approximated by a voltage controlled current source. However, because of
second order affects, it appears to be more difficult to model ECL than CMOS. A large part of
the problem is the greater proportion of floating to groundedcapacitance in an ECL gate. For
an ECL inverter it was observed that the approximation of floating capacitors by grounded
capacitors produced errors as large as 30% in the switching delay. These errors were
reduced to 20% when floating capacitors were included in the bipolar model. Additionally,
parasitic resistances in the bipolar transistor appear to make significant contributions to
the switching delay. When they were also incorporated into the transistor model, these
errors were reduced to 4%. Note that all the errors reported are conservative because a
large source of linear, grounded capacitance, wire capacitance, was neglected. When wire
capacitance is included we expect the errors for all models to decrease.
Overall, the use of piecewise linear models is promising. Although restrictions have
been placed on the models in,order to preserve efficiency surprisingly simple models appear
to significantly extend the accuracy and flexibility of the simulator. The next two chapters
will examine the procedures for computing the response of networks made up of these
devices.
Chapter 4
Waveform Approximation
The previous chapter showed that simple piecewise linear transistor models can produce
good predictions of the behavior of digital circuits. However, once a circuit uses piecewise
linear models it may no longer reduce to an RC tree and its response may not be well
approximated by an exponential. In fact our experience has been that as transistor models
increase in complexity, so do the responses of the circuits containing them. Therefore a
more flexible technique for approximating waveforms is required than Rsim’s single time
constant delay estimation.
Fortunately, Rsim’s single time constant techniques have been extended to allow more
accurate waveform estimates with multiple time constants. The generalized moments
matching procedure mentioned in Chapter 2 allows Mom to produce more sophisticated
estimates of the responses of circuits containing piecewise linear models. This technique
possesses several promising characteristics. In contrast to numerical integration algorithms,
a single computation yields afunction of time, u(t), representing the response for all future
time, t E [0, 001. In contrast to single time constant algorithms, it allows the computation
of waveform estimates of arbitrary accuracy.
This chapter begins with a brief review of the theory behind the moments matching
procedure. It then discusses some practical aspects that must be considered in an implementation. Next the utility of the procedure is demonstrated by showing a number of
simulations. The chapter concludes with a description of some of the limitations of the
procedure.
45
CHAPTER 4. WAVEFORM APPROXlMATION
46
4.1 Waveform Approximation
One general approach to waveform approximation begins with the derivation of a low
order model of the (presumably high order) system of interest. If the low order model
is sufficiently simple it becomes practical to compute its response exactly. That response
serves as an approximation of the response of the original system. The model or&r
reduction problem has been studied extensively by linear control theorists. Of the many
methods proposed, one of the simplest is based upon moments motching[BL72]. Although
more advanced techniques demonstrating superior convergence and stability properties
were subsequently derived[Cha91], none of these appears to be efficient enough for use in
our particular application.
The moments matching waveform approximation procedure involves two steps[PR90].
First the asymptotic final voltages of each node are computed by finding the DC solution of
the network (assuming all piecewise linear devices remain in their present regions). This
DC solution corresponds to the particular solution. Then all DC sources in the circuit
are set to zero and an estimar. for the homogeneous solution is generated by matching
moments. The total estimate is the sum of the two solutions.
4.2 Pad6 Approximation
It has been observed that moments matching is, in fact, just a particular application of the
Pad& approximation[Zak73]. In general, the Laplace transform of the impulse response of
a lumped linear time-invariant circuit takes the form of a ratio of polynomials in s:
II(s) = PO + PIS + hs2 + * ’ * + Ansm
00 + 01s + cY2s2 + . . . + a,.P *
(4.1)
where the order of the denominator polynomial, n, is usually equal to the number of
storage elements (capacitors and inductors) in the circuit. However, if n is very large
it can be prohibitively expensive to compute H(s) exactly. Instead, a lower order Pad6
approximation is constructed:
fits) = l)o + bl.5 + hS2 +. -. + bj-,S'-'
1 + ais + a2s2 + . . . + u,s’
(4.2)
47
CHAPTER 4. WAVEFORM APPROXIMATION
(j < n) and used to model the response of the system.
The parameters of the Pad6 approximation are obtained using a two step process. First
the 2j low order terms of the series expansion in s of H(S) are computed:
H(S) = Tl20 + 7721s + m2S2 + m3S3 + s e. + ?7Z2j-~S2js1 +
O(S2j)
(4.3)
Hem, the m; are the mOments of the impulse response’and 0( s2j) represents all other terms
of or&r 2j or higher.
Moments are of interest because they are easily computed directly from the circuit even
though it is usually impractical to compute H(s) in closed form. Pillage and Rohrer[PR!W]
point out that moment computation can be viewed as a sequence of DC solutions. First the
voltages and currents at t = 00 are found by computing the DC solution of the network
assuming that capacitors are open circuits and inductors are short circuits. The difference
between the initial and final voltages across capacitors and the initial and final currents
through inductors are the 0th order moments. Then the (k + 1)st moments are recursively
computed from the kth moments by finding the DC solution of the network derived by
(Figure 4.1):
1. Setting all independent sources to zero. This has the effect of subtracting out the
particular solution.
2. Replacing capacitors with current sources equal to the product of the capacitance
times the kth moment of the capacitor voltage.
3. Replacing inductors with voltage sources equal to the inductance times the kth moment of the inductor current.
Once the DC solution is found, the resulting capacitor voltages and inductor currents
represent the (k + 1)st moments. Note it is quite straight forward to find the DC solution
‘Strictly speaking, the Laplace coefficients, mi, are equal to the moments, iii, scaled by constant factors:
t-l)'
mi=--r J0
co tih(t)dt = (-l)';+
i!
However, in the literature the term moment is used to refer to both mi and Gi. We will continue this
convenient, although somewhat imprecise, use of the term momenr.
48
CHAPTER 4. WAVEFORM APPROXIMATION
Figure 4.1: Replacing Capacitors and Inductors to Compute (k+l)st Moments.
of a resistor tree. In addition Chapter 5 will demonstrate that those procedures can be
generalized to solve trees of piecewise linear transistors.
Once the low 2j moments have been computed from the circuit, A(s) is expanded via
long division into a power series in s and its coefficients are chosen such that the low
2j moments of the approximate response match those of the actual response. It has been
shown that this matching constrains the a; and bi to be linearly dependent on the mi. The a;
and b, can be obtained by solving the following linear system of equationslZak73, HuaW]:
ml-l
mj
mzj-2
1
m,-2 . . .
nl0
a1
mj-1
. . .
ml
a2
7Tl2j-1
. . .
mj-1
aj
=
-
a1
m0
m2
.
mj-i
mj-2
a2
.
.
.
mj-3 . . .
m0
(4.4)
m2j-1
m0
ml
ml
mj+i
=
aj-1
bo
h
.
1
(4.5)
bj-1 ~
The matrix on the left hand side of Equation 4.4 is referred to as the moment murrix. Once
A(s) is known, its denominator can be factored
fits) = 60 + b + b2 + . . . + bl-lsj-l
(“1 + 1)(sT2 + 1) * *a (STj + 1)
(4.6)
and the result expanded into partial fractions:
A(s) = Icl + Ic2 + . 1). . (,,‘i;I 1)
(ST1 + 1)
(33 +
(4.7)
CHAP’FER 4. WAVEFORM APPROXIMATION
49
But the inverse transform for each term in Equation (4.7) is given by:
l-1
1
t
1+s7 I
=
1
7
+-t/r
(4.8)
*
Thus the inverse transform of Equation (4.7) yields the time domain estimate
h(t) = $-t/71 +
$,-tir, +. .
.
+
+t/r,
2
(4.9)
3
In principle, approximations of arbitrary order can be computed from the moments. It
has been shown that when j = 1 the Pad6 approximation is equivalent to the single time
constant estimate generated by RC tree analysis[PR90]. Furthermore, as j approaches the
order of the actual system being approximated, the Pad6 approximation, H(s), converges
to the actual system function, H( s)[Hua90].
4.3 Practical Considerations
Although in principle moments matching encompasses approximations of arbitrary order, in
practice it may be impossible to produce an approximation if the order is too high or too low.
Furthermore even in those cases when an approximation is possible, it may contain spurious
unstable poles. Moments matching has been observed to be numerically sensitive[Hua90]
and care must be taken when applying the procedure.
4.3.1 Order Too Low
If the order of the approximation is too low the moments matching procedure can fail
to yield any approximation at all. Note that the first row of a jth order moment matrix
(Equation (4.4)) is [mj-1 . . . rn2 ml mo]. If the low j moments are all zero, then that row is
all zero, the moment matrix is singular, and it is impossible to produce an approximation.
This situation frequently arises as a side effect of floating capacitors. Consider the the circuit
in Figure 4.2. Note that except for nl all nodes are initially zero and that all final values
are zero. The lowest moment, mo, is given by the negative of the initial condition[PR90].
‘In practice, we solve for the ai, factor the polynomial to get the pole frequencies, and then compute the
pole coefficients, Ici , directly from the poles frequencies. Se-e PR!90].
CHAPTER 4. WAVEFORM APPROXIMATION
50
Figure 4.2: Circuit with Low Order Moments Equal to Zero.
Thus the zeroth order moments for nodes {nl, n2, n3, n4) are {-I, 0, 0, 0) , respectively.
To compute successive moments each capacitor is replaced by a current source equal to the
capacitance times the difference of the moments on its terminals:
m0:
{
-1,
0,
ml: {
l? - 1 ,
m2 : {
-2,
0, 0 }
0, 0 }
3, -17 0 1
Thus nodes that are initially at rest but are coupled via a chain of floating capacitors to nodes
that are not will have low order moments equal to zero. If the chain is k links long then
the k low order moments will be zero and approximations of order 5 k will be impossible.
This is undesirable because the lowest order approximation possible grows with the size of
the circuit.
However, as will be observed in Chapter 5 the coupling through multiple levels of
floating capacitors is not important for digital circuits. Therefore when too many low order
moments are zero we simply assume that this was caused by too many levels of floating
capacitors and set the particular response to zero.
4.3.2 Order Too High
Although approximations of arbitrarily high order are theoretically possible, in practice
numerical considerations limit the order of the approximation. The reason for this is that
round-off errors can make it difficult to compute higher order moments with sufficient
accuracy. Note that the series expansion of the Laplace transform of a single pole response
is:
L { e-‘1”) = T’ - TfS + +2.. .
(4.10)
CHAPTER 4. WAVEFORM APPROXIMATION
51
Therefore by superposition, the general multipole response:
v(t) = kle-‘/‘l + kze-‘/- + . . . + k,e-f’Tn
(4.11)
has a Laplace transform whose series expansion is:
Y(s) = @ITI + k272 + . . . + k,Tn) (k,+ + k& + . . . + knT,2)S +
(k,Tf + k& + . . . + k,T;)s’.
(4.12)
Now, suppose that the magnitude of ~1 is much larger than that of all other T’S. In that
case the higher order coefficients will be dominated by ~1. That is for sufficiently large i,
mi rv hq‘+l. However if this is the case, then the rows of the moment matrix become
linearly dependent, the moment matrix becomes singular, and it becomes impossible to
compute an approximation. In practice, as higher order approximations are attempted, the
moment matrix tends to become more and more ill-conditioned until a point is reached when
it can’t be inverted. At that point it becomes necessary to use a lower order approximation.
Huang proposed a technique to enhance the accuracy of moment computation. The
technique involves inserting a resistor in parallel with each capacitor and in series with
each inductor, where the resistances are sized proportionally to the capacitances and
inductances[Hua90]. This effectively shifts all poles and zeros left in the complex plane
away from the imaginary axis, thus reducing the tendency of the lowest frequency pole
to dominate the other poles. The technique was successfully used to obtain the frequency
domain behavior of operational amplifiers and active filters.
Unfortunately, the method does not appear to be suitable for our particular application.
One reason is that the resistors that parallel floating capacitors would destroy the tree
structure of our circuits. Then the computation of moments would require the formulation
and LU factorization of a matrix, a process that is generally superlinear. However, even if
a matrix were formulated and factored, it appears that in order to select the shift amount, a
priori knowledge about the placement of the poles must be utilized. It does not appear to
be possible to choose one shift amount that is good for all circuits. The alternative of trial
CHAPTER 4. WAVEFORM APPROXIMATION
52
and error selection of the shift amount appears to be prohibitively expensive because each
new trial modifies the circuit and requires the refactorization of the matrix.
Because of the difficulties associated with generating extremely high or&r waveform
estimates, Mom only generates estimates with three poles or less. Fortunately for our
purposes low order estimates are usually sufficient.
4.3.3 Unstable Poles
The moments matching procedure occasionally yields defective poles, that is poles that are
unrelated to the poles of the system function being approximated. In fact, researchers have
long observed that Pad6 approximations can contain unstable poles even if the system being
approximated is stable[Zak73, Cha91]. One source of defective poles is numerical error.
From Equation (4.12) we can infer that small errors in the moments may contribute spurious
poles to the waveform approximation that have small time constants and coefficients.
However, defective poles may also arise in the absence of numerical error. For example,
the stable second order impulse response:
h(t) = edt - $-li?
(4.13)
has the unstable first order approximation:
ii(t) = -+?‘.
(4.14)
Yet if the series expansions of the Laplace transforms of both signals are computed, it can
be verified that their two low order terms match exactly.
Defective poles usually pose no problem if they are stable. Because they tend to have
small time constants and coefficients they tend to have minimal effect on the waveform
estimate. Unfortunately, if the defective pole is unstable it domiMtes the waveform.
Consequently much work has been done exploring ways of dealing with unstable defective
poles. The most common approach is to restrict the problem domain to stable systems.
Then the approximation procedure is modified to allow only stable poles[Zak73, GP91].
However, the simulation of digital circuits occasionally requires modeling unstable
responses. For example, consider the Schmitt trigger depicted in Figure 4.3. If out is
CHAPTER 4. WAVEFORM APPROXIMATION
53
Figure 4.3: Schmitt Trigger
initially low, and in falls from high to low, T2 will eventually switch from oRto on. When
it does, an unstable positive feedback loop is established. That is T2 turning on tends to
turn off TI which causes out to rise which turns on 7’2 even harder, etc. Because this loop
is unstable its response includes an unstable pole. Of course, the unstable pole can persist
only briefly. The node our very quickly rises to the point where it turns off TZ completely
thus returning the circuit to a stable configuration.
In order to handle the possibility of unstable circuits we employ a heuristic to first
distinguish between stable and unstable clusters. When it is known, a priori, that a cluster is
stable, all unstable poles can be safely suppressed when estimating its response. Conversely,
the response of an unstable cluster is performed in such a way as to require at least one
unstable pole.
A fairly simple heuristic can be used to identify commonly occurring unstable digital
circuits. They are typically characterized by a feedback loop with an open loop DC gain
that is greater than one. Conveniently, this information is generated as a side effect of
finding the particular (DC) solution. For our example, because the Schmitt trigger contains
feedback, circuit tearing must be employed to solve it. This involves:
1. tearing out the wire connecting the base of 72 to out in order to break the feedback
loop
2. computing a value that is equivalent to the open loop DC gain
Thus the routine that computes the particular solution can easily identify clusters with
unstable positive feedback.
CHAPTER 4. WAVEFORM APPROXIMATION
54
Our experience with this heuristic indicates that it quite reliably identifies commonly
occurring unstable digital circuits. Errors that occur when analyzing more sophisticated
circuits (for example Mom will miss the unstable poles of an unstable voltage reference
generator) have not been problematic because such circuits are small and can easily be
isolated from the predominantly digital design.
4.3.4 Efficiency
Because the generation of a jth order waveform approximation involves the inversion of
multiple non-sparse j x j matrices and the factorization of a jth order polynomial, we
would expect the cost of waveform approximation to rise superlinearly with the number of
poles. This is confirmed by measurements of the execution time of the waveform generation
portions of Mom. Table 4.1 gives the average number of cycles3 needed to generate one,
Table 4.1: Execution Time Required to Generate Waveform Approximations.
two, and three pole waveform estimates for a particular circuit. Note that the cost varies by
a factor of 20. As we shall see in a later chapter, this growth can significantly degrade the
simulator’s efficiency when more complex circuits and models are used.
4.3.5 Error Control
For the sake of computational efficiency it is desirable to employ the lowest order approximation possible. The usual approach is to start with the first order approximation (j = 1)
and to compute successive higher order approximations only when necessary. An estimate
of the approximation error is produced for each waveform estimate. If that error is above a
certain threshold then the next higher order approximation is attempted.
Our estimate for the approximation error is a derivative of one suggested by Shi and
Zhang[SZ87]. After generating a jth order approximation from the low 2j moments
‘Estimates of machine cycles are for the MIPS R2000 CPU and were estimated using the pixie execution
profiling tool created by MIPS Computer Systems, Incorporated.
CHAPTER 4. WAVEFORM APPROXIMATION
55
(mo... mzj-I)), we then see how well that approximation matches the 2jth moment.
Comparing the match of higher order moments is computationally efficient and additionally
provides a sense of the conditioning of the system. If the 2jth moments and the (2j +
1)st moments match exactly then there is no use even attempting the next higher order
approximation because that system of equations will be ill defined.4
In general the threshold for an acceptable approximation error is set to a value between
2% - 20% depending upon the desired level of accuracy. However, it is important to reduce
the threshold for large signals because a 10% error in a 1000 volt signal is much worse
than a 10% error in a 1 volt signal. At first glance it would seem impossible for such
large signals to occur in common forms of digital circuits. However when piecewise linear
transistor models are used asymptotic swings much greater than the logic swings occur with
surprising frequency. For example, consider the two input CMOS NAND gate in Figure 4.4
when piecewise linear MOS Level- 1 models are used. Just after the input has risen, TI and
Figure 4.4: 2 Input CMOS NAND Gate.
T2 are off and T3 and T4 are saturated. The asymptotic final value of the output waveform
is computed by finding the DC solution of the network assuming that all devices remain
in their present regions of linearity. For our model, gm x 1/3.6k and K x 2. Since T&s
gate is at 5 volts, its internal current source generates x -(5 - 2)/3.6k z -830pA of
current. However T3's drain isn’t connected to anything. Therefore all of T4's current
4To see this, note that if a j pole approximation, Vj (t ) matches the low 2 j + 2 moments exactly, then so
must the j + 1 poleapproximation:
Vj+l(l) Z Vj(t) + kj+lfZc'T'+'
for kj +I = 0. However, for higher order approximation there is no unique value for rj +I. That is, there are
more variables than constraints and the system is ill defined.
CHAPTER 4. WAVEFORM APPROXIMATION
56
must flow into T4’s output resistance x 67k. Thus the DC solution forx is about -830pA
x67k x -56 volts. Furthermore, with no net current flowing through T3, the voltage gain
from its source to its drain is approximately 67k/3.6k x 19. Thus the DC solution for
out is about -56 - (56 + 5) x 19 x - 1100 volts! Figure 4.5 compares the response of
0
V
KV
0
-1
-2
\
-3
_
_
-
_
_
-
_\ \-
\
-4
0.1
us
(a) Full Swing.
0.2
time
I
-Il.0
_
0.1
_ . _ _. - _
0.2
0.3
0.4
nS
-1.
0.5
time
(b) Magnified Swing.
Figure 4.5: Large Signal Swing for CMOS NAND Gate.
OUI predicted by our simulator using 10% and .05% error thresholds. On a scale of 2000
volts (a) the signals are indistinguishable. In fact, the approximation error is just 0.273%.
However when the waveform is examined on the scale of 5 volts (the region we actually
care about) we see that significant error is introduced. The solution is to scale the error
threshold of large signals inversely proportional to their size. The lowest order moment,
m0, is used to estimate the size of the signal.
4.3.6 Frequency Scaling
Another source of errors in the moment matrix arises essentially from a poor choice of
units of time. Consider the moments of a multipole response (Equation 4.12). If the T’S
are all of the order of lps then mu w 10-12, ml - 10-24, m2 - 10-36,. . . . That is, the
magnitude of each moment will be approximately 12 orders of magnitude smaller than its
predecessor. This large variation in moments introduces large variations in the magnitude
of the elements of the moment matrix, thereby rendering it uninvertible.
CHAPTER 4. WAVEFORM APPROXIMATION
57
However, note that the choice of units of time is arbitrary. If, for example, we had
chosen lps to be the unit of time, then the r’s would all be of the order of 1 and the large
variation between successive moments would be eliminated. Pillage[PR90] suggested
scaling frequencies by the ratio Imt /ma I. However, consider the following set of moments
obtained from an actual simulation.
{ tng,mb. m5) = { o,o, 1.4 x 10-35, -3.0 x 10-30,5.5 x 10-39, -5.9 x lo-48)(4.15)
These moments were particularly problematic because the 18 orders of magnitude difference in the sizes of the moments made it impossible for Mom to compute any waveform
approximation. Note that because mo = 0 Pillage’s method can’t be applied directly.
An obvious modification would be to use the ratio Irnk+l/rnkl where k is smallest
subscript for which rnk # 0. However for our example this makes things even worse.
{ mo,... ms} = {0,0.3.3 x lo+, -3.3 x lo-&, 2.9 x 1O-6o, -1.5 x 10-74} (4.16)
Whereas the ratio of the magnitude of the smallest moment to that of the largest used to be
18 orders of magnitude, it is now 28!
A better approach would be to note that the asymptotic growth of high order moments
is dominated by the 7 of largest magnitude. If the moments are normalized by the largest
time constant then this growth will be curtailed. Furthermore, a good estimate for this time
constant would be lrnk/rnk-l I where rnk is the highest order moment computed. When this
is done the moments become:
{ mo,... ms} = {O,O,
1.2 X 10-17, -2.4 x 10-3, 4.1 x 1W3, -4.1 X lo-3}
(4.17)
Note that the high order moments are almost identical and the the ratio of the magnitudes
of the largest and smallest (now 14 orders of magnitude) is better than before the frequency
scaling. This procedure has been found to be useful for a large body of simulations.
Finally, note that the example given above is a rare example for which none of the
frequency scaling procedures described above was able to condition the moment matrix
sufficiently for it to be inverted and a waveform approximation produced. This prompted
us to consider whether or not it was possible to do any better. In fact, it turns out that it is
possible to find a scaling factor that is “optimum” in the sense that it minimizes the ratio of
CHAPTER 4. WAVEFORM APPROXIMATION
58
the magnitudes of the largest and smallest moments (See Appendix D). When the optimum
scaling factor is used:
{ mo,... ms} = {0,0,2.6
x
10-27, -7.4 x lo-la, 1.8 x 10-22, -2.6 x lo-“} (4.18)
the ratio of the largest to smallest moments drops to just 9 orders of magnitude, the moment
matrix could be inverted, and a waveform approximation could be produced. Unfortunately,
the procedure appears to be too expensive to justify its use. For almost all other waveforms
the improvement over using one of the heuristics was negligible and optimal frequency
scaling increased the execution time of the program by up to 10%.
4.4 Demonstration
In order to evaluate how well moments matching works for our application, a variety of ring
oscillators were simulated. For each circuit, three different simulations were performed.
First, SPICE was run using its nonlinear transistor models. Second, Mom was run using
each of the piecewise linear models described in Chapter 3. Third, SPICE was run using
Mom’s piecewise linear models. For each run the period of oscillation was recorded. For
each circuit a plot was made overlaying the responses of all three simulations. Table 4.2
presents some statistics gathered from the simulations. The first three columns give the
Pole Distribution
% 1 Pole % 2 Poles % 3 Poles
0
0
100.0
CMOS0 Inverter Ring
CMOS0 NAND Ring
0
26.2
-I
73.8
1.8
90.7
7.5
CMOS 1 Inverter Ring
0.1
83.2
16.7
CMOS 1 NAND Ring
43.4
31.3
25.2
BJlTl ECL Ring 10%
0
46.1
53.9
BIT0 ECL Ring 20%
67.8
20.2
11.8
BIT1 ECL Ring
75.6
22.3
2.2
BJT2 ECL Ring CapLevels=l
Model Waveform
Error
Error
1.2
0.8
0.3
28.1
1.1
0.0
5.2
0.1
9.2
0.3
9.2
6.3
8.0
0.1
2.5
0.3
Table 4.2: Benchmark Statistics.
fraction of 1, 2, and 3 pole responses produced by Mom. The last two attempt to identify
the origin of simulation errors.
CHAPTER 4. WAVEFORM APPROXIMATION
59
The column labeled “Model Error” gives the error of the period of the SPICE piecewise
linear simulation relative to the SPICE nonlinear simulation. “Model Error” is the error
incurred by modeling nonlinear SPICE transistors using the simple piecewise linear models.
The column labeled “Waveform Error” gives the error of the period predicted by Mom
relative to the SPICE piecewise linear simulation. “Waveform Error” is the error produced
by the moments matching waveform approximation procedure. In principle the total error
of our simulator could be the sum of these two errors. In practice, as will be seen in
Chapter 7, the errors can randomly sum or cancel.
The circuit “CMOS0 Inverter Ring” is a 5 stage CMOS ring oscillator using inverters
composed of piecewise linear MOS Level-O transistors (Figure 4.6 (a)).
Because of
v
6
- SPICE Nonlinear
5
4
3
2
1
0
d
-l ,
_
1
_
i
-3-
h.
(a) CMOS0 Inverter Ring
5‘*
_
i
-2
.3
-
4.
-
5-
-
&
nS time
nS time
(b) CMOS0 NAND Ring
Figure 4.6: CMOS Level-O Ring: SPICE vs PWL vs Mom.
the simplicity of the piecewise linear model and circuit (every cluster reduces to just an
RC) the piecewise linear response is exactly single time constant. Thus the Mom and
SPICE piecewise linear simulations match well although the piecewise linear and nonlinear
simulations don’t. This is expected because the switched resistor model can correctly
predict the switching delay of a CMOS gate but not necessarily the waveform. The match
of the periods is a consequence of using this circuit to calibrate the switched resistor model.
Figure 4.6 (b) shows what happens when 2 input NAND gates are used in the ring
(“CMOS0 NAND Ring”). Again note the close match between the Mom and SPICE
CHAPTER 4. WAVEFORM APPROXIMATION
60
piecewise linear simulations. However, the error due to the switched resistor model has
increased to 28%. This is not surprising because it has already been observed that the use
of switched resistors to model series stacks of velocity saturated devices can lead to errors
of the order of 29% (Figure 3.15). Also note that the circuit is slightly more complex with
the result that 2 pole approximations are occasionally used for the intermediate node of the
NAND stack.
The circuits “CMOS1 Inverter Ring” and “CMOS1 NAND Ring” (Figure 4.7 (a) and
(b)) illustrate the use of MOS models with gain. The gain couples multiple nodes together
V
6
- SPICE Nonlinea
.--. SPICE Piecewise Linear
- - - Mom
/
5
\‘i
3
2
1
0
-16
- 1-
2
-
3.
.~
_
5 -1,
nS time
(a) CMOS 1 Inverter Ping
(b) CMOS 1 NAND Ring
Figure 4.7: CMOS Level-l Ring: SPICE vs PWL vs Mom.
thereby increasing the complexity of the response. Most of the waveforms are now two
poles and the match with SPICE is much improved.
The circuits “BJTO ECL Ring 10%” and “BI’IQ ECL Ring 20%” (Figure 4.8 (a) and
(b)) are 9 stage ECL ring oscillators using the piecewise linear Level-O bipolar model.
When an error threshold of 10% was used, the SPICE and Mom responses match almost
exactly. However the 0.3% error of the waveform approximation is swamped by the 9.2%
error of the piecewise linear model. Perhaps a better balance between speed and accuracy
is achieved by increasing the error tolerance to 20%. The resulting elimination of 3 pole
responses can substantially speed up the simulation, albeit at the expense of increasing the
waveform approximation error to 6.3%
CHAPTER 4. WAVEFORM APPROXIMATION
0.1
V
0.0
--__----____
- SPICE Nonlinear
I
0.1
V
0.0
-0.1
-0.1
-0.2
-0.2
-0.3
-0.3
-0.4
-0.4
-0.5
-0.5
(a) BIT0 ECL Ring 10%
61
_
_
_
- SPICE Nonlinear
-
-
-
-
-
_
I
-I-
(b) BIT0 ECL Ring 20%
Figure 4.8: ECL Level-O Ring: SPICE vs PWL vs Mom.
Finally, the ECL ring oscillator was simulated with the Level-l and Level-2 bipolar
models (Figure 4.9). “BIT1 ECL Ring” employs the piecewise linear Level-l bipolar
model, and “BIT2 ECL Ring Caplevels=l” employs the piecewise linear Level-2 bipolar
model. (The meaning of “Caplevels=l” will be explained in the following chapter.) Note
that for these circuits the approximation error threshold had to be reduced from its default
value of 10% to 2% in order to achieve the degree of matching depicted in the figures and
table.
From this set of simulations a number of observations can be made. In general as the
complexity of the models and circuits increases, so does the number of poles needed to represent the response. Both gain and floating capacitors tend to couple together multiple nodes
thereby complicating the response and requiring greater numbers of poles. Additionally, as
the models become more complex, the error threshold should be reduced commensurately.
It is senseless to pay for a low waveform approximation error if it is going to be swamped
by the modeling error, and conversely.
Overall, third order waveform approximations appear to be adequate. Note that the
error due to waveform approximation can almost always be made smaller than the modeling
error. Additionally, the piecewise linear models do quite well despite their simplicity. The
switching delay errors due to the Level-2 bipolar and the Level-l MOS models are under
_
CHAPTER 4. WAVEFORM APPROXIMATION
0.1 -SPICENonlinear
V
0.c - - - SPICE PW
/-
-’
62
.
_
- SPICE Nonlinear
-
-0.1
-
-I-
-
.
_.-
_
-,
-
_.
_
_
-0.2
-0.3
-0.4
-0.5
-0.6
-0.5
d
_
.
2
-
-
s
_
-8.
_
si
_
-Ib-,-j
nS time
(a) BIT1 ECL Ring
2
;i
-
.6.
-
.*.
.lb
nS time
(b) BIT2 ECL Ring Caplevels=l
Figure 4.9: ECL Level-l & -2 Rings: SPICE vs PWL vs Mom.
6%.
4.5 Limitations
4.5.1 Unstable Responses
It is apparently more difficult to apply the waveform approximation technique to unstable
circuits than to stable circuits. One reason for this is that the Pad6 approximation tends to
capture the low frequency behavior of the response. That is $I( s) first converges to H(s) in
the vicinity of the origin of the complex s plane[Hua90]. For stable circuits this behavior
is desirable because the low frequency poles tend to dominate the behavior of the response
(hence the term dominunt time constant). Unfortunately, for unstable circuits the response
is dominated by the unstable poles which may not be the lowest frequency poles. In that
case the most important poles may be approximated with the greatest error.
This poor approximation of unstable poles is exacerbated by the exponential growth
with time of the approximation error: v(t) - G(t) of an unstable system. In the worst case,
this increase of the error with time can actually lead to a misprediction of not only the
timing behavior of a circuit but also its logical behavior. In contrast, for a stable system
CHAPTER 4. WAVEFORM APPROXIMATION
63
both the response and its estimate decay exponentially with time. Consequently for a stable
system we are always assured that the error decays exponentially, and that voltages always
eventually converge to their correct final values.
To illustrate, consider the response of node emit of the Schmitt trigger (Figure 4.3)
when X? first turns on. The complete third order response may be compared to the first and
second or&r approximations:
v(t) = 0.230e-‘/*Va@-+
G1 (t) =
_ o.~o~~-W~P~
0Jj5()e-‘/1*368p”
G2( t) = o.oofjef/158p~ + 0.230~--@~~~~
+ 0.)07~w3*~~
(4.19)
(4.20)
(4.21)
According to our metric the first order estimate appears to be adequate. In fact, the
first seven moments match to within 14%. For stable systems such a match usually
implies an acceptable approximation. However the first order approximation is missing
the crucial unstable pole. Figure 4.10 (a) shows that the use of this estimate leads to an
incorrect logical simulation. The lack of an unstable pole causes emit (the lowest trace) to
erroneously continue falling when 72 turns on (at t cv 2ns). In contrast out (initially the
middle trace) correctly begins to rise exponentially. The result is that TI never turns off
and out veers wildly off to +KL
The first order approximation is readily rejected because it has been determined that the
circuit is unstable and yet the estimated response lacks an unstable pole. The second order
approximation does possess an unstable pole. Furthermore the time constant and coefficient
of its dominant pole is accurate to within 3 significant figures and its first seven moments
match to within .009%. However, the use of the second order approximation still shows
some noticeable errors (Figure 4.10 (b)). Only when the error tolerances are tightened to
force the use of the full 3 pole response do we get a good match with SPICE (Figure 4.11).
4.5.2
Circuits with High Gain
Another limitation arises when circuits with extremely high gain place extreme demands
on the accuracy of the waveform approximation. The CMOS NAND gate described in
CHAPTER 4. WAVEFORM APPROXIMATION
2
;i
_
6-
_
-8
Ib
64
-2
nS time
(a) 1 Pole Approx of emit
(b) 2 Pole Approx of emit
Figure 4.10: Simulation of Schmitt Trigger
I
_---
_
-1
_
I
L---
,
-L----J
-,
-
-
/:
j+q
I
~
I
Figure 4.11: Schmitt Trigger using 3 Pole Response for emit
65
CHAPTER 4. WAVEFORM APPROXIMATION
Section 4.3.5 is an instance of such a circuit. A more extreme example is the cascade of
ECL inverters in Figure 4.12 where all stages are initially biased into their linear regions
VCC
VCC
VCC
VCC
Figure 4.12: Cascade of ECL Inverters
(6 = v, = v, = Kj = l&f = -250mV) and in rises from -250mV to Ov. For the initial
segment of the response, all the inverters remain in the linear region. Since each stage has
a gain of approximately 5 the initial input signal swing of 250mV gets amplified to about
156 volts by the last stage.
Figure 4.13 compares the initial segment of Vd predicted by Mom with that generated
0.2
KV
V
0.1
C
0.0
-0.1
-
-.
_
_,_
2
_
(a) 200 Volt scale
_
_
- 4 -10
nS time
2
4
nS time
(b) 10 Volt Scale
Figure 4.13: Initial Segment in the Response of ECL Cascade.
by SPICE. Our simulator’s response consists of 3 poles with an approximation error of
.25%. On a scale of 200 volts (a) the approximate response looks quite good. However
on a scale of 10 volts (b) it is apparent that the approximation is useless. In the region of
CHAPTER 4. WAVEFORM APPROXIMATION
66
interest (i.e. near t=O) the approximation doesn’t even move in the correct direction. The
problem is that the asymptotic swing of the segment is much larger than swings that can
actually occur in the circuit. Thus, the required approximation error must be vanishingly
small. Unfortunately, in this case the approximation is already composed of three poles,
and our simulator can do no better.
This example is admittedly artiG3l because digital circuits are extremely non-linear
thus making it unlikely that many successive logic stages will be simultaneously biased
into their high gain regions. In fact, this situation arose because of a peculiarity of the
initial DC solution; the perfect symmetry of the ring caused the simulation to start from the
metastable state. Although the ring oscillator’s problem can be solved by modifying the
initial DC solution, it is possible that circuits similar to the CMOS NAND gate mentioned
above will require a more general solution.
4.6 Summary
Piecewise linear models require more sophisticated waveform approximation procedures
because as the complexity of the transistor models increases so does the complexity of
the responses of circuits using those models. Therefore, Mom uses a general moments
matching procedure to generate higher order waveform estimates. The procedure involves
modeling an actual high order system using a low order system chosen such that low order
moments of the impulse responses of both systems match. Because the response of the low
order system can be computed exactly it is used to approximate the response of the high
order system.
Practical implementations of this procedure must address a number of issues. Floating
capacitors occasionally lead to problems with low order approximations, roundoff errors
in the moments ultimately limit the order of the approximation, approximations can sometimes contain unstable defective poles, and the cost of generating an approximation rises
superlinearly with the number of poles. While these issues need to be addressed, practical
compromises can deal with them.
Overall, the procedure works well. An examination of a number of CMOS and ECL
circuits indicates that 3 poles are usually adequate. In fact, comparisons with SPICE
CHAPTER 4. WAVEFORM APPROXIMATZON
67
simulations indicate that the waveform approximation works so well that the errors due
to waveform approximation are almost always swamped by the piecewise linear modeling
errors.
A couple of limitations persist. First, it is much more difficult to simulate unstable
circuits because unstable poles frequently have small coefficients and time constants and
hence contribute minimally to the moments. The result is that small errors in matching the
moments generate large errors in the time domain response. Second, for brief moments
during a switching transient certain combinations of device states can yield circuits with
voltage swings that far exceed what is physically possible. The consequence is that only an
infinitesimal initial portion of the waveform approximation is used, thereby placing extreme
demands on the accuracy of the approximation.
Chapter 5
Moment Computation
In the previous chapter the general moments matching procedure was introduced and shown
to be useful for predicting the responses of circuits containing piecewise linear models. This
chapter is concerned with one particular “implementation detail” of the moments matching
procedure: the computation of moments from the circuit. Moment computation is carefully
considered here because in previous work[PR90] it tended to dominate the overall cost
of waveform approximation. One of the factors that made moments matching attractive
for MOS timing analysis and switch-level simulation was the relative ease with which
moments could be computed from the circuit. When the switched-resistor transistor model
was used, the task of computing moments reduced to finding the DC solution of a resistor
tree, something that was readily done in linear time without formulating or LU factoring
general sparse circuit matricies.
However, when piecewise linear transistor models are allowed, the simulator must deal
with circuits that are no longer RC trees. This chapter generalizes RC tree analysis along
two dimensions. First, RC tree analysis is extended to apply to piecewise linear transistor
models. This generalization retains the efficiency of RC tree analysis for the transistorcapacitor trees found in MOS circuits and the current steering trees found in ECL circuits.
Second, circuit tearing is used to handle non-tree topologies and feedback. If the number
of branches that need to be tom in order to get a feedback-free tree is small and bounded,
then even these more complex circuits can be analyzed efficiently.
68
.
CHAPTER 5. MOMENT COMPUTATION
69
Finally, this chapter addresses the efficiency problem caused by allowing floating capacitors in transistor models. Floating capacitors can potentially couple together all nodes
in the circuit thereby eliminating the ability of the simulator to take advantage of latency.
From a practical standpoint, however, this isn’t a problem because the circuit can be repartioned by simply ignoring coupling through all but a limited number of levels of floating
capacitors.
5.1 Background
Many methods have been proposed for the computation of moments. Because of their
emphasis on efficiency, switch-level simulators have historically attempted to take advantage of the tree-like topology of most digital MOS circuits. Initially, only grounded
capacitors were considered and heuristics were used to break loops so that tree analysis
could be applied in linear time[Hor83]. Raghunathan and Thompson[RT85a] and Chu and
Horowitz[Chu88] extended tree analysis to handle leaky trees, multiple drivers, and charge
sharing while retaining the efficiency of RC tree analysis. Chan[Cha88] extended RC tree
analysis to handle floating capacitors.
A number of derivatives of tree analysis have been proposed to handle circuits that are
nearly trees. Although these procedures do not have linear complexity, if the number of
loops is small, they can be nearly as efficient. Lin and Mead[LM84] proposed an algorithm
based on Gauss-Seidel relaxation. Chan and Karplus[CKSB] and Pillage and Dutta[PD90]
handle edges closing loops in the tree using brunch tearing. Ratzlaff et al.[RGP91] utilize
a procedure that can be viewed as the application of node tearing techniques.’
General purpose circuit analysis techniques have also been applied to moment computation. Shi and Zhang[SZ87] formulate the problem in terms of nodal analysis, thereby
removing the topological restrictions of tree analysis while allowing independent sources,
dependent sources, and floating capacitors. Pillage and Rohrer[PR90] use tree link unufysis to compute moments for networks that may additionally include inductors. However,
‘Although Ratzlaff et al. do not present their work in terms of node tearing, we will justify this point of
view later.
CHAPTER 5. MOMENT COMPUTATION
70
although these techniques are more general, they may not be as efficient as RC tree analysis for the particular case of RC trees. Pillage and Dutta point out that when tree-link
analysis is applied to RC trees the loop/cutset matrix may not be sparse thus leading to
super-linear complexity[PD!N]. Furthermore, although properly ordered nodal equations
for trees can be LU factored in linear time due to the absence of fill-ins[SZ87], it may still
be better to utilize an alternate formulation. While studying the DC solution of power nets,
Branin[Bra80] found that his tree-based method could solve the network in about the time it
took simply to formulate the sparse nodal equations. Ratzlaff et al. found that their hybrid
tree/nodal analysis was up to two orders of magnitude faster than general LU factorization.
Since our goal is to duplicate the efficiency of switch-level simulators, we have chosen
to generalize RC tree analysis. Our analysis is an extension of that of Chu. We first review
his analysis in the next section.
5.2
Moment Computation for Leaky Resistor Trees
Chu[Chu88] extended RC tree analysis to include RC trees driven by multiple sources
(Figure 5.1).
These topologies may continue to be viewed as trees if one redefines
"I
Figure 5.1: Leaky resistor tree.
leaf nodes to be nodes connected to one transistor terminal and either a grounded current
or voltage source.’ The root node is arbitrarily selected from the non-leaf nodes (See
Figure 5.2). We will refer to such topologies as a leaky frees.
To review Chu’s approach: consider the leaky tree in Figure 5.2 that results when the
*We include nodes connected to current sources with zero current.
‘Chu’s thesis describes the analysis in terms of “moving capacitors”. We present his work from the slightly
different perspective of Norton analysis applied to the network.
CHAP7-ER 5. MOMENT COMPUTATION
71
r
;
+
Figure 5.2: Computing moments for leaky tree.
capacitors are replaced by current sources for the purpose of computing moments. The
circuit has been redrawn to suggest its tree structure, with the leaf nodes nl, n4, and n5 at
the bottom and the root node n2 at the top. The DC solution can be found by making two
passes over the network. The first pass starts at the leaves of the tree and ascends to the
root. The second pass starts at the root of the tree and descends to the leaves.
We begin the first pass with the resistors connected to leaf nodes (~1, 7-3, and 7-q). For
each resistor we compute the Norton equivalent seen looking into its upper terminal. Once
we have computed the Norton equivalents of all resistors descending from a particular
node (after the first iteration n3 becomes such a node) we can combine them with the
capacitor current sources to produce the Norton equivalent seen looking out of the lower
terminal of the (single) resistor ascending from that node (for n3 this is ~2). We record this
aggregate Norton equivalent at the node for use in the second pass. The first pass continues
iteratively, replacing each ascending resistor by the Norton equivalent seen looking into
its upper terminal and, in turn, computing the Norton equivalent seen by the parent node’s
ascending resistor. The iteration terminates when we have computed the Norton equivalent
of all the resistors descending from the root node.
The second pass starts by solving for the voltage at the root node. This is possible
because the root node has no resistors ascending from it and in the first pass we saved for
each node the Norton equivalent of all resistors (and capacitor current sources) descending
CHAPTER 5. MOMENT COMPUTATloN
72
from that node. Once we know the root’s voltage, we can solve for the voltage of its
children by utilizing the Norton equivalent saved at each child. We continue descending
the tree until we’ve solved for all the voltages.
In summary, the process utilizes two mapping computations. In the first pass (See
Figure 5.3) we are given q , ii, and T and need to find 7-z and i2
r2. i2
a
"2
r
"1
'1
T
il
h
Figure 5.3: Norton calculations.
= rl+r
7-1
i2 = ilrl +r
r2
(5.1)
(5.2)
In the second pass we are given, in addition, 29 and we need to find q
01 =
v27j + ilqr
r1 +r
(5.3)
Later, we will show that these two computations can still be performed even if resistor r is
replaced by a transistor modeled using piecewise linear functions.
Finally, it should be noted that one of the factors contributing to the efficiency of tree
analysis is that the formulation of the equations occurs largely as a side effect of the process
of deciding which nodes are affected by the switching event. A cluster consists of all nodes
connected by the channels of conducting transistors. Therefore, when a transistor switches,
a depth first search is performed on the interconnection graph4 along only those edges
representing conducting transistors. The construction of a list of those edges in the order
4A static interconnection graph of the network is constructed once during a preprocessing phase.
CHAPTER 5. MOMENT COMPUTATION
73
they are encountered, plus a bit in each edge indicating which of its two channel terminals
is higher in the tree, constitutes the formulation of the equations.
5.3
Leaky Trees of Three Terminal Networks
The switched resistor model is particularly easy to deal with because of the simplicity of
the circuit models that represent its behavior in each region of linearity: the transistor is
represented by either a resistor (if it is on) or an open circuit (if it is off). Such simple circuits
are not always sufficient. For example, to model the dependence of the drain current on the
gate source voltage, the MOS Level-l transistor model must employ a dependent current
source. In general, we would like to allow interconnections of resistors and dependent and
independent current and voltage sources in transistor models. Once this is done it is no
longer apparent that a simple tree walk can be used to find the moments.
This section demonstrates the rather surprising result that trees of piecewise linear
devices can be solved as easily as trees of resistors. This general result has only two minor
restrictions: the coupling from the gate (base) to the source and drain (emitter and collector)
must be unidirectional, and the tree must be feedback free: that is no gate (base) of any
transistor in the tree may be connected to any node in the same tree.
In order to compute the moments of a transistor-capacitor tree we need to find the
DC solution of a corresponding tree obtained by setting independent DC sources to zero,
replacing inductors with voltage sources, and replacing capacitors with current sources.
Furthermore, because we assume inputs are unidirectional, MOS gates are considered to
be driven by independent, possibly exponentially time varying voltage sources. It can be
shown that when formulating the circuit to compute the (Ic + 1)st moments, each timevarying source should be replaced by a DC source set equal to the (Ic + 1)st moment of its
waveform (see Figure 5.4).
In any particular state, each piecewise linear device is equivalent to some linear network.
Assuming that each transistor remains in its present state for some finite amount of time,
we group each transistor with the voltage source driving its gate and represent the interface
that the pair presents to the network by the short-circuit admittance parameters of a three
The parameters are defined by extracting two
terminal network[BS65] (Figure 5.5).
CHAPTER 5. MOMENT COMPUTATION
74
Figure 5.4: Transistor-capacitor tree.
Figure 5.5: Three terminal network model.
voltage ports, one from each terminal to ground[CL75].5
(5.4)
Figure 5.6 gives a physical interpretation of the six parameters of the admittance formulation.
Figure 5.6: Circuit interpretation of admittance parameters.
In order to find the DC solution of leaky trees of these networks6 we need to be able to
do two things. If we replace resistor r in Figure 5.3 by the circuit in Figure 5.6 it can be
‘It is not always possible to extract two voltage ports for a particular model. For example, admittance
parameters cannot be determined for a voltage source. Such devices are handled as special cases. However,
to simplify the discussion we assume that the admittance representation exists.
‘jA special case arises if either ~12 = 0 or yrl = 0. In that case rather than treating the transistor as a
branch in a tree it is potentially more efficient to treat it as an arc between clusters. See Section 5.5
CHAPTER 5. MOMENT COMPUTATION
75
shown that if gi = l/r1 and ii are known then 92 = l/r* and i2 are given by
i2
92
1
=
Y12
- id +
=
Y11-
Y22 + 91
Yl2Y21
(il - is*)
I
yu+91
(5.5)
(5.6)
If, in addition, v2 is known then vi is given by
v,
=
~21~
91 + $52
il -
is2 -
(5.7)
Thus it is possible to generalize the DC analysis of leaky trees of resistors to leaky trees
of three terminal networks. Because the above equations take a constant amount of time
to compute, the leaky tree analysis remains O(n) in the number of devices in the circuit
irrespective of the complexity of the models.
5.4 Series-Parallel Combination
The analogy with resistors goes even further. These three terminal networks are also
amenable to series-parallel combination. The admittance parameters of the parallel combination of two networks can be found by simply ‘summing their corresponding parameters.
The parameters of a series combination can be derived from the series combination of two
of the circuits in Figure 5.6. If we let superscripts of 1 and 2 distinguish between the
parameters of the two circuits then
Y:2Y:l
Yll
=
Y:l - 1
Y22 + Yfl
Yl2
=
-
Y21
=
-
Yi2YT2
Yi2 +
Yfl
Y:, Y221
Yi2 + Yfl
Y&Y:2
(5.8)
(5.9)
(5.10)
Y22 = Yi2 - 1
Y22 + Yfl
(5.11)
i,l = it, - y;2$y;l (ii2 + Cl)
(5.12)
.2
is2 = 232
- y;2fl y:l (ii2 + Cl)
(5.13)
CHAPTER 5. MOMENT COMPUTATION
76
Parallel combination can be useful for breaking small loops produced, for example, by
CMOS transmission gates. Although such loops can be handled using the more general
tearing techniques described later in the chapter, parallel combination is more efficient.
5.5 Coupled Clusters
The previous section showed how to compute the moments of a single cluster assuming
that all its inputs are known. For Rsim this is sufficient because the switched resistor
model always yields clusters that are independent. Mom’s more general models, however,
may include coupling between clusters which requires that multiple clusters be analyzed
simultaneously.
Switch-level simulators usually exclude floating capacitors. However, Mom treats a
floating capacitor as a bidirectionuf coupling which requires the simultaneous evaluation
of both terminals. As outlined in Section 4.2 the computation of the moments proceeds by
replacing capacitors with current sources. However, instead of inserting a single flouting
current source we insert two grounded current sources (Figure 5.7). When computing the
I,
P
1
C
m
C( M --M
Pk
mk
1
C(Mppmk)
Figure 5.7: Floating capacitor connecting same cluster.
(k + 1)st moments, the current of an inserted current source becomes
i, = C(M,, - Mmk)
(5.14)
where Mpk and I’$,~ represent the kth moments of voltages of the nodes connected to the
plus and minus capacitor terminals, respectively.
If both terminals are connected to the same cluster then the moment computation
proceeds as for grounded capacitors. However, if the capacitor links two otherwise disconnected clusters (Figure 5.8) then the moments for both clusters must be computed in
CHAPTER 5. MOMENT COM~LJTATZ~N
77
Figure 5.8: Capacitive Coupling Between Clusters.
lock step because the (k + 1)st moments in each cluster depend upon the capacitor current
which, in turn, is a function of the kth moments of nodes in both clusters. Thus the 0th
moments are computed for Cluster 1 and Cluster 2, followed by the 1 st moments for Cluster
1 and Cluster 2, etc.
Coupling between clusters can also be caused by dependent sources in the transistor
models. To see this, consider the circuit in Figure 5.9. If 72 is modeled by the switched
Figure 5.9: Gate to Channel Coupling.
resistor model then Clusters 1 and 2 are independent. Switching events in Cluster 1 (for
example TI changing regions) which affect waveforms in Cluster 1 (for example nl) won’t
affect waveforms in Cluster 2 unless they cause T2 to change regions. However, consider
what happens if T2 is our MOS Level-l model biased into its saturation region. In that
case 7’2’s drain current will be a continuous function of of T2’s gate voltage. In that case
any changes to nl’s waveform will immediately affect waveforms in Cluster 2 even if 7Z
doesn’t change regions.
Thus if 72’s model includes a dependent source coupling its source and drain currents
to its gate voltage, then any event that requires Mom to recompute the response of Cluster
1 (including its moments) also requires Mom to recompute the response of Cluster 2
(including its moments). This implies that not only must the moments be computed in lock
step, but the kth moments of Cluster 1 must be computed before the kth moments of Cluster
CHAPTER 5. MOMENT COMPUTATZON
78
2. This unidirectionul coupling is represented by an arc between the clusters.
Another example of unidirectional coupling is illustrated by our bipolar model. Similar
to the MOS example, the emitter and collector currents are dependent upon the base
voltage. However, while the collector current depends upon the emitter voltage, the emitter
current is independent of the collector voltage. This can be represented by an additional
unidirectional coupling from the emitter to the collector (Figure 5.10). Thus we represent
Figure 5.10: Multiple Couplings for Bipolar Transistor.
a bipolar transistor with three arcs.
Note that for this particular model the “channel” of the bipolar transistor (the emittercollector path) is represented by a unidirectional arc rather than a bidirectional edge.
Therefore the base, emitter, and collector nodes may all reside in different clusters. In
general, if either ~12 or yzi of a device model (See Equation 5.4) are equal to zero it can
be treated as a unidirectional coupling between possibly different clusters rather than as a
branch that couples the two nodes into the same cluster. Later we shall see this allows a
potentially more efficient evaluation.
Coupled clusters are collected into a group and represented by a directed graph.7 The
moments of clusters in a group must be computed in lock step, and the arcs between clusters
induce an ordering for the computation of each moment. If the directed graph is cycle free
then the correct order for the evaluation of moments can be found via a topological sort of
‘Note that while a cluster’s graph is, by definition, connected, a group’s directed graph may not be.
CHAPTER 5. MOMENT COMPUTATION
79
the graph.* In this case the group’s moments can be computed in linear time. If the directed
graph has cycles then the group has feedback and no topological sort exists. Feedback is
handled using tearing as described in a following section.
Figure 5.11 depicts a topological sort of the directed graph of an ECL gate. Note that
5
6
Figure 5.11: Topological Sort of ECL Gate.
since the bipolar model contains no edges, each node in the ECL gate is an independent
cluster. The figure suggests that the directed graphs of most ECL current steering trees will,
in fact, be cycle free. Thus it is possible to compute the moments of most ECL gates in
linear time. In contrast, if the bipolar transistor were treated as an edge rather than an arc
(Figure 5.12 shows what happens when the bipolar transistor model is a resistor.) then the
resulting undirected graph would have a loop and it wouldn’t be possible to compute its
moments in linear time. This elimination of loops from the graphs of ECL gates was the
primary motivation behind neglecting the output conductance of bipolar transistors.
*A topological sort can be performed by depthfirst search with complexity O(n) [AHUSS].
CELWTER 5. MOMENT COMPUTATION
80
4
Figure 5.12: Resistor Model for Bipolar Transistor Results in Loops in ECL Gates
5.6 Relaxing Network Restrictions
In the preceding section a method for finding the DC solution (and hence the moments) of
(possibly multiple coupled) trees of piecewise linear devices was described. That method
has the nice property that its complexity grows linearly with the size of the circuit. However
two restrictions were placed on the circuit: the edge graph must be loop free, and the cluster
graph must be cycle free. Because of the judicious choice of transistor models these
restrictions are seldom a problem. Most CMOS circuits yield leaky trees, and most ECL
circuits yield directed acyclic graphs. Thus it is possible to compute the moments of the
most common circuits in linear time. However, one occasionally encounters circuits that
contain either loops or feedback. For those circuits the techniques described cannot be used
directly. In this section we shall describe the use of tearing to solve those circuits. Although
the complexity of tearing is superlinear, if the circuits that need to be tom are small, and if
there aren’t many of them, tearing can still yield an efficient solution technique.
One example of a circuit with feedback is the Schmitt trigger in Figure 5.13. The
CHAI’TER 5. MOMENT COMPUTA77ON
81
Figure 5.13: Schmitt Trigger
Schmitt trigger has two stable states. If in is high then TI is on and our is low. Conversely,
if in is low then 72 is on and ouc is high. In either state only one transistor is on. However,
consider what happens when in falls from high to low. TZ is initially the only transistor
on but soon T2 turns on as well. When it does, a positive feedback loop is established
through both transistors that eventually causes TZ to turn off. Thus during the transition
a cycle exists in the directed graph. As another example, consider the diode decoder in
Figure 5.14. When the decoder’s inputs x2 - x0 are low, i2 - i0 receive current and i2 - a
I I
I1
il
T
i0
1
A
I I
I I
I I
I I
II
1
1I
1I
1
1
A
II
xl
1
1
I
77
II
0
0
il
Vr
L
v-
Figure 5.14: Diode Decoder.
don’t receive current from the differential pairs. Only diodes connected to wires receiving
82
C’HAP’IER 5. MOMENT COMPUTATION
current (marked with an asterisk (*)) are on. Since y0 is the only output whose diodes are
all off; only it is high. All other outputs are pulled low by one or more on diodes. However,
it is not trivial to determine the distribution of currents through the diode network. This is
because the topology of the on diodes forms a mesh rather than a tree.
If the circuits containing loops and/or feedback are small and if there aren’t many of
them, it can be practical to handle jlcst those circuits using a more general, albeit less
efficient, extension of the original algorithm. In this section we will show that the circuit
decomposition technique known as rearing can be used to handle such circuits.
5.6.1
Node and Branch Tearing
Circuit tearing was originally introduced by Kron[Kro39] who described the solution of
large networks by 1) partitioning them into multiple subnetworks 2) solving the subnetworks independently and 3) combining the independent solutions to produce the overall
solution. Two particularly interesting techniques have been devised for tearing systems
of nodal equations: branch teoring[Wu76] and node rearing[SVCC77]. Branch tearing
can be interpreted as9 the insertion of independent current sources in series with “tom”
branches in order to partition the network (Figure 5.15). In contrast, node tearing can
l&j
- --Be-
1
++.- ++/IT-fl!Fj
"1
"2
il
i2
--
Figure 5.15: Circuit Partitioning via Branch Tearing.
be interpreted as the insertion of independent voltage sources between “tom” nodes and
ground in order to partition the network (Figure 5.16). For each of our examples the
insertion of two independent sources partitions the network into three subnetworks. The
9The intuition behind these interpretations of branch and node tearing are largely due to an insightful paper
by RohrerlRoh881. However, our interpretation of node tearing differs from the one presented there.
CHAPTER 5. MOMENT COMPUTATION
83
--
1,-
--
Figure 5.16: Circuit Partitioning via Node Tearing.
network is considered to be “partitioned” because each of the subnetworks may be solved
independently once the values of the inserted sources are given.
The solution of the original network can be found using multiple solutions of the
subnetworks. To illustrate, consider the case of branch tearing. First the inserted current
sources, ii and i2, are set to zero and the subnetworks are solved to find the voltages across
the inserted sources vi and 74. Denote the resulting voltage across the kth inserted source
by vok, that is the response due to sources that were part of the original network. Next,
set all sources (original and inserted) to zero. Then for each of the inserted sources, set
only that source (let’s say it is the lath source) to some nonzero constant i,, and solve the
subnetworks for the voltages across each of the inserted current sources. Denote the ratio
of the voltage across the jth inserted source to i,, by the transfer resistance rjk. Then by
superposition, the total response due to the original sources and with arbitrary settings for
the inserted sources is given by (assuming n inserted sources, and vt, is the total voltage
across the jth source)
Ql
%
‘Q,
rll
=
r21
..
.
rni
?-I2 * **
Tl?l
r22 - - -
r2n
..
.
fn2
..
.
*
**
.
2a1
i,,
.
.
rnn
aan
V 01
+
v%
(5.15)
van
If we set vt = 0 in the above equation and solve for i, we get the actual currents flowing
through the tom wires of the original circuit (ie before augmentation). Finally, if we solve
the augmented circuit with those currents we get the DC solution of the rest of the original
circuit.
The node tearing case is solved in a similar manner, except that instead of setting the
values of the inserted current sources the values of the inserted voltages sources are set,
CHAPTER 5. MOMENT COMPUTATION
84
instead of measuring the voltages across the inserted current sources the currents through
the inserted voltage sources are measured, and instead of forming a transfer resistance
matrix a transfer conductance matrix is formed.
5.6.2 Non-Leaky The Topologies
Node and branch tearing were conceived with the objective of partitioning the network
into multiple independent pieces. However, they can also be used to reduce a network to
one that is more readily solved. For example, consider a circuit that is nearly a leaky tree
(Figure 5.17). The network can be reduced to a leaky tree by finding a spanning tree for
Figure 5.17: Circuit That is Nearly a Leaky Tree.
the network and then using branch tearing to tear out all edges not part of the spanning tree
(Figure 5.18). The equivalence of the tom circuit to a leaky tree can be made more apparent
+ "2-
Q
+
i2
I
-3'
+
-
"1
(4
(b)
Figure 5.18: Branch Tearing of Nearly Leaky Tree.
by replicating the tearing sources. In a similar manner node tearing can be used to break
CHAPTER 5. MOMENT COMPUTATION
85
(4
Figure 5.19: Node Tearing of Nearly Leaky Tree.
loops in the circuit (Figure 5.19). In either case, the tom circuit is a leaky tree and hence
can be solved in linear time. Of course tearing doesn’t really eliminate the work of solving
the network. In order to compute the response of the original network from the solution
of the tom network a transfer resistance or conductance matrix of dimensionality equal to
the number of tom nodes or branches must be formulated and LU factored, a process that
is generally superlinear. In general, tearing techniques are not guaranteed to reduce the
amount of computation and can actually increase it[Wu76]. However, the experience from
switch-level simulation has been that circuits are usually close approximations of trees and
the number of tom branches or nodes is much smaller than the total number of branches or
nodes in the network. If the number of tom branches or nodes is small and bounded then
tearing can be used advantageously.
Chan and Karplus[CK89] and Pillage and Dutta[PD90] used brunch rearing to reduce
the network to a tree. Ratzlaff et al[RGP91] used a circuit collapsing technique” which
can be viewed as node tearing if one notes that the circuits that are collapsed are simply the
partitioned subnetworks and the nodes that remain after collapsing are the tom nodes. The
choice of tearing nodes is particularly advantageous in that the resulting tearing matrix is
sparse, symmetric, and positive definite and therefore can be more efficiently LU factored
than general circuit matrices.
We use a combination of tearing techniques. Branch tearing is used to reduce individual
‘OThis was an imp rovement of a technique first explored by Stark and Horowitz for the solution of large
power networks[SH90].
CHAPTER 5. MOMENT COMPUTATZON
86
clusters to leaky trees. ii The difference between our approach and that of Chan and
Karplus[CK89] and Pillage and Dutta[PD90] is that we needn’t tear branches to ground.
Although this complicates the formulation of the tearing matrix (we can no longer simply
traverse fundamental loops), the tearing matrix can be smaller and sparser. As demonstrated
in [PD90], tearing branches to ground can severely reduce the spar&y of the tearing matrix.
Feedback is handled using an analogous procedure. Instead of tearing out edges to
eliminate loops in a cluster’s undirected graph, we tear out arcs in order to eliminate cycles
from a group’s directed graph (Figure 5.20 (a)). The only additional complication arises
(‘4
(4
Figure 5.20: Tearing of feedback.
when it is not possible to replace the tom wire with a current source because such a current
source would see an infinite impedance. This occurs, for example, when cluster inputs are
only connected to MOS gates. I2 In this case we tear out the wire and drive the infinite
impedance side with a grounded DC voltage source.
5.7
Partial Evaluation of Floating Capacitive Coupling
The procedures described up to this point can be used to compute the moments of arbitrary
interconnections of piecewise linear devices. If the network is free of loops and feedback
“Branch tearing is actually suboptimal with respect to node tearing in the sense that it may require more
tearing variables. As illustrated by our example, tearing a branch can open up at most one loop while tearing a
node can open up several. A similar result was proved by Sangiovanni-Vincentelli et al.[SVCC77]. However,
in the context of a switch-level simulator, the implementation of branch tearing is slightly simpler.
‘*In fact, presently all our bipolar models also have zero DC base current although this is not actually a
restriction imposed by the simulator.
CHAmR 5. MOMENT COMPUTATION
87
then the moments can be computed in linear time, otherwise the moments are computed
with reduced efficiency. In either case, the moments are computed exucffy (except for the
numerical error introduced by executing the algorithms on a real computer).
However, under certain circumstances it may be necessary to give up trying to compute
the moments exactly. l3 In particular, the inclusion of floating capacitors into device models
introduces an efficiency problem. To illustrate, examine the cascade of three CMOS
inverters shown in Figure 5.21. Assume that the circuit has settled and consider the
in
Figure 5.21: Clusters Coupled by Floating Capacitors.
sequence of events that follows T2 turning on. First note that were it not for the floating
capacitors CZ and C2, nodes nl, n2, and n3 would reside in three different groups. This is
because T3 - T6 are initially biased into either the off or linear regions which exhibit zero
gain from the gate to the source and drain. However CI and C2 couple all three nodes into
the same group. Although the efficiency of moment computation is still 0( 7z) (no loops or
cycles can be introduced by floating capacitors) we must recompute the response of many
more nodes than would seem necessary. Intuitively we would expect that the effect of 7’2
switching upon n2 (coupled through 1 level of floating capacitors) to be quite small relative
to the logic swing, and the effect on n3 (coupled through 2 levels of floating capacitors)
to be negligible. Otherwise the logic gare abstraction would never have been found to be
useful. However, with our present moment computation algorithm, floating capacitors can
potentially couple all nodes in a circuit thereby eliminating the ability to take advantage of
circuit latency.
A simple solution is to allow a group to expand through only a limited number of
levels of floating capacitors. Nodes sufficiently distant (in terms of the number of levels of
13This isn’t so bad. After all the waveforms that are generated from the moments are just approximations.
CHAP7ER 5. MOMENT COMPUTATION
floating capacitors) from the switching event are presumed to be essentially unaffected by
the event. For our example, if the maximum number of levels is set to 1, then only n2 would
be brought into the same group as nl. However, when the expansion process is truncated
there may be some capacitors that only have one terminal belonging to the group. In that
case the terminal outside the group is considered to be driven by a (possibly time varying)
voltage source that has the waveform presently on the node. That waveform is otherwise
undisturbed by the event. For our example, C2 is treated as if its right terminal were driven
by a voltage source having ~3’s present waveform. Node nl retains whatever waveform it
had prior to Z2 switching.
Our initial experience is that this procedure works well. Figure 5.22 plots the response
of a 9 stage ECL ring oscillator when Mom expands groups through various numbers of
levels of floating capacitors. In each case the output of Mom using the Level-2 bipolar
model (which includes parasitic.floating capacitors and resistors) is compared with that of
SPICE using an identical piecewise linear model. Table 5.1 gives the switching delay error
for each case. The figure and table show that the largest change in accuracy is obtained by
capacitor levels
0
1
2 unlimited
% switching delay error 3.5 0.3 0.7
1.8
run time (seconds)
2.3
19.5
6.2 10.0
nodes per group
7.9 24.6 54.9
108
# waveform computations 869 27 10 4127
7960
Table 5.1: Decrease in Efficiency with Increasing Capacitor Levels.
expanding the group through just 1 level of floating capacitors.14
The table also shows the super-linear growth of execution time with increasing numbers
of levels. This super-linear growth can be explained. First note that the number of events
doesn’t change. Thus the execution time is roughly proportional to the number of waveform
computations and hence the average size of a group. Also note that because of the nodes and
i4Mom and SPICE piecewise linear simulations appear to be converging to slightly different waveforms.
The most obvious source of error is the waveform approximation. However, there are other possible sourozs
of error. For pr~tical reasons, it is not actually possible for Mom and SPICE to simulate exactly identical
piecewise linear models. Small parasitic conductances and capacitances must be added to SPICE’s piecewise
linear models in order to aid convergence of the numerical integration. Hysteresis (+/- ImV) must be added
to Mom’s piecewise linear models to enhance the stability of event scheduling. Finally, experimentation with
simple circuits that could be solved analytically indicate that SPICE may generate errors as large as 0.3%.
CHASPTER 5. MOMENT COMPUTATION
89
0.1
V
0.0
-0.1
-0.2
-0.3
-0.4
-0.5
-0.6
2
4
-
-
i-
-
Ii
-,j
I- - - Mom
.-.
lo 4)fj o
nS time
.2.
(a) 0 Levels.
0.1,
-0.6
.
;
-
_
_
-~
2-
_
6
Ib4.6d
-
-i-
-’ ;i
-
6-’
nS time
(c) 2 Levels.
8
-
-.
IO
nS time
(b) 1 Level.
_
_
4
-
-8
-
-
lb
nS time
(d) Unlimited Levels.
Figure 5.22: Limited Levels of Floating Capacitors.
CHAPTER 5. MOMENT COMPUTATION
90
coupling introduced by the bipolar parasitics the fine grain of the interconnection network
resembles a mesh. If the number of levels is modeled by the radius of a circle, and the
number of nodes in a group by the area of the circle then one would expect a roughly
quadratic initial growth of the group size with the number of levels.
Finally, note that for our small example a maximum speedup of only 8.5: 1 is achieved.
This seems consistent with the expectation that at any instant in time only one or two of the
nine stages of the ring is actively switching. However, larger circuits often exhibit much
more latency, and one might expect correspondingly greater speedups for them.
5.8 Summary
Moment computation can dominate the cost of waveform estimation. To compute moments,
switch-level simulators use RC pee analysis which is efficient because it takes advantage
of the tree-like topology of most circuits. We generalize RC tree analysis along two
dimensions. First, tree analysis is extended to apply when transistor models are generalized
from resistors to piecewise linear devices. This generalization retains the efficiency of
RC tree analysis for the transistor-capacitor trees found in MOS circuits and the current
steering trees found in ECL circuits. Second, tearing is used to handle non-tree topologies
and feedback. The advantage of combining a tree analysis with tearing is that most circuits
are trees and hence can be analyzed efficiently. The cost of analyzing more general non-tree
topologies is paid only when it is needed.
The addition of floating capacitors to device models can degrade simulation efficiency
because floating capacitors can potentially couple together all nodes in the circuit thereby
eliminating the ability of the simulator to take advantage of circuit latency. However, in
digital circuits there is rarely any significant coupling through multiple levels of floating
capacitors. Thus, repartitioning of the circuit can be achieved by ignoring coupling through
all but a limited number of levels of floating capacitors.
Chapter 6
Detecting Region Changes
The previous chapters showed how mOmems matching can be used to compute the response
of circuits containing piecewise linear devices. However, moments matching only deals
with linear circuits. That is, the response is computed assuming that all piecewise linear
devices remain in their present regions of linearity. In reality devices may change regions
of linearity in the middle of the transient. At the instant any device in the cluster changes
region, the simulator must stop and recompute the response with that device relinearized in
its new region.
This chapter describes the techniques used by Mom to determine if and when piecewise
linear devices change regions of linearity. It turns out that this task is considerably more
difficult for Mom than for Rsim. It is important to study this problem because, as we shall
see in a later chapter, this computation can dominate the run time of the simulation.
6.1 Mom vs Rsim
The incorporation of more general piecewise linear transistor models complicates the detection of when devices change regions of linearity. In Rsim’s case, a transistor switches
whenever its gate voltage crosses the switching threshold. Since the step response of RC
trees is approximated by a single exponential:
‘ib(t) = 5.0e4’
91
(6.1)
CHAPTER 6. DETECTING REGION CHANGES
92
(where 7 is given by the first moment), in order to find out whether a transistor switches
Rsim only needs to solve the following equation for t:
vg(t) - 2 . 5 = 0
(6.2)
where the first term on the left hand side is the waveform on the gate terminal and the
second term is the negative of the transistor’s switching threshold The solution for t can
be found explicitly:
Toe-“’ - 2 . 5 = 0
(6.3)
t = --7ln1/2.
(6.4)
Thus Rsim only needs to multiply the first moment by a constant in order to obtain the
switching time.
More general piecewise linear models introduce a few complications. They may have
several regions of linearity and it is necessary to check whether the model will change
from the present region to any adjacent region. As described in Chapter 3 the boundary
separating the present region from an adjacent region is a hyperplane defined by a linear
equation of the form:
a0 + up*(t) +
a2v2(t)
+
a3q(t)
= 0.
(6.5)
where v;(t) is the voltage on the ith terminal, and the a; are constant coefficients that
determine the location of the hyperplane. Since Mom approximates voltage waveforms
using sums of exponent&:
vi(t) = bjo + bile”lt + b;2eqzf + bi3eu3*
(6.6)
the time at which the transistor crosses that boundary can be obtained by substituting the
terminal voltages (Equation (6.6)) into the equation for the hyperplane (Equation (6.5)) and
finding the root of the resulting equation:
b + kleulf + k2e”zt + . . . + becg9’ = 0
(6.7)
The left hand side of Equation (6.7) defines a time varying waveform which will be referred
to as the boundary waveform.
CHAPTER 6. DETECTING REGION CHANGES
93
However, we are not interested in finding any root. Instead, Mom must start at the
current time and search forward to find the firsr root. Since the current time can always be
normalized to zero, Mom must find the smallest, positive root of an equation of the form
of (6.7). If no positive root is found then that boundary is never crossed. Mom must then
check all remaining hyperplanes bounding the current region.
Thus the task of rescheduling a device eventually reduces to (possibly several instances
of) the one dimensional root finding problem. The robust and efficient solution of that
problem is the focus of the rest of this chapter.
6.2
Overview of Root Finding
Although many general purpose root finding techniques have been described in the literature
(for example, Bisection, Newton Raphson, regulu fufsi, and Brent’s[Atk78]), there exists
no general technique which 1) offers any control over which root is found, 2) provides any
guarantee of convergence or 3) can detect when no root exists. Instead, in order to guarantee
convergence to a desired root it is necessary to incorporate problem specific knowledge.
Mom takes particular advantage of the fact that the left hand side of Equation (6.7) is a
weighted sum of exponentials. Different techniques are used depending upon the number
of exponent&.
The procedure used by Mom for finding roots involves 3 steps:
1. Moments matching is used to produce a low order approximation of the boundary
waveform. This is done because root finding is most difficult when there are a large
number of poles. Moments matching is used to reduce the number of poles to three
or fewer.
2. Once the number of poles has been limited to three, the problem can be broken down
into a number of special cases. For each different pole configuration the roots are
found by taking advantage of special characteristics of that particular configuration.
3. Although the waveform approximation in the step 1. makes it easier to find roots,
it can sometimes introduce small errors that lead to consistency problems with the
.
C7IiWlER 6. DETECTING REGION CHANGES
94
simulation. Therefore after the root is found, root polishing is used to eliminate the
errors introduced by the waveform approximation.
Each of these steps will be individually addressed in the following three sections.
6.3 Order Reduction
To simplify the task of finding roots, the boundary waveform (which could initially contain
as many as 9 poles) is approximated by a waveform having three or fewer poles. This is
done by forming the moments of the boundary waveform from the linear superposition of
the moments of the terminal voltages and computing a waveform approximation from the
resulting moments. The result is a reduced order waveform’
b(t) = +ko + klealt + k2edZt + k3e”3t
(6.8)
The poles may be either simple (both ki and 0, real) or may occur in complex conjugate
pairs (two poles with Qi and aj complex such that k; = k; and 0; = 0;).
6.4 Finding Roots
Once the number have poles has been reduced to three or fewer, Mom only needs to deal
with a limited number of configurations:
l
1 simple pole
l
2 simple poles
0 3 simple poles
0 1 complex conjugate pole pair
l
1 simple pole plus 1 complex conjugate pair.
‘All poles are assumed to be stable. Appendix E shows how to handle unstable poles.
CHAPTER 6. DE7EC77NG REGION CIiANGES
95
In general, the difficulty of finding roots is dependent upon the pole configuration. In
order to maximize efficiency a different technique is used for each different configuration.
Occasionally a root can be found explicitly, although most of the time Mom must use some
combination of root bracketing and Newton-Raphson iteration. The following subsections
describe each of the different techniques.
6.4.1 One Pole
The simplest case is if there is a single pole. In that case b(t) takes the form:
b(t) = b + kle”lt
(6.9)
and b(t) = 0 can be explicitly solved for t to yield:
t root
Ice
= -l-l”--G
(6.10)
Ql
Note that if ko/kl > 0 then no root exists. Also, if troot < 0 then the root occurred in the
past and is of no interest.
6.4.2 Two Poles
The roots of two pole responses are found by bracketing the root before using a general
purpose algorithm. Bracketing consists of locating an interval, t E [to, tl], that contains
exactly one root, the desired root. Once the root has been bracketed several general purpose
algorithms (for example Bisection) are guaranteed to converge to it.
To bracket the root, the stationary points of b(t) (that is the points at which the derivative is zero) are considered to partition the function into a number of disjoint segments
(Figure 6.1). Because the derivative is continuous, it cannot change sign without passing
through zero (a stationary point) and hence these segments must be monotonically nonincreasing or monotonically non-decreasing. Thus if the sign of the function changes from
one end of the segment to the other then that segment must contain exactly one rmt.
Fortunately, the stationary points of two pole responses may be obtained explicitly. If
the response consists of two simple poles (Figure 6.1 (a)):
b(t) = k. + kleul’ + k&‘zt,
(6.11)
CHAPTER 6. DE77XTZNG REGION CHANGES
(a) Simple Poles
96
(b) Complex Conjugate Pair
Figure 6.1: Stationary Points
then the stationary point (if it exists) can be found by setting the derivative of b(t) to zero:
t stationary = -
1
Ql - f-72
In - h 01( km 1
(6.12)
Since there is at most one stationary point the waveform may consist of at most two
monotonic segments. A comparison of the signs of b(t stationary), b(t = e-00), ad b(t =
+oo) reveals whether or not either segment contains a root.
If the response consists of a complex conjugate pole pair, then b( t ) can be expressed:
b(t) = ko + Men’ cos (wt + 4),
(6.13)
which has the stationary points:
1
t stationary = W
tan-+Q+ak)
(6.14)
fork = . . . - l,O,l,.... Although there are an infinite number of stationary points, it is
only necessary to examine the first couple of segments past the origin because the waveform
decays with time (Figure 6.1 (b)).
6.4.3 Three Poles
Two cases can occur for three pole waveforms. Either all three poles are simple poles, or
there is one simple pole and a complex conjugate pair.
CHAPTER 6. DElEC77NG REGION CHANGES
97
Three Simple Poles
The case of three simple poles
b(t) = b + klewQ1’ + k2emu2’ + k3e-u3t
(6.15)
(6; E R, ui > 0) is handled by conceptually transforming the multipole exponential
expression into a polynomial. This allows us to use results from the area of polynomial
rootfinding.
Assuming the u; are rational, they can be expressed as the ratio of integers 0; = ni/rlc,where Tlcd is their least common denominator. Then the transformation of variables:
y = e -t/rlcd
(6.16)
p(y) = ko + k,y”’ + kzy% + k3yn3.
(6.17)
results in a polynomial in y:
Because y varies from 1 + 0 as t varies from 0 + +oo, we are interested in the largest
rootofp(y)intheregionyE [O,l].
However, it is not practical to simply track down all the roots because the powers to
which y is raised (and hence the number of roots) may be extremely large. Instead, theorems
that predict the number of roots that lie along segments of the real axis allow us to narrow in
on the single real root of interest. In particular we employ Decartes’rule of signs[Hou70]:
Definition 1 Given a sequence of real numbers a~, al, . . . Un a variation in sign occurs if
UiUi+l < 0 or if U;Ui+j < 0 Und U;+l = Ui+2 = . . . = Ui+j-1 = 0.
Theorem 1 (Decartes Rule of Signs) Let p( x) be a polynomial p( z) = ~0 + ~15 + a2x2 +
. . . + a,~“. If the sequence of its coefficients has V variations in sign and the number of
roots on the positive real axis is r then V - r is a nonnegative even integer.
Note that this theorem only utilizes the signs of the coefficients of the polynomial and
doesn’t actually require the computation of the exponents. After sorting the poles in order
of increasing frequency magnitudes (left to right, starting from the DC “pole”, ko) it is only
necessary to scan the signs of the coefficients.
CHAP’ZER 6. DETECTING REGION CHANGES
98
Often, Decartes’ rule of signs provides sufficient information to bracket the root. The
simplest example is if there is no variation in sign. In that case we quickly conclude there
is no root. For a more complex example, consider the case when the sequence of signs is
- - ++. Since there is one variation in sign, there must be exactly one root. Additionally,
because the coefficient of the DC pole is negative the final value is negative. Thus if
b( t = 0) > 0 then the interval t E [0, +oo] brackets the root. Otherwise t E [-w,O]
brackets the root and the root is of no interest. There are 16 possible combinations of the 4
signs. In 13 of those 16 cases the root can be bracketed by merely inspecting the signs of
the coefficients.
When the signs of the coefficients alone don’t provide enough information, a generalization of the technique used for two-pole waveforms is employed. That is, the stationary
points are used to partition the function into disjoint monotonic segments. The problem of
finding stationary points of b(t) is equivalent to seeking the roots of its derivative, b’(t):
6’(t) = -klulesal’ - k2u2eWb2’ - k3qe-+’
=
e
-U1t(-klul - kp2e -(“2-u1)t _ k3u3e-h-~d’)
= e -““w(t)
(6.18)
(6.19)
(6.20)
where w(t) is given by:
w(t) = - klul - k2u2e -(Q2-a1 )t
_ k3u3e-(V’
df
(6.21)
Because e-O1 t # 0, w(t) has the same roots as b’(t) but has one fewer poles. Thus, the two
pole techniques described above can be used to find the roots of u,(t). Since these roots are
also the stationary points of b(t) they can be used to bracket the smallest positive root, after
which a general purpose rootfinding algorithm can be used to converge to it. Note that this
technique can be applied recursively to waveforms consisting of larger numbers of simple
poles.
Simple Pole Plus Complex Conjugate Pair
The most difficult case is when the waveform consists of a simple pole plus a complex
conjugate pair.
b(t) = ko + kleU1’ + Meu2’cos (wt + q5),
(6.22)
CHAPTER 6. DETECTING REGION CHANGES
99
In this case the smallest positive root is bracketed by constructing a piecewise quadratic
approximation to the response that is guaranteed not to deviate from the true waveform
by more than an error voltage. The reasoning is that a change of region that erroneously
occurs because of a small voltage perturbation (- 1 mV) will probably be short lived and
have little effect on the global behavior of the circuit.
The piecewise quadratic segments are constructed one at a time starting from t = 0
(Figure 6.2). After each segment is constructed, its end points and extremurn are checked
Figure 6.2: Rootfinding using Piecewise Quadratic Segments
to see if the segment crosses zero. If so, the desired root can be bracketed. When a segment
is constructed an attempt is made to maximize its length (that is the time between its end
points) subject to the constraint of keeping the maximum deviation bounded. Thus, as the
waveform decays with time the lengths of successive segments tend to increase. Finally,
the maximum possible negative contribution of each of the terms in Equation (6.22) is
monitored so that the search can be terminated when it is clear that the remaining waveform
cannot cross zero.
This technique is more general than the preceding techniques in that it can be applied
to any combination of simple and complex conjugate poles. In fact, the technique was
initially used to handle all waveforms. Unfortunately, experience with the technique
revealed efficiency problems. Sometimes time must be advanced very far into the future
and large numbers of segments must be constructed in order to conclude that there is
CHAPTER 6. DE77X77NG REGION CHANGES
100
no root. Unfortunately, this work often turns out to be unnecessary because some other
device switches very early on. Because switching events for different devices are sought
independently we are performing what amounts to a depth first search for the segment
bracketing the first switching event. Unfortunately, this particular root finding technique
would benefit from a breadth@ search organization.
Another problem is that large numbers of segments are needed to approximate waveforms with large voltage swings. In Chapter 4 it was observed that occasionally waveform
approximations have voltage swings that far exceed what is physically possible. In those
cases only an infinitesimal initial portion of the waveform approximation is used before
some device switches. Unfortunately, the root finding algorithm doesn’t take this into
account and searches the entire waveform for roots. Because the waveform is so large, this
entails searching through an excessive number of segments.
6.5 Root Polishing
The previous section showed how to find the roots of waveforms consisting of three or fewer
poles. However, remember that these low order waveforms are only approximations of the
original boundary waveform in Equation 6.7. In fact, the roots produced by this procedure
may differ slightly from the roots of the actual boundary waveform. Unfortunately even
slight differences can lead to inconsistencies in the event driven simulation algorithm. For
example, suppose that because of an approximation error a diode is scheduled to turn on
slightly prematurely. When that event fires the voltage across the diode will be insufficient
to turn it on. If the simulator fails to check this and turns the diode on anyway it will
recompute the response of the cluster only to find that the diode’s next event will be to
turn off immediately! To avoid this unnecessary work the simulator must check that a
device’s terminal voltages are consistent with the event. If an inconsistency is detected the
event should be discarded and the device rescheduled. We refer to the discarded events as
spurious events. In this section we will show that roorpofishing and hysteresis can be used
to eliminate spurious events.
By checking each event for consistency the simulator eliminates the biggest expense
associated with spurious events, that is the needless recomputation of a group’s response.
CHAPTER 6. DE7’ECTING REGION CHANGES
101
However, it doesn’t avoid the cost of rescheduling the device. Therefore, a number of steps
were taken to to reduce the probability of spurious events. First +/ - 1mV of hysteresis
was introduced into every switching decision. For example, if a diode is off then it isn’t
turned on until l&de > A01 volts (assuming a nominal threshold of A300 volts) and if it
is on it isn’t turned off until Vdio& < .799 volts. Surprisingly, this was not sufficient to
completely eliminate spurious events. In fact, for some simulations as many as 50% of the
events that fired had to be discarded and rescheduled. When statistics about the voltages
of the actual boundary waveform at the estimated root times were collected (Table 6.1) it
24%
1OOuV < Verr < 1mV
1mV < Verr < 1OmV
58%
1OmV c Verr < 1OOmV 9%
1OOmV < Verr < 1V
9%
Table 6.1: Voltage Error at Estimated Root of Boundary Waveform.
became apparent that errors in the switching time often led to scheduling devices to turn
on when their terminal voltages were more than 1mV short of the switching thresholds.
In fact in 9% of the cases the voltages were more than 1OOmV short of the switching
thresholds. Similarly, statistics about the difference between the estimated switching time
and the actual switching time (Table 6.2) revealed that some events were being scheduled
,di
Table 6.2: Time Error of Boundary Waveform Root Estimate.
as much as 1OOps too early.
To eliminate the errors in the switching time estimate, rootpolishing is employed. That
is the root of the reduced order waveform serves as the starting point for a few additional
Newton-Raphson steps using the real boundary waveform. Table 6.3 shows that typically
only a couple of iterations are needed to polish a root. It was found that root polishing
completely eliminated spurious events.
CHAPTER 6. DETECTING REGION CHANGES
102
number of
fraction of
iterations total number
of roots
0
70.7%
1
23.2%
2
5.9%
6
0.1%
Table 6.3: Number of Iterations for Root Polishing.
6.6 Measurements
From the description of the algorithms it is apparent that the amount of time needed to find
roots increases with the number of poles in the waveform. Table 6.4 shows the average
Pole Configuration Simpl Simp2 Conj2 Simp3 Conj3
cost
54
490
350
790 1900
Table 6.4: Average Number of Cycles for Root Finding.
number of machine cycles2 spent finding roots for each pole configuration. “Simpl” is one
simple pole, “Simp2” two simple poles, “Conj2” a complex conjugate pole pair, “Simp3”
three simple poles, and “Conj3” a complex conjugate pole pair combined with a simple
pole. The data indicate that execution time grows somewhat faster than linearly with the
number of poles, although the complex conjugate / simple pole combination stands out as
being particularly inefficient. Thus the cost of rescheduling devices can be expected to rise
as the complexity of waveforms increases.
The simulator was instrumented to print out the distribution of pole configurations
for a number of circuits (Table 6.5). (For descriptions of the circuits see Table 4.2.)
The table reflects the previously noted increase in the number of poles for circuits and
models of increasing complexity. When there are two poles they are most likely to be
simple poles whereas when there are three poles they are most likely to include a complex
conjugate pair. Unfortunately, for the circuits employing floating capacitors, the least
2These numbers exclude the cost of boundary waveform approximation and root polishing. Estimates
of machine cycles are for the MIPS R2000 CPU and were estimated using the pixie execution profiling tool
created by MIPS Computer Systems, Incorporated.
CHAP7’ER 6. DETECTING REGION CHANGES
r
103
%Simpl %Simp2 %Conj2 %Simp3 %Conj3
Pole Configuration
0
0
0
0
100.0
CMOS0 Inverter Ring
0
0
0
98.1
1.9
CMOS0 NAND Ring
0
1.2
1.2
21.4
76.2
CMOS 1 Inverter Ring
0
0.6
0.1
18.6
80.7
CMOS1 NAND Ring
20.2
13.1
17.0
25.0
24.6
BJl’Q ECL Ring 10%
0
25.0
0
37.5
37.5
BIT0 ECL Ring 20%
46.4
8.2
22.0
12.2
11.1
BIT1 ECL Ring
37.8
35.0
0
0
6.1
BIT2 ECL Ring capLevel&
57.8
2.7
21.1
2.6
15.8
BIT2 ECL Ring capLevels=l
62.2
1.6
16.8
3.8
15.3
BJT2 ECL Ring capLevels=
51.1
7.3
24.8
5.6
11.2
BIT2 ECL Ring capLevels=oo
Table 6.5: Pole Configurations for Root Finding.
efficient configuration, conj3, is the most common.
6.7 Summary
The incorporation of more general piecewise linear devices complicates the detection of
when devices change regions because their regions may be bounded by multiple hyperplanes, those hyperplanes may depend on multiple terminal voltages, and the terminal
voltages may have multiple poles. Consequently, the task of rescheduling devices is significantly more expensive for Mom than for Rsim.
The time of the soonest region change of a device is found by computing the smallest,
posirive root of all of its boundary wvefoms. However, no general purpose root finding
technique exists which provides any guarantees about convergence for all problems. Therefore it is necessary to take advantage of special characteristics of the sums of exponentials.
The procedure involves three steps. First, the problem is simplified by finding an approximation of the boundary waveform that has three or fewer poles. Then, depending upon the
particular pole configuration, the root of that approximation is found either explicitly or by
using some combination of root bracketing and Newton-Raphson iteration. Finally, small
errors introduced by the waveform approximation are eliminated using roof polishing.
The algorithms for detecting region changes have been instrumented. Not surprisingly,
CHAPTER 6. DETECTING REGION CHANGES
104
measurements made on a number of benchmark circuits indicate that the complexity of
boundary waveforms also grows with increased model complexity and decreased error
tolerances. In turn the cost of finding roots increases with the complexity of the boundary
waveforms. In fact, it can take 40 times as much CPU time to find the roots of a third order
boundary waveform as a first order boundary waveform. In Chapter 7 we shall see that
one consequence of this is that the task of rescheduling &vices can dominate the overall
execution time of the simulation.
Chapter 7
Evaluation
The objective behind producing Mom was to create a simulator that extended the accuracy
and flexibility of existing MOS and bipolar switch level simulators while simultaneously
preserving much of their efficiency for the simple cases. This chapter evaluates the extent
to which we have achieved those goals. It begins by examining the application of Mom to a
number of CMOS, ECL and BiCMOS circuits that appear to be just beyond the capabilities
of existing switch-level simulators. Those simulations show that the additional flexibility
is obtained with significant speedups over the circuit simulator SPICE-3d2. Then the
performance of Mom is compared with that of the existing switch-level simulators, Irsim
and Bisim,’ on circuits that can be simulated at the switch-level. The benchmarks show
that when the simplest switch-level models are used Mom is able to achieve switch-level
accuracies with only a moderate degradation in performance compared to dedicated switchlevel simulators. However, the benchmarks also reveal that Mom’s efficiency degrades
precipitously with increasing model complexity. Further measurements reveal that the
causes of this degradation appear to be fundamental to this approach to simulation. It
appears that this approach loses its speed advantage for those cases where the full accuracy
and flexibility of circuit simulation are desired.
‘SPICE-3d2 is a derivative of the circuit simulator SPICE[Nag75], frsim[SH89] is a derivative of the
MOS switch level simulator Rsimm3], and Bisim is an ECL switch-level simulator[KAHS88]. Although
Irsim is an inctemenful simulator, its incremental capabilities aren’t used here and don’t adversely affect the
efficiency of normal simulation.
105
CHAP7’ER7. EVALUATION
106
7.1 Extending Switch-Level Simulation
In this section the use of Mom is demonstrated on a number of small MOS, ECL, and
BiCMOS circuits that can’t be simulated by Irsim or Bisim. It is shown that the more general
transistor models and circuit analysis techniques allow Mom to handle these “difficult”
circuits. Even for these circuits a large fraction of transistors can be modeled using switchlevel models. Therefore significant speedups over SPICE can be obtained.
A word of caution is probably in or&r regarding the choice of benchmarks. SPICE
does not take advantage of circuit latency whereas Mom, Irsim, Bisim, and most of the
timing simulators do. Therefore in a comparison it is possible to make the latter simulators
seem arbitrarily good compared to SPICE by simply using benchmark circuits with large
amounts of latency. Such benchmarks are not unrealistic because large circuits tend to have
large amounts of latency.
However, Mom’s ability to take advantage of latency is not the primary issue of interest
here. Techniques for partitioning circuits and avoiding the analysis of latent subcircuits
have been applied to many circuit, timing, switch, and gate-level simulators. Instead the
questions of particular interest here are:
1. How does the use of approximate piecewise linear transistor models and moment
analysis compare with the use of accurate nonlinear models and numerical integration?
2. How much does the ability to handle more general piecewise linear models impair the
efficiency of Mom compared to the efficiency of a dedicated switch level simulator?
Because it is in common use we would like to use SPICE as an example of a simulator
employing numerical integration. Therefore in this chapter the issue of latency has been
neutralized by considering only small circuits with little latency.
7.1.1
CMOS
To achieve efficient simulation Irsim takes advantage of characteristics shared by most
MOS digital circuits, However, circuits are occasionally encountered that violate some
107
CHAPTER 7. EVALUATION
of its assumptions. The static RAM sense amplifier previously described in Chapter 3
(repeated here for convenience in Figure 7.1 (a)) is an example of such a circuit. Because
bit
B
a
---I
tail
I-
-l-
lo
(a) Circuit
-lb-
ib -
30 -40 5b - f56- 5,) &I
(b) Response: SPICE vs Mom
Figure 7.1: Sense Amplifier for Static RAM.
the circuit has outputs that do not swing from rail to rail, has inputs whose thresholds are not
halfway between the power rails, has NMOS and PMOS transistors that are simultaneously
on and pull the output in opposite directions, and has a device whose input is connected to
one of the circuit’s outputs (feedback) it can be difficult for Irsim to simulate. In contrast,
Mom’s more accurate transistor models and its ability to handle more diverse topologies
(including feedback) enable it to do a better job of matching SPICE (Irsim simply reports
that the state of all nodes are undefined). Figure 7.1 (b) compares the response of SPICE
using nonlinear models with the response of Mom using Level-l models. The parameters
for the Level-l models were chosen by linearizing the SPICE transistors when the circuit
is biased at its switching threshold. The fit is remarkable considering the simplicity of the
transistor model.
However, the additional accuracy comes at the expense of reduced simulation efficiency.
Table 7.1 compares the execution times of SPICE and Mom for a number of circuits to be
considered in this section. From this table we see that for this example Mom is 61 times
faster than SPICE. In contrast (as we shall see in a following section) Irsim is usually at
least 3 orders of magnitude faster than SPICE.
CHAPTER 7. EVALUATION
SPICE Mom
SRAM Sense Amp
1.9 .031
DRAM Cell
18.8 .075
ECL RAM Cell
4.2 .102
Diode Decoder
8.2 .102
BiCMOS Buffer
7.5 .054
BiNMOS Buffer
5.1 Xl08
BiCMOS RAM Cell, Read
2.3 .OOS
BiCMOS RAM Cell, Write
4.6 .012
108
61
250
41
81
140
650
250
390 1
Table 7.1: Execution Time of Example Circuits (seconds).
The dynamic RAM cell and sense arnplifier[SSD72] in Figure 7.2 is, perhaps, a more
dramatic demonstration of the capabilities of Mom. Although Irsim can be coerced to yield
a logically correct simulation of the SRAM sense amplifier if the resistances and thresholds
of the transistors are specially selected, the DRAM cannot be similarly accommodated.
The DRAM stores data as charge on capacitors in the memory cells. A read cycle
begins with both bit lines, bit and bit precharged to k& and s precharged to (I&-J - 6,).
When word&e is raised, charge sharing takes place between the bit lines and the selected
cells with the result that the two bit lines are charged to slightly different voltages. Then
the sense amplifier is turned on to magnify this voltage difference. When T3 turns on, s
begins to fall which will cause either TZ or 72 to turn on, depending upon which bit line is
higher. For example, if bit is higher then T2 will turn on, bit will be pulled to ground and
bit will be pulled to Vdd.
However, note that TZ and 7’2 are turned on by pulling the source terminals low.
Because the switched resistor model can only be turned on by pulling the gate terminal
high it is incapable of modeling the behavior of those two transistors. However, if those
transistors are simulated using Mom’s MOS Level-l model, the correct circuit behavior
can be obtained. Figure 7.3 shows plots of the bit line waveforms generated by SPICE
and Mom for a read cycle followed by a precharge. For Mom’s simulation the Level-l
model was only used for TI, n, and T3. Everywhere else Level-O models were employed.
Although Mom’s response differs from SPICE’s, it is probably adequate for a first order
verification of the entire DRAM.
Table 7.1 shows that for this example Mom is 250 times faster than SPICE. The
CHAPTER 7 . E V A L U A T I O N
word&e
I
Figure 7.2: Dynamic RAM Cell and Sense Amplifier
41
31
2
1 I
0,
-l 0
Figure 7.3: DRAM Bit Lines: Read followed by Precharge
109
CHAPTER 7. EVALUATION
110
improvement in efficiency is probably due to the extensive use of the simple switched
resistor model.
7.1.2
ECL
The ECL switch-level simulator, Bisim, is based upon tracing paths through current steering
networks formed by bipolar transistors (Figure 7.4). Negative current can be thought of as
Figure 7.4: ECL Current Steering Networks.
originating from the current source at the bottom of the network and rising towards the top.
When the current encounters a node with multiple emitters attached, it is steered through
the transistor with the highest base. If the current encounters a resistor then it has reached
an output and the resulting voltage drop causes the output to fall. Thus a simple path tracing
algorithm is sufficient to determine the behavior of textbook ECL logic gates.
However, real IC’s often contain a greater variety of circuit forms. For example in
or&r to lower an output’s logic high level a resistor Tee (Figure 7.5) is sometimes used.
Thus, the simulator must be prepared to deal with resistor networks in place of the load
resistor. Heuristics can be used to handle simple networks, but the diode decoder described
in Chapter 5 (Figure 7.6) is an extreme example of a network that can’t be handled by Bisim.
Not only does the network contain nonlinear devices, but it also contains loops. However,
111
CHAPTER 7. EVALUATION
Figure 7.5: Resistor Tee as ECL Load.
Figure 7.6: Diode Decoder.
CHAP7ER 7. EVALUATION
112
because Mom can handle arbitrary networks of piecewise linear devices it can successfully
simulate the circuit. Figure 7.7 depicts responses at three diode decoder outputs when one
0.1
V
0.0
r
-
-o*6;
-
-
-
-’
- SPICE -
-
1
_
-
-
_
-
-
_
2
nS time
,
Figure 7.7: Diode Decoder Response.
of the address lines changes. The figure illustrates a reasonable match between Mom and
SPICE. For this example Mom utilizes Level-O bipolar models and is 81 times faster than
SPICE.
Another problem arises when current is shared between multiple transistors. In the
current steering algorithm described above it was assumed that one transistor’s base was
sufficiently higher than all the others such that all the current went into a single transistor.
However, sometimes several bases are so close that the current is divided between them. In
fact, degeneration resistors are sometimes deliberately inserted in series with the emitters
(Figure 7.8) in order to achieve an even division of current. For many simple cases the
Figure 7.8: Emitter Degeneration Resistors to Cause Current Sharing.
CHAPTER 7. EVALUATION
113
current steering algorithm can be modified to split the current correctly. However the
Schottky clamped ECL RAM[KSM+78] shown in Figure 7.9 has proven to be beyond the
b
0
:
0
:
0
:
l .e
R
0.0
(
T5
6mA
I
T6
6mA
IMI
> (
6mA
(
I
>
Figure 7.9: Schottky Clamped ECL RAM.
capabilities of Bisim. Each cell stores data using a cross coupled pair TI and T2 which
requires a small sfundby current. In order to avoid the overhead of a separate current source
for each cell, the standby currents are obtained by dividing the current from a single current
source, Ibwl shared by all cells in a row. Bisim’s problem is that in order to divide the
current entering the bottom word line, bwf, it needs to know the voltages of the cell nodes
d, db, d2, db2,. . . . In turn, in order to determine the voltages of the cell nodes, Bisim needs
C’HWZER 7. EVALUATION
114
to know how to divide the current.
For our example the standby current is 50pA. Thus depending upon whether a 1 or a 0
is stored, either d or db will be 300mV below the the top word line, fwf. A cell is read by
raising the top word line and supplying current I61 and Ibfb to the bit lines bf and bfb. Then
the emitter follower attached to the highest cell node will turn on raising the corresponding
bit line. For example, if d is high then T4 will turn on pulhng up bfb. The Schottky diode
02 prevents the bit line current from driving T4 into saturation. The bases of TS and T6 are
biased halfway between the high and low logic levels of the cell and are used to sense the
state of the bit lines. To write the cell the top word line is raised and current is supplied to
only one bit line. For example if fwl is raised and bl gets current then that current will flow
through T3 thereby setting d to a logic level 0.
A RAM fragment consisting of one row of two cells was constructed and simulated using
SPICE and Mom. As a compromise between modeling the effects of floating capacitance
between tightly coupled nodes, and allowing the independent analysis of loosely coupled
nodes, the Mom simulation utilized Level-l bipolar models for the cross coupled cell
transistors (TI and T2) but Level-O models everywhere else. Figure 7.10 (a) shows the data
0
- SPICE
-
-
-
M
o
m
v
-1
/
-2
4b
n
t
a
5
a
-1,- _a 2,- ’ ej.-4.3-.-j-2
/
I
/
db
-1
,_
2
._
-j--4.3-‘
-i
n$time
(a) Write Operation
nQ time
(b) Read Operation
Figure 7.10: ECL RAM Cell Internal Nodes.
nodes of a cell during a write operation. Note that for a brief instant of time (t w 1.5ns)
both members of the cross coupled pair are on and the response includes an unstable pole.
CHAPTER 7. EVALUATION
115
During a read operation (Figure 7.10 (b)) one of the limitations of our bipolar transistor
models becomes evident. Because Mom neglects the base current, it fails to take into
consideration the degradation of the high level caused by the base current of the emitter
follower pulling up the bit line. For example, if d is high then (Iblb/P) - 80pA will flow
into node d thereby degrading its logic high level by about 5OmV. Although the cell nodes’
errors appear to be moderate, the actual bit line swing is only about 1OOmV and hence Mom
significantly over estimates the bit line swing (Figure 7.11 (a)) and under estimates the
o*J.e
-
-
-,-
-
- SPICE
V
(-Jo,--Mom
I
i
-
-
’
-
_
_.
’
(
-0.5 1
-**go
1.
i-
j
-4
-
(a) Bit Lines
5
-
,
-06
..oL--lrz/
1
2
-j--4-j---j
n$time
nQtime
(b) Sense Amp Output
Figure 7.11: ECL RAM Read.
read delay (Figure 7.11 (b)).
However, note that the ECL RAM is not typical of ECL circuits. The desire to minimize
both the power consumption and access time of the RAM leads to large ratios of bit line
to standby current. In addition the need to keep the cell size small leads to high current
densities in the access transistors which in turn causes some p rolloff at the high bit line
current levels.
7.1.3
BiCMOS
An increasing number of circuit designs utilize both bipolar and MOS transistors on the
same chip. Unfortunately neither Irsim nor Bisim can handle these new BiCMOS designs.
CHAPTER 7. EVALUATION
116
b
0
-1;
(a) Circuit
-
-
-
-
.i-
-
.
-
_
nS time
4
(b) Response: SPICE vs Mom
Figure 7.12: BiCMOS Buffer.
Figure 7.12 (a) shows one of several variations of the BiCMOS buffer[GM88]. In general,
BiCMOS buffers attempt to take advantage of the high current driving capabilities of bipolar
transistors to charge and discharge large capacitive loads. In this circuit TZ is used to pull
the output up to within a diode drop of the upper power supply rail while T2 is used to pull
the output down to within a diode drop of the lower power supply. Hard saturation of T2
is avoided by arranging for its base drive (via T3) to disappear once the output has reached
its low level. This circuit was simulated by Mom using Level-O models for all MOS and
bipolar transistors. However unlike for ECL, this circuit has no current sources to provide
convenient hints about the operating regions of the bipolar transistors. Therefore, the
bipolar operating regions were estimated by examining the slopes of the waveforms from
a circuit simulation2 The results are compared with SPICE in Figure 7.12 (b). The falling
output transition predicted by Mom is too rapid and incorrectly drops below the power rail
because Mom neglects base current and 7’2 actually saturates momenturily. Mom’s rising
output transition takes the form of a simple exponential rather than a ramp because node
6 is driven by switched resistors and consequently has a single time constant waveform
estimate. However, the loss of accuracy comes in exchange for a speedup of 140 over
*Further experiments reveal that the exact choice of region of operation has little affect on the parameters
of our model because the high current levels cause the transconductance of the exponential device to be
swamped by the parasitic emitter resistance.
CHAPTER 7. EVALUATION
117
SPICE.
Another variant of the BiCMOS buffer[WR83] is shown in Figure 7.13 (a).
This
5
V
4
in
3
out
2
1
0
-‘A
(a) Circuit
-
.
_
_
-i-
-
.
4
nS time
(b) Response: SPICE vs Mom
Figure 7.13: BiNMOS Buffer.
buffer, sometimes referred to as the BiNMOS buffer, replaces the bipolar pulldown with an
NMOS in order to obtain a larger output swing. The circuit was simulated by Mom using
Level-O models everywhere. The responses predicted by SPICE and Mom are compared
in Figure 7.13 (b). Because the falling output transition is now determined by a switched
resistor model, it too is a simple exponential. Additionally the more extensive use of
switched resistor models results in an improved speedup over SPICE of 650: 1.
The final example in this section is a BiCMOS RAM cell (Figure 7.14) that combines
techniques from CMOS and ECL RAM design[YHWSS]. In common with CMOS static
RAMS, the memory cell consists of a pair of cross coupled CMOS inverters (TZ-T4).
However, instead of being connected to the top power supply rail, the sources of TZ, and
72 are connected to a read word line driven by an emitter follower, TS. In common with
ECL RAMS, the cell is read by raising the read word line and sensing the change in a cell’s
logic high level via the emitter follower T6. For example, if cell node db is high then T6
will turn on and pull up the read bit line. The cell is written by placing the write data
on the write bit line, and raising the write word line. The write access transistor, 77 is
sized so that it overpowers TZ and T3. This circuit was simulated by Mom using Level-O
118
CHAPTER 7. EVALUATION
T
Read Data Chn
0
:
:
l
T5
Reid Word Line
L
%-
Write Word tine
c
,
1>
T
Figure 7.14: BiCMOS RAM.
l *o
CHAPTER 7. EVALUATION
119
transistors everywhere. The results for both read and write operations were compared with
SPICE. Figure 7.15 (a) shows the data output for a read operation. The earliest rising
0.0
V
-0.1
-0.2
-0.3
,-0.4
-0.5
-0.6
1
_
2-
-j.
-
4-
-5-j
nS time
(a) Data Output During Read
1
2
-3.
-
-4.
- 5
nS time
(b) Cell Data Nodes During Write
Figure 7.15: BiCMOS RAM Response.
transition is the input to the read word line driver (TS) which initiates the read operation.
Figure 7.15 (b) shows the cell nodes during a write operation. The write bit line is zero so d
is written to zero. In the plot the earliest rising pair of waveforms are of the write word line
transitions which initiate the write. The falling pair of waveforms are d and the later rising
pair of waveforms are db. Note that there is a significant mismatch between the SPICE and
Mom waveforms for db. One factor that contributes to this mismatch is the inappropriate
parameterization of T2. The PMOS Level-O model was characterized for static CMOS
logic gates which presume that IV,, - V&,1 N 4 volts. However during a write operation the
read word line sits 1.3 volts below the top power supply rail and hence 1 V,, - V,,l N 2.7
volts. Thus for this circuit Mom overestimates the current drive of T2 by at least a factor
of 1 S. On the other hand, for the read and write operations, Mom’s speedup over SPICE is
250 and 390, respectively.
CHAPTER 7. EVALUATION
7.2
120
Performance Compared to Switch-Level Simulation.
The previous section illustrated the additional flexibility obtained by incorporating piecewise linear models into the switch-level framework. However, increased generality usually
comes at the expense of decreased efficiency. To investigate this issue a number of circuits
were simulated by Mom, Irsim, and Bisim. Examinations of the execution times, execution
profiles and statistics of the simulations lead to a better understanding of the overhead.
The selection of benchmark circuits for comparing circuit simulators and switch-level
simulators can be tricky. On one hand we would like the benchmark to be long enough
so that execution times of the switch-level simulators are large enough to swamp overhead
functions not of interest (reading the wirelist, command parsing, etc.). On the other hand,
we would like the benchmark to be short enough so that the execution times of the circuit
simulators can complete in a reasonable time. However, these goals can conflict. Even for
small circuits with little latency, Rsim can easily achieve speedups of 4000 over SPICE.
Therefore a benchmark that takes Rsim 1 minute will take SPICE 2.7 days.
Ring oscillators were selected for benchmarking because it is easy to adjust the simulation interval for each simulator to obtain reasonable execution times. Each ring oscillator
was simulated by three simulators: SPICE-3d2, Mom, and (as possible) either Bisim or
Irsim. For each simulator the simulation interval was adjusted to guarantee that at least 60
seconds of CPU time were used and that at least 10 periods were simulated. Since faster
simulators end up simulating more periods the execution times are reported in terms of the
amount of CPU time used to simulate a single logical transition of a logic gate in the ring.
For example, if a simulator takes 1 second to simulate 10 periods of a 5 stage ring oscillator
its performance is reported as [lsec/( 10 * 5 * 2)] = .OlO seconds per logic gate transition
(in 1 period each of the 5 gates in the ring switches twice).
Table 7.2 reports the results. The CMOS and ECL ring oscillators used in the benchmark
were described in Chapter 4. In addition ring oscillators using the two BiCMOS buffers
described in the preceding section were included (although they obviously couldn’t be
simulated by Irsim or Bisim). The left most three columns report the amount of CPU
time consumed by each of the simulators in units of milliseconds per logic gate transition
(msec/LGT). The two middle columns compute the speedup of Mom over SPICE-3d2 and
CHAP7ER 7. EVALUA77ON
1
CMOS0 Inverter Ring
CMOS0 NAND Ring
CMOS 1 Inverter Ring
CMOS1 NAND Ring
BJTO ECL Ring 10%
BJ’TQ ECL Ring 205k
BJTl ECL Ring
BJT2 ECL Ring CapLev=l
BiCMOS Buffer Ring
BiNMOS Buffer Ring
121
CPU time (msec/LGT)
SPICE (Bi/Ir)sim Mom
932
.22 0.59
1850
.54 0.79
932
.22 11.70
1850
.54 22.70
824
1.08 38.20
824
1.08 3.53
824
1.08 33.70
824
1.08 76.00
- 30.98
5800
3260
2.25
Ratio CPU times
2300
80
81
22
230
24
11
187
1400
1.5
53.2
42.0
35.4
3.3
31.2
70.4
-
1 Period
Error
0.4
28.0
2.6
4.7
10.2
16.5
5.7
1.2
9.9
10.5
Table 7.2: Simulator Performance on Ring Oscillators.
the degradation of Mom relative to the switch-level simulators. The last column reports the
percentage error in Mom’s prediction of the period of oscillation relative to SPICE.
One thing to note from this table is that Mom’s performance relative to SPICE appears
to be slightly better than indicated by Table 7.1. While Mom was 140 and 650 times faster
than SPICE for the BiCMOS and BiNMOS buffers, it is 187 and 1400 times faster for ring
oscillators formed from those buffers. This is probably because the simulations are long
enough to amortize the cost of various overhead functions.
Also evident is the efficiency of switch-level simulation. Irsim and Bisim are between
4200 and 760 times faster than SPICE. In addition, Mom’s increased generality extracts
only a moderate performance penalty when switch-level models are employed. For circuits
using the MOS switched resistor model (“CMOS0 Inverter Ring” and “CMOS0 NAND
Ring”) Mom is between 2.7 and 1.5 times slower than Irsim. For the circuit using the
bipolar switched resistor model (“BJW ECL Ring 20%” and “BJTO ECL Ring 10%“) Mom
is between 3.3 and 35 times slower than Bisim (although as discussed later, the last case
is an anomaly). Note that for these models the accuracy of Mom is comparable to that of
switch-level simulation.
However, as more accurate models are used Mom’s efficiency decreases rapidly. When
MOS Level- 1 models are used in the CMOS ring (“CMOS 1 Inverter Ring”) Mom slows
down by an additional factor of 20. When bipolar Level-2 models are used in the ECL ring
CHAPZER 7. EVALUATION
122
(“BIT2 ECL Ring CapLevel=l”) Mom slows down by an additional factor of 2. In return,
Mom achieves increased accuracy. For this pair of circuits the period estimated by Mom
is off by only 2.6% and 1.2% relative to SPICE. Such low errors are generally beyond the
capabilities of Irsim and Bisim.
These data raise some interesting questions. Before constructing Mom we did not
anticipate the extent to which more complex models would slow down the simulation. For
example, the MOS Level- 1 model is just the MOS Level-O model with one additional region
of linearity. It seemed surprising that one additional region would slow down the simulation
by a factor of 20 or 30. In order to find out what was going on additional statistics were
gathered from the simulations. Table 7.3 shows profiles of Mom’s execution time for a
number of benchmarks. The total execution time is first broken down into three categories:
the time spent computing the responses of nodes, the time spent rescheduling devices, and
the time spent on miscellaneous functions unrelated to the first two categories (reading the
wirelist, parsing commands, etc.). Then each of the first two categories is broken down
further. Computing the response of nodes involves constructing the group, computing the
moments, and computing waveform approximations from those moments. Rescheduling
devices involves reducing the order of the boundary waveform, finding the root of the
reduced boundary waveform, polishing the root, and miscellaneous other functions.
Table 7.4 shows additional statistics gathered from the simulations. The first column
gives the average number of nodes in a group. The second column gives the average
number of poles in a waveform approximation. The third column gives the average number
of waveform segments making up a single logical transition of a node.
From these tables it can been seen that as more detailed transistor models are used, the
number of segments in a node transition increases. Whereas the Level-O CMOS rings have
1 segment per node transition, the Level-l versions of those rings have 5 and 7 segments.
While the Level-O ECL rings have 3 segments per node transition the Level-l and Level-2
versions have 18 and 8 segments, respectively. The increase in the number of segments
has a couple of causes. First, models with more regions of linearity generate more events
as they move between those regions. Second, floating capacitance and gain propagate the
effect of one device’s region change to other nodes. In the extreme (“BIT1 ECL Ring”)
floating capacitors couple all nodes together (there are a total of 18 nodes in that ring) such
CHAPTER 7. EVALUATION
123
68.5% nodeResponse
27.2% buildGroup
35.7% computeMoments
5.6% waveformApprox
17.8% reschedule
0.0% orderReduction
8.6% findRoot
0.0% rootPolish
9.2% other
13.7% other
25 -3% nodeResponse
8.5% buildGroup
12.9% moments
3.9% waveformApprox
69 .O% reschedule
11.6% OrderReduction
30.4% findRoot
10.2% rootPolish
16.8% other
5.7% other
(a) CMOS0 Inverter Ring
(b) CMOS 1 Inverter Ring
68.3% nodeResponse
25.6% buildGroup
33.2% computeMoments
9.5% waveformApprox
24.9% reschedule
0.7% orderReduction
15.1% findRoot
0.0% rootPolish
9.1% other
6.8% other
10.0% nodeResponse
2.9% buildGroup
3.4% moments
3.7% waveformApprox
89.4% reschedule
1.2% orderReduction
86.2% findRoot
0.8% rootPolish
1.2% other
0.6% other
.
(c) BJ’ID ECL Ring 20%
(d) BIT0 ECL Ring 10%
68.7% nodeResponse
15.7% buildGroup
18 .O% moments
35 .O% waveformApprox
29.7% reschedule
10.7% orderReduction
13.3% findRoot
1.7% rootPolish
4.0% other
1.6% other
47.7% nodeResponse
15.6% buildGroup
23.7% moments
8.4% waveformApprox
47.6% reschedule
1.1% orderReduction
37.7% findRoot
0.9% rootPolish
7.9% other
4.7% other
(e) BJT2 ECL Ring
(f) BiCMOS Ring
Table 7.3: Mom Execution Profiles for Various Benchmark Circuits.
CHARTER 7. EVALUATION
124
CMOS0 NAND Bin
CMOS 1 Inverter Ring
CMOS1 NAND Ring
BJTO ECL Ring 10%
BJTO ECL Ring 20%
BIT1 ECL Ring
BJT2 ECL Ring CapLev=l
BiCMOS Buffer Ring
Table 7.4: Mom’s Simulation Statistics for Ring Oscillators.
that every switching event of every transistor generates a new segment in the response of
every node.
Apparently another affect of coupling is that waveforms become more complex. Note
that while the “CMOS0 Inverter Ring” has only 1 pole in a waveform approximation the
“BJT2 ECL Ring CapLev=l”has, on average, 2.7 poles in an approximation. Unfortunately
an increase in the number of poles can have a serious impact upon efficiency. Tables 4.1
and 6.4 show that the costs of generating waveform approximations and finding the roots
of boundary waveforms increase by factors of 20 and 40, respectively, as the number of
poles increases from 1 to 3.
Thus the 20 x degradation caused by the introduction of Level-l models into the
“CMOS0 Inverter Ring” can be explained. The table reveals that the Level-l ring has
5 segments per node transition and that each segment consists of, on average, 1.3 poles. In
contrast the Level-O ring has only one segment per transition which consists of a single pole.
However, the table of execution profiles indicates that something else is going on. For the
Level-l ring Mom spends almost 70% of its time rescheduling devices as compared with
18% for the Level-O ring. Note that for the MOS Level-O model each region of linearity
is bounded by a single hypetplane. Thus when a Level-O MOS is rescheduled it is only
necessary to find the root of a single boundary waveform. However each region of the MOS
Level- 1 model is bounded by two hyperplanes. Thus it is necessary to find the roots of two
boundary waveforms when rescheduling a Level-l MOS. Also, the region of linearity of a
CHAPTER 7. EVALUATION
125
Level-O MOS is determined by the gate voltage alone. However, for the Level-l MOS the
region of linearity depends upon all three terminal voltages. Thus after a new waveform
segment is computed for a node it is necessary to reschedule only those Level-O MOS’s
with a gate attached to the node, but all Level- 1 MOS *s with either a gate, source, or drain
attached to the node. For the inverter ring this means that twice as many Level-l devices
must be rescheduled as Level-O &vices for each waveform segment. In all, the “CMOS1
Inverter Ring” requires Mom to perform 5 x 2 x 2 = 20 times as many boundary waveform
root computations per logic gate transition as the “CMOS0 Inverter Ring”.
An additional problem with device rescheduling is apparent from the execution time of
“BJXI ECL Ring 10%“. The tenfold increase in execution time resulting from decreasing
the error threshold of “BJTO ECL Ring 20%” seems somewhat surprising. More startling,
however, is the fact that its execution time exceeds that of “BIT1 ECL Ring” which
combines the entire ring into one group! The program profile reveals that the problem is
once again with device rescheduling: “BIT0 ECL Ring 10%” spends almost 90% of its
time rescheduling devices. A more detailed breakdown of the execution profile reveals
that almost all this time is spent finding the roots of boundary waveforms consisting of 1
simple plus 2 complex conjugate poles. On average 195 piecewise quadratic segments are
examined before a root is found! Clearly the simple, general algorithm used for this case
needs to be replaced by an algorithm employing better root bracketing. However, “BJTO
ECL Ring 10%” is an anomaly. No other circuit demonstrated such poor performance.
For example, both “BIT1 ECL Ring” and “BIT2 ECL Ring CapLev=l”have more than
twice the number of simple plus complex-conjugate pole boundary waveforms (46.4% and
57.8% vs 20.2% according to Table 6.5) and yet the average number of segments examined
is only 2.6 and 2.9, respectively.
The most striking conclusion to be drawn from the execution profiles is the relatively
high cost of rescheduling devices. Although most of the time in Irsim is spent computing
delay approximations, for Mom the cost of rescheduling devices often dominates. Even
in the best case device rescheduling never takes less than 18% of the total execution
time. What’s worse, only fairly simple models are considered here. If transistor models as
accurate and flexible as SPICE’s were implemented we would expect that Mom could easily
be slower than SPICE! Thus it appears that device rescheduling is fundamental problem
CHAPTER 7. EVALUATION
126
with Mom’s approach to simulation which will limit its utility in those cases where the full
accuracy and generality of circuit simulation are desired.
7.3 Summary
The objective behind producing Mom was to create a simulator that extended the accuracy
and flexibility of existing MOS and bipolar switch level simulators while simultaneously
preserving much of their efficiency for the simplest cases. This chapter evaluates the
extent to which we have achieved those goals. First, the additional flexibility obtained by
incorporating piecewise linear models into the switch-level framework was demonstrated.
Mom was applied to a number of circuits that appear to be just beyond the capabilities of
existing switch-level simulators. Then Mom’s efficiency was compared to that of existing
switch-level simulators. It appears that Mom is just a factor of 1.5 to 3.3 times slower than
switch-level simulation for comparable accuracy. However, Mom’s efficiency decreases
rapidly as more sophisticated models are used to obtain greater accuracy. More elaborate
models increase the size of groups, increase the number of segments per logic transition,
and increase the complexity of waveforms. In addition, the use of more elaborate models
can cause device rescheduling to dominate the cost of simulation. More complex models
must be rescheduled more frequently, and require more work to reschedule.
Chapter 8
Conclusion
By constructing a prototype simulator, Mom, we have shown that it is possible to modify
Rsim’s switch-level simulation framework to allow more general piecewise linear transistor
models in place of the switched resistor model. In addition we have shown that many of
Rsim’s restrictions can be removed: Mom allows floating capacitors, non-tree circuit
topologies, and feedback. Although these enhancements require extensive changes, they
don’t seriously impair the simulator’s efficiency for the simplest cases. That is when the
simplest switch-level models are used Mom achieves speeds and accuracies comparable to
those of dedicated switch-level simulators. In addition the ability to handle more general
piecewise linear models gives the simulator a great deal more flexibility. Mom can simulate
circuits that can’t be simulated by Rsim or Bisim and yet with substantial speedups over
SPICE.
This approach looks particularly promising for simulating circuits that are just beyond
the capabilities of switch-level simulation. Frequently most of a circuit can be simulated
using switch-level models and only small portions require more accurate models. Because
Mom has been structured such that the additional generality is paid for only where it is used
it can simulate those circuits with only a minor degradation of efficiency.
However our experiments uncovered some limitations to the approach. Apparently
the overhead of rescheduling devices will prevent Mom from replacing circuit simulators.
Benchmarks show that Mom’s speed falls precipitously as the complexity of transistor
models is increased. Unfortunately increased model complexity fundamentally requires
127
CHAPTER 8. CONCLUSlON
128
much more work to be done in order to determine when devices change regions of linearity.
We expect that the approach will lose its speed advantage when models having the full
accuracy and generality of SPICE’s nonlinear models are used.
Perhaps the most important contribution of this thesis is that it addresses the question:
is moments matching a practical alternative to numerical integration for computing the
transient response of nonlinear electrical networks? The results from this thesis indicate
that the answer is “it depends”. Consider the space bounded by circuit simulators at one
end and switch-level simulators at the other (Figure ,8.1).
Remember that the timing
increasing sneed
t
piecewise linear
moment-based
simulation
(Mom)
timing
circuit
simulation
(SPICE)
simulation
(MOTIS)
switch-level
simulation
(Rsim)
Figure 8.1: Simulation Space
simulators attempted to adapt the basic techniques used by circuit simulators (nonlinear
device models and numerical integration) in order to extend the capabilities of circuit
simulation in the direction of switch-level simulation. Despite some very impressive
speedups, a gap remained; timing simulators never became fast enough to replace switchlevel simulators. In an analogous fashion we have tried to adapt some basic techniques used
by switch-level simulators (piecewise linear device models and moment analysis) in order to
extend the capabilities of switch-level simulation in the direction of circuit simulation. The
result is a simulator that fills the gap between timing simulation and switch-level simulation,
although the initial indications are that this approach will not yield a replacement for circuit
simulation.
CHAPTER 8. CONCLUSION
129
An interesting perspective on the tradeoffs faced by the two approaches can be gained
by considering both from the standpoint of waveform approximation. The numerical
integration techniques used by circuit simulators approximate the time domain response
using a polynomial in t, where the polynomial is chosen to match the low order terms of
the Taylor series expansion of the actual response:
h(t) = h() + h,t + h*t2 + . . .
(8.1)
This produces an approximation (Figure 8.2) that converges to the actual response as t -+ 0.
Figure 8.2: Numerical Integration Approximation
The consequence of this approach is that the size of the time step is limited by the need
to maintain good convergence between the approximate and actual response. If the device
models are strongly nonlinear then past samples of the response are probably not going to
be good predictors of the future behavior. In that case it will probably be necessary to take
small time steps anyway. However if the models are linear or nearly linear then moments
matching offers a better alternative.
The moments matching techniques used by switch-level simulators approximate the
Laplace transform of the response using a ratio of polynomials in s, where the polynomials
are chosen in order to match the low or&r terms of the Taylor series expansion of the
Laplace transform of the actual response:
W) = nto + m1s + m*s* + . . .
(8.2)
The result is a frequency domain approximation that converges to the actual Laplace
transform as s + 0 or equivalently a time domain approximation (Figure 8.3) that converges
as t --+ 00. Thus the region of convergence begins at t = 00 for low order approximations
CHAl’lER 8. CONCLUSION
130
Figure 8.3: Moments Matching Approximation
and moves in towards the origin only as the order of the approximation is increased. For the
case of Rsim the models are linear and we are only interested in the response at its 50% point.
Apparently this point is far enough away from the origin that low order approximations
are usually sufficient. However, as more complex piecewise linear models are used to
approximate more strongly nonlinear behavior it becomes more likely that some device
will switch in the vicinity of t z 0. When this happens the work that went into getting
a good match for large t is wasted and the simulator retains what is essentially the worst
part of the approximation. It appears that moments matching is a poor choice when the
piecewise linear models are very detailed and have many regions of linearity.
From this perspective some related approaches appear to be worth investigating. Since
numerical integration appears to be advantageous when devices are strongly nonlinear, and
moments matching when devices are strongly linear, a hybrid approach could dynamically
choose between the two approaches based upon the anticipated step size. This would
avoid the work of generating a waveform approximation accurate at t = 00 in those
cases when only a small portion around t = 0 will be used. Simultaneously the accuracy
problems (described in Chapter 4) associated with generating extremely large waveform
approximations could be avoided. If a waveform approximation has an amplitude of several
thousand volts, probably only a small portion around t = 0 will be used. In those cases
numerical integration (or some other waveform approximation technique accurate near the
origin) should be selected.
Other waveform approximation techniques are possible[McC89, Cha9l]in addition to
CHAPTER 8. CONCLUSION
131
those described above. Our experience with Mom indicates that the principle problem that
must be addressed when trying to improve the efficiency of our simulator is the overhead of
rescheduling devices. To a large extent this overhead is due to the difficulty of finding the
roots of weighted sums of exponentials. Therefore, it would be worthwhile to investigate
other waveform approximation techniques with the goal of finding one which produces
approximate waveforms whose roots can be more readily computed. Such a technique
could easily yield a faster approach to simulation.
Appendix A
MOS Level-l Threshold
One of the peculiar aspects of the MOS Level-l model is that the gate-source threshold
depends upon the drain voltage. This was done to preserve the continuity of current just
as the transistor turns on. In the saturation region, the current through the MOS Level-l
“d
i= ~(Vvss-~ 1
“9-
b&-(r/,-9”I/ds) > 0
Ym
glvds -gm(Vgs - 6) > 0
J
go
+
(A.1)
(A.21
“s
Figure A. 1: Piecewise Linear MOS Model: Saturated Region
model (repeated above for convenience) is given by:
Id = h&s - vt) + g&&
(A.3)
Unfortunately, when Vgs = V, the model yields nonzero current due to the output conductance:
Id = go&
(A.41
If the model turns on precisely at V,, = V,, the resulting discontinuity in cutrent can result
in a non-physical staircase response for some source follower circuits (Figure A.2).
132
APPENDIX A. MOS LEVEL- 1 THRESHOLD
133
V
T
“9
+
vs
+
“9
vs
0’
.
T:
Figure A-2: Staircase Response from Source Follower
However, when the hyperplane bounding the saturation region is changed from
to
v,, = vt
(A.3
90
v,, = v, - -vds
Ym
(A-6)
this undesirable non-physical discontinuity is eliminated. That Id = 0 at the new threshold
can be verified by substituting Equation (A.6) into Equation (A.34
Appendix B
MOS Level-l Polytopes
In order to verify the hyperplane equations it is useful to plot the polytopes of the various
regions of operation. Since the model has no internal connection to ground, we can
arbitrarily set the source to ground and plot the polytopes in the two dimensions vd and V,
(Figure B.l). Note that the regions in this figure are slightly more general than those of
Reverse
SahJfalion
saturation
Figure B. 1: Polytopes of MOS Level- 1 model
the Level- 1 model because they include the reverse (I& < 0) as well as the forward modes
of operation.
To plot the boundary between the forward saturation and off regions, set V, = 0 in
Equation (3.7) and solve for V’
vg = -Eb + v,
134
(B.1)
APPENDIX B. MOS LEVEL- 1 POLYTOPES
135
We see that this boundary is a negatively sloped line through the point (0, &). Similarly
the equations for the linear - forward saturation, reverse saturation - linear, and off reverse saturation boundaries are
v,= Ev,+v,
U3.2)
v, = (1 -$)v,+K
(B.3)
v, = (l+E)l/d+vL,
(B.4)
respectively.
Note that all four hyperplanes pass through the point (0, K). Therefore they partition
space into four disjoint regions, and each region is bounded by exactly two hyperplanes.
However, rather than model all four regions, our implementation restricts the model to the
three forward regions by installing a hyperplane at vd = 0 and interchanging the source
and drain terminals anytime it is crossed. This simplifies the implementation because fewer
regions need to be modeled, although it is probably less efficient because extra model state
changes may be required.
Appendix C
Linearization of Bipolar Transistor
Capacitances
The SPICE model for the bipolar transistor incorporates four nonlinear parasitic capacitances (Figure C. 1). The capacitors: Cje, Cjc, and Cjs are the capacitances associated with
Rc
cjc
cjs
Rb
(1
Cje
Cqb
1
1
Re
Figure Cl : Nonlinear Capacitances in SPICE bjt model
the base-emitter, base-collector, and collector-substrate junctions. The fourth capacitor:
Cqb represents the storage of active charge in the base.
It is well known that semiconductor junction capacitances are well approximated by an
“average” capacitance formed by dividing the change in depletion charge by the change in
136
APPENZXX C. LZNEAZ2ZzATIoN OF BIPOLAR 7XANSZSToRCAPACITANCES 137
voltage[HJ83, page 1371:
c,, = -
cjO+O [(ls~)lwmp
(Vz-&)(1-m)
(l-2)1sm]
K.1)
Here, Cja is the zero bias capacitance of the junction, VI and V2 are the limits of the voltage
swing, & is the built in junction potential, and m is the junction grading coefficient (for
abrupt junctions n = l/2, for linearly graded junctions m = l/3).
This result is accurate if the junction is reverse biased or forward biased slightly (Vj c
&/2). For larger forward voltages it can be modified to reflect SPICE’s model[AMSS,
page601 of the capacitance of junctions under heavy forward bias:
- &Vi)
Gp = QWb-v,
(1 - g-m - 11
(C.2)
V$
Q(V) =
(C.3)
“
’
It less well known that the small signal model for stored base charge provides a good
model for the base charge of an ECL inverter. Our piecewise linear model is roughly the
small signal model of a bipolar transistor linearized about the switching point of the ECL
gate. For our model:
Ic =
qb
=
I6
=
YmWe - Km)
7~
1,
d%
-&-
dIc
= 7’ygd&e
= rfYmdt
(C.4)
C.5)
(-9
(C.7)
(C.8)
Thus the base charge is modeled by the capacitance: gmrf [MK77, page 276). Figure C.2
compares the response of ECL inverters using SPICE and piecewise linear models. (The
output of each inverter is loaded by another inverter.) To highlight the effect of the base
APPENDIX C. LZNEAZUZATZON OF BIPOLAR TRANSISTOR CAPACITANCES 138
0.c)F
V
-0.1
\
7
\
OU
-
-
\
-
-
1
\\
_I_
1
-
-
_
-
_
_
_
-io
\
-0.2
-0.3
-
I
- SPICE
---Mm
’
- - - - I
-0.2
-0.3
-0.4
-0.q
I
,J _
-------L
-o.2‘
nS time
Figure C.2: Linearized Base Charge Model
charge model, all other capacitances have been set to zero. The plots reveal that the model
is quite good. (The sawtooth response predicted by SPICE is an artifact of SPICE’s solution
method and is not indicative of the actual behavior of the model.) Although there are minor
deviations between the two responses these will be averaged out once other capacitors are
added to the transistor model.
o:4
Appendix D
Optimal Frequency Scaling
D.1 Algorithm
It is interesting to note that it is possible to perform “optimal” frequency scaling of moments.
That is, if the moments are scaled by l/s units of time:
liii =
TTl*S’
CD.11
then an s can be found that minimizes ratio of the largest moment’s magnitude to the
smallest moment’s magnitude:
mF Ililil
“:” lIii,l
(D.2)
The solution is obtained by considering the logarithm of the magnitudes of the scaled
moments:
1; = log( Ifi; I>
=
~of$(~mi~'~)
P-3)
W.4)
= lOg( Imil) + i log(S)
CD.51
=
VW
q;+ia
where we define 9; = log( (mil), Q = log(s). Then the objective is met when that value of
CT is found that minimizes the maximum of all pairwise differences: Sij = 1; - Ij.
139
APPENDIX D. OPTIMAL FREQUENCY SCALZNG
140
However, each difference is now a linear function of a:
hi,j(S)
= Qi - Qj + (i -j)a
(D.7)
Figure D.l shows a plot of pairwise differences. At each u the shaded region is bounded
Log Magnitude Variation
Figure D. 1: Difference Functions
by the line representing the largest difference. This problem can be solved using techniques
similar in spirit to those employed by the Simplex Method from linear programming.
Because the shaded region is convex, the minimum must lie on a comer point. Therefore
the search can start at a point on the boundary and jump from comer point to comer point
until the minimum is found.
D.l.l
Implementation
However a considerable number of parallel lines can be pruned before the search is begun.
The W of differences: { 6j+lj } are parallel because they all have the term a; the differences:
{bj,l+t } are parallel because they all have the term --a; the differences: 6j+2,j, are parallel
because they include the term 20, etc. For each set of parallel lines only the highest line
need be retained:
Aj cm? (6i+j - 6;)
(D.8)
APPElVDIX D. OPTIMAL FREQUENCY SCALIlVG
141
If there are k moments then after all redundant parallel lines are eliminated only 2( k - 1)
will remain.
The general form of the difference functions will be:
1
2
3
..
.
Al (4
w7>
A364
..
.
b-1 (0)
A-1 (6)
A-2(4
A-,(4
...
A--k+1
=
k - l
-1
-2
-3
..
bl
k?
44
.
a+
-k+l
h-1
b-1
b-2
b-3
.
.
b-k+1
(D.9)
where bi is the y-axis intercept of A;(a). The objective is to find the u that satisfies:
m61n {m,f’?( (A,)}
(D. 10)
A reasonable place to begin the search is at the highest point of intersection of the y
axis with the difference functions (point 1 in the Figure). That point becomes the current
point, and difference function passing through it becomes the current line.
Subsequent points are selected to reduce the maximum difference. Candidates for the
next point are found by computing the points of intersection of the current line with all
other lines. In general, the abscissa of the point of intersection between Ap and A, is:
(D.11)
If the current line has a positive slope, the closest point to the left of the current point is
selected Otherwise the closest point to the right of the current point is selected. For our
example the current line is negatively sloped so the search proceeds to point 2 (point 4 is
closer but increases the maximum difference, point 5 reduces the difference but is further
from point 1 than point 2). Note that it is important to stop at the closest intersection point
to avoid leaving the shaded (in linear programming termsfeasible) region.
APPENDlX D. OPTIMAL FREQUENCY SCALING
142
Finally, the slope of the new line at the newly selected point is examined. If it has
the same sign as the slope of the current line then it becomes the new current line, the
intersection point becomes the new current point, and the procedure iterates. Otherwise the
global minimum has been found and the optimal scaling factor is given by eucro*‘. For our
example the search continues through point 2 but terminates at point 3.
D.2 Efficiency
The biggest drawback of optimal frequency scaling is its inefficiency. If the scaling factor
is computed from the ratio of two moments then frequency scaling is O(n) in the number
of moments. In contrast the complexity of optimal frequency scaling is O(n’). The
complexity of the pruning algorithm is O( n2) in the number of moments because the there
are n(n - 1)/2 ways of pairing n moments. The complexity of the search algorithm is
also O(n2) because at each search step 2n - 3 intersection points must be considered and
as many as 272 - 2 comer points may have to be traversed (although the experience with
linear programming indicates this unlikely). Thus when the procedure was implemented,
it was found to be too expensive to justify the small improvement for the large majority of
waveforms.
However, although they haven’t been tried, there are a few possibilities for tuning the
routine. A very expensive part of the optimal computation is taking the logarithm of each of
the moments. It may be possible to efficiently approximate the logarithm by extracting the
exponent bits from the machine’s floating point representation. Also note that the divisions
in Equation D.11 can be turned into multiplies if the constants: l/2,1 /3,1/4,1/5, . . .1/2n
are precomputed and stored. Finally, the efficiency of the algorithm can be improved if a
less demanding criterion for optimal@ is used. If instead of minimizing the ratio of the
largest moment to the smallest moment, we simply minimize the largest ratio of affjaceti
moments, then the complexity of the algorithm reduces to O( 72).
Appendix E
Unstable Waveforms
Chapter 6 ignored the problem of finding the roots of waveforms containing unstable poles.
In fact, unstable responses present little additional difficulty because an unstable waveform
can be easily mapped into a stable waveform with the same roots. To illustrate, suppose we
have the unstable waveform:
b(t) = b + kledlf + k2euzt + . , . -I- kpedPt
(E.1)
where 01 = CY~ + j/?, is assumed to be the frequency with the most positive real part. If we
factor out the exponential e*lt we get a new function w(t):
b(t) =
=
ealt be-Qlt + k,@lt + k2e(u2-CLl)t + . . . kpeh-a’“)
(
e@ltW( t)
(E.2)
(E-3)
Again note that w(t) has the same roots as b(t) because ealt # 0. Furthermore, the
assumption that ~1 has the most positive real part implies that all the frequencies of w(t)
have real parts that are less than or equal to zero. Thus w(t) is a stable waveform with roots
identical to b(t).
143
Bibliography
[AHU85]
Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. Data Structures and
Afgorirhms. Addison-Wesley Publishing Company, Reading, Massachusetts,
1985.
[AM881
Paolo Antognetti and Giuseppe Massobrio. Semiwnducror Device Modeling
with SPICE. McGraw-Hill Book Company, New York, 1988.
[A&78]
Kendall E. Atkinson. An Introduction to Numerical Analysis. John Wiley &
Sons, New York, 1978.
[BL72]
M. J. Bosley and F. l? Lees. A survey of simple transfer-function derivations
from high-order state variable models. Automatica, 8:765-775, 1972.
[Bra801
Franklin H. Branin, Jr. The analysis and design of power distribution nets
on LSI chips. In IEEE International Conference on Circuits and Computers,
pages 785-790,198O.
my801
Randal E. Bryant. An algorithm for MOS logic simulation. Lambda, the
Magazine of VLSI Design, 1(3)&i-53,1980.
[BS65]
Amar G. Bose and Kenneth N. Stevens. Introductory Network Theory. Harper
& Row, New York, 1965.
[CGK75]
B. R. Chawla, H. K. Gummel, and P. Kozak. MOTIS-an MOS timing simulator. IEEE Transactions on Circuits and Systems, 22901~909, December
1975.
144
BIBLIOGRAPHY
145
[Cha88]
Pak K. Chan. Signal delay in RC networks with floating capacitors. In
IEEE International Symposium on Circuits and Systems, page get the page
numbers, June 1988.
[Cha91]
Pak K. Chan. Comments on “asymptotic waveform evaluation for timing
analysis”. IEEE Transactions on Computer-AidedDesign, lO(8): 1078-1079,
August 1991.
[Chu88]
Chomg-Yeong Chu. Improved Models for Switch-Level Simulation. PhD
thesis, Stanford University, November 1988.
[CK89]
Pak K. Chan and Kevin Karplus. Computing signal delay in general RC
networks by tree/link partitioning. In 26th ACMIIEEE Design Automation
Conference, pages 485-490, June 1989.
[CL751
Leon 0. Chua and Pen-Min Lin. Computer Aided Analysis of Electronic
Circuits: Algorithms and Computational Techniques. Prentice-Hall, Inc.,
Englewood Cliffs, New Jersey, 1975.
[CS84]
Chin Fu Chen and Prasad Subramanium.
The second generation MOTIS tim.
ing simulator - an efficient and accurate approach for general MOS circuits.
In IEEE International Symposium on Circuits and Systems, pages 538-542,
1984.
[dG84]
Aart J. de Geus. SPECS: Simulation program for electronic circuits and
systems. In IEEE International Symposium on Circuits and Systems, pages
534-537,1984.
[DMHH87]
Giovanni De Micheli, Hsueh Y. Hsieh, and Ibrahim Hajj. Decomposition
techniques for large scale circuit analysis and simulation. In A. E. Ruehli,
editor, Circuit Analysis, Simulation andDesign, 2, chapter 7. Elsevier Science
Publishers B. V. (North-Holland), 1987.
[DMNSV83] G. De Micheli, A. R. Newton, and A. Sangiovanni-Vincentelli. Symmetric
displacement algorithms for the timing analysis of MOS VLSI circuits. IEEE
Transactions on Circuits and Systems, pages 167-197, July 1983.
BZBLZOGRAPHY
146
[DvGdGSS] P M. Dewilde, A. J. van Genderen, and A. C. de Graaf. Switch level timing
simulation. In International Conference on Computer-Aided Design, pages
182-l 84,1985.
[Elm481
W. Elmore. The transient response of damped linear networks with particular
regard to wideband amplifiers. Journul of Appfied P hysics, 19:55-63, January
1948.
[GD85]
Lance A. Glasser and Dan W. Dobberpuhl. The Design and Analysis of VLSI
Circuits. Addison-Wesley Publishing Company, Reading, Massachusetts,
1985.
[GM881
Edwin W. Greeneich and Kevin L. McLaughlin. Analysis and characterization of bicmos for high-speed digital logic. IEEE Journal of Solid State
Circuits, 23(2):558-565, April 1988.
WW
Nanda Gopal and Lawrence T. Pillage. Constrained approximation of dominant time-constant(s) in RC-circuit delay models. Technical Report UTCERC-TR-LTP91-01, The University of Texas at Austin, January 1991.
[HJ83]
David A. Hodges and Horace G. Jackson. Analysis and Design of Digital
Integrated Circuits. McGraw-Hill Book Company, New York, 1983.
[Hor83]
Mark Alan Horowitz. Timing Models for MOS Circuits. PhD thesis, Stanford
University, December 1983.
[Hou70]
A. S. Householder. The Numerical Treatment of a Single Nonlinear Equation.
McGraw-Hill Book Company, New York, 1970.
[HS87]
Ibrahim N. Hajj and Daniel Saab. Switch-level logic simulation of digital
bipolar circuits. IEEE Transactions on Computer-Aided Design, 6(2):25 l258, March 1987.
[HSV8 l]
Gary D. Hachtel and Albert0 L. Sangiovanni-Vincentelli. A survey of thirdgeneration simulation techniques. Proceedings of the IEEE, 69( 10): 12641280, October 1981.
BIBLIOGRAPHY
147
[Hua90]
Xiaoli Huang. fade Approximation of Linear(ized) Circuit Responses. PhD
thesis, Carnegie Mellon University, December 1990.
[Jou83]
N. P. Jouppi. TV: An nMOS timing analyzer. In 3rd Caltech Conference on
VLSI, pages 7 l-85, March 1983.
[KAHS88]
Russell Kao, Bob AIverson, Mark Horowitz, and Don Stark. Bisim: A
simulator for custom ECL circuits. In International Conference on ComputerAided Design, pages 6265, November 1988.
[KKSN84]
Young H. Kim, J. E. Kleckner, R. A. Saleh, and A. R. Newton. Electricallogic simulation. In International Conference on Computer-Aided Design,
pages 7-9, November 1984.
[Kro39]
G. Kron. Tensor Analysis of Networks. Wiley, 1939.
[KSM+78]
Kuniyasu Kawarada, Masao Suzuki, Hisakazu Mukai, Kazuhiro Toyoda,
and Yoshisuke Kondo. A fast 7.5ns access lk-bit RAM for cache-memory
systems. IEEE Journal of Solid State Circuits, SC-13(5):656663, October
1978.
[Kun86]
Kenneth S. Kundert. Sparse matrix techniques. In A. E. Ruehli, editor, Circuit
Analysis, Simulation and Design, chapter 6. Elsevier Science Publishers B.
V. (North-Holland), 1986.
[LM84]
Tzu-mu Lin and Carver A. Mead. Signal delay in general RC networks. IEEE
Transactions on Computer-Aided Design, 3(4):331-349, October 1984.
[LSV82]
E. Lelarasmee and A. Sangiovanni-Vincentelli. Relax: A new circuit simulator for large scale MOS integrated circuits. In I9th ACM/IEEE Design
Automation Conference, pages 682-690, June 1982.
[McC89]
Steven Paul McCormick. Modeling and Simulation of VLSI Interconnections
with Moments. PhD thesis, Massachusetts Institute of Technology, June 1989.
BZBLZOGBAPHY
148
[MK77]
Richard S. Muller and Theodore I. Kamins. Device Electronics for Integrated
Circuits. John Wiley & Sons, Inc, New York, 1977.
[MMS+80]
Osamu Minato, Toshiaki Masuhara, Toshio Sax&i, Hideaki Nakamura,
Yoshio Sakai, Tokumasa Yasui, and Kiyofumi Uchibori. 2k x 8 bit Hi-CMOS
static RAM’s. IEEE Journal of Solid State Circuits, sc-15(4):656-660, August 1980.
Wag751
L. Nagel. SPICE2: A computer program to simulate semiconductor circuits.
Technical Report ERL-520, University of California, Berkeley, May 1975.
[New791
Richard Newton. Techniques for the simulation of large-scale integrated circuits. IEEE Transactions on CircuitsandSysterns, 26(9):741-749, September
1979.
FW61
H. M. Paynter. On an analogy between stochastic processes and monotone
dynamic systems. In G. Mgller, editor, Regefungstechnik. Moderne Theorien
und ihre Verwendbarkeit, pages 243-250. R. Oldenbourg, 1956.
F’DW
Lawrence T. Pillage and Santanu Dutta. A path tracing algorithm for asymptotic waveform evaluation of lumped RLC circuit delay models. In ACM
Workshop on Timing Issues in the Specification and Synthesis of Digital
Systems, August 1990.
[Pi1891
Lawrence T. Pillage. Asymptotic Waveform Evaluation for Timing Analysis.
PhD thesis, Carnegie Mellon University, April 1989.
[PR8 l]
Paul Penfield, Jr. and Jorge Rubinstein. Signal delay in RC tree networks. In
18th ACMIIEEE Design Automation Conference, pages 6 13-6 17, June 198 1.
PR!W
Lawrence T. Pillage and Ronald A. Rohrer. Asymptotic waveform evaluation for timing analysis. IEEE Transactions on Computer-Aided Design,
9(4):352-366, April 1990.
BIBLIOGRAPHY
149
[Put821
R. Putatunda. Auto-Delay: A program for automatic calculation of delays in
LSI/VLSI chips. In 20th ACMIIEEE Design Automation Conference, pages
616-621, June 1982.
[RGP91]
Curtis L. Ratzlaff, Nan& Gopal, and Lawrence T. pillage. RICE: Rapid
interconnect circuit evaluator. In 28th ACM/IEEE Design Automation Conference, pages 555-560, June 1991.
[Roh88]
R. A. Rohrer. Circuit partitioning simplified. IEEE Transactions on Circuits
and Systems, 35( 1):2-5, January 1988.
[RSVH79]
N. B. Guy Rabbat, Albert0 L. Sangiovanni-Vincentelli, and Hsueh Y. Hsieh.
A multilevel newton algorithm with macromodeling and latency for the analysis of large-scale nonlinear circuits in the time domain. IEEE Transactions
on Circuits and Systems, 26(9):73>741, September 1979.
[RT85a]
Arvind Raghunathan and Clark D. Thompson. Signal delay in rc trees with
charge sharing or leakage. In 19th Asilomar Conference on Circuits, Systems,
ad Computers, pages 557-56 1, November 1985.
[RT85b]
Vasant B. Rao and Timothy N. Trick. Switch-level timing simulation of MOS
VLSI circuits. In IEEE International Symposium on Circuits and Systems,
pages 229-232,1985.
[RV87]
Ronald A. Rohrer and Chandramouli Visweswariah. SPECS2: An integrated
circuit timing simulator. In International Conference on Computer-Aided
Design, pages 94-97, November 1987.
[RVB88]
Genhong Ruan, Jiri Vlach, and James A. Barby. Current-limited switch-level
timing simulator for MOS logic networks. IEEE Transactions on ComputerAided Design, 7(6):659-667, June 1988.
[Sal83]
R. Saleh. Iterated timing analysis and SPLICEl. Master’s thesis, University
of California, Berkeley, 1983.
BZBLZOGUPHY
150
[Sch85]
Thomas J. Schaefer. A transistor-level logic-with-timing simulator for MOS
circuits. In 22ndACMIIEEE Design Automation Conference, pages 762-765,
1985.
[SH89]
Arturo Salz and Mark Horowitz. Irsim: An incremental MOS switch-level
simulator. In 26th ACM/IEEE Design Automation Conference, pages 173178, June 1989.
[SHW
Don Stark and Mark Horowitz. Techniques for calculation currents and
voltages in VLSI power networks. IEEE Transactions on Computer-Aided
Design, 9(2): 126-l 32, 1990.
[SNGR91]
Alexander D. Stein, Tuyen V. Nguyen, Binay J. George, and Ronald A.
Rohrer. ADAPTS: A digital transient simulation strategy for integrated
circuits. In 28th ACM/IEEE Design Automation Conference, pages 2631,
June 1991.
[SSD72]
Karl U. Stein, Aame Sihling, and Elko Doering. Storage array and
sense/refresh circuit for single transistor memory cells. IEEE Journal of
Solid State Circuits, SC-7(5):336-340, October 1972.
[SVCc77]
Albert0 Sangiovanni-Vincentelli, Li-Kuan Chen, and Leon 0. Chua. A
new tearing approach - node-tearing nodal analysis. In IEEE International
Symposium on Circuits and Systems, pages 143-147,1977.
[SYHSS]
D. G. Saab, A. T. Yang, and I. N. Hajj. Delay modeling and timing of bipolar
digital circuits. In 25th ACMIIEEE Design Automation Conference, pages
288-293, June 1988.
W871
Chuanjin Shi and Kaihe Zhang. A robust approach for timing verification. In
International Conference on Computer-Aided Design, pages 5659, November 1987.
[Ter83]
C. J. Terman. Simulation Tools for Digital LSI Design. PhD thesis, Mas-
sachusetts Institute of Technology, September 1983.
BZBLZOGZUZ’HY
[vB87]
w=w
[wJM+73]
151
W. M. G. van Bokhoven. Piecewise linear analysis and simulation. In
A. E. Ruehli, editor, Circuit Analysis, Simulation and Design, 2, chapter 10.
Elsevier Science Publishers B. V. (North-Holland), 1987.
ChandramouIi Visweswariah, Peter Feldmann, and Ronald A. Rohrer. Incorporation of inductors in piecewise approximate circuit simulation. In International Conference on Computer-Aided Design, pages 162-165, November
1990.
W. T. Weeks, A. J. Jiminez, G. W. Mahoney, D. Mehta, H. Qassemzadeh,
and T. R Scott. Algorithms for ASTAP-a network analysis program. IEEE
Transactions on Circuit Theory, 20:628-634, November 1973.
[m831
Fred Walczyk and Jorge Rubinstein. A merged CMOS/bipolar VLSI process.
In International Electron Devices Meeting Technical Digest, pages 59-62,
December 1983.
[Wu76]
Felix F. Wu. Solution of large-scale networks by tearing. IEEE Trunsactions
December 1976.
on Circuits andsystems, 23(12):706-713,
+
WyW
J. L. Wyatt. Signal delay in RC mesh networks. IEEE Transactions on
Circuits and System, 32(5):507-5 10, May 1985.
[YHT80]
P. Yang, I. N. Hajj, and T. N. Trick. SLATE: A circuit simulation program
with latency exploitation and node tearing. In IEEE International Conference
on Circuits and Computers, pages 353-355, October 1980.
[~W
Tsen-Shau Yang, Mark A. Horowitz, and Bruce Wooley. A 4ns 4kx 1 -bit twoport bicmos mm. IEEE Journal of Solid State Circuits, 23(5):103&1040,
October 1988.
G-731
V. Zakian. Simplification of linear time-invariant systems by moment approximants. International Journal of Control, 18(3):455+0,1973.
Download