Gate-Level Current Modeling of
Digital Integrated Circuits for
Conducted Chip Emission
Characterization
German title: Modellierung der Stromaufnahme digitaler integrierter Schaltungen auf Gatterebene zur Charakterisierung der leitungsgebundenen Chip-Emission
(Gate-Level Modeling of the Current Consumption of Digital Integrated Circuits for the Characterization of Conducted Chip Emission)
Submitted to the Technische Fakultät of the Universität Erlangen-Nürnberg
for the degree of
DOKTOR-INGENIEUR
by
Andreas GSTÖTTNER
Erlangen, 2010
Approved as a dissertation by the Technische Fakultät of the Universität Erlangen-Nürnberg
Date of submission: 01 February 2010
Date of doctoral examination: 11 March 2010
Dean: Prof. Dr.-Ing. Reinhard German
Reviewers: Prof. Dr. techn. Mario Huemer, Prof. Dr.-Ing. Klaus Helmreich
Acknowledgement
This work would not have been accomplished in this way without the help and
support of several people to whom I want to express my sincere gratitude. First of
all, I wish to thank Prof. Dr. Mario Huemer for being my research advisor and for his
guidance and support.
Special thanks to Thomas Steinecke for consistently supporting me during all
stages of this work, his organizational efforts, and the precious ideas and inspiring
discussions. I am also very thankful to all colleagues at Infineon Technologies who
contributed to this work, particularly Jack Kruppa and Mehmet Goekcen. Their
full support and many interesting discussions turned out to be indispensable for
the success of this thesis.
I would like to thank Prof. Dr. Robert Weigel and all the colleagues at the Institute for Electronics Engineering for creating an extraordinary research environment
and working atmosphere. Very special thanks to Florian Frank, my colleague in
the MISEA project, for much more than giving me insight into Swabian culture and many
almost religious discussions about Linux. Furthermore, I would like to thank Ralf
Mosshammer, Thomas Ussmüller, Alexander Kölpin, Benjamin Waldmann, Wolfgang Tobginski and Adrian Voinea for their moral support and motivation during
the writing of the thesis.
I want to express cordial thanks to my family for their continuous support, particularly to my parents, who have been my first and most important teachers.
Finally, I would like to thank Sandra for her patience and understanding.
Abstract
This thesis investigates novel methods to characterize the current profiles of complex
digital very large scale integrated circuits. The potentially huge number of simultaneously switching transistors in complex digital devices causes significant current
peaks and therefore considerable noise on the power supply lines. This may lead
to interference with other components of the system, but may also cause electromagnetic compatibility issues. Measures to eliminate this noise and to stabilize the
supply voltage therefore need to be implemented. This can be done on the printed
circuit board or in the chip package, but probably the most efficient and economical
approach is to place appropriately matched measures close to the respective noise
sources on the chip. Since on-chip measures need to be integrated into the circuit
design, simulation models are very helpful to identify the potential noise sources
in early phases of the development process. Early simulations also enable studies
that predict the effects of different design options. Such models typically consist of
passive components, representing the properties of the on-chip wiring, and of active noise sources, which model the transient current consumption of the respective
components. The focus of this thesis is on high-level approaches to determine the
dynamic behavior of digital integrated circuit designs and to generate noise models
for early design studies.
The introduced methods are based on gate-level circuit descriptions, which are
typically available after circuit synthesis. This enables design analysis before
the actual layout of the cell interconnect wires and the on-chip power distribution
network is implemented. For this purpose, a library is characterized that provides parameters describing the dynamic behavior of the particular cells in terms of switching current waveform
characteristics and signal transition timing information.
For an efficient determination of the switching activities of the particular cells,
complex circuits are partitioned, and a combined approach of pattern-based simulation and random activity interpretation is introduced. As gate-level netlists
do not provide any information on the on-chip wiring characteristics, approaches to approximate the parasitic effects of cell interconnect wires are discussed
as well.
Summary (Kurzfassung)
This work addresses novel methods for characterizing the time course of the
current consumption of complex highly integrated circuits. The often high
number of simultaneously switching transistors in complex digital components can cause considerable current peaks and thus disturbances on the supply lines. This can impair neighboring system components, but can also cause electromagnetic compatibility problems.
Measures to eliminate these disturbances and to stabilize the supply voltage are therefore required. They can be implemented on the printed circuit board or in the chip package, but the most efficient and economical approach is usually to place suitable measures directly on the chip, close to the
respective noise source. Since on-chip measures are integrated into the circuit, simulation models are extremely helpful to identify potential
noise sources already in early phases of the development process.
Early simulations furthermore enable studies to predict the effects of different design variants. Such models typically consist
of passive elements, which represent the properties of the on-chip interconnects,
and of active noise sources, which model the time course of the current consumption of the respective components. The focus of this work
is on novel approaches to determine the dynamic behavior of digital
circuits and on the development of noise models that can be used for early
design studies.
The presented methods are based on gate-level circuit descriptions, which are typically available right after circuit synthesis.
This enables design analyses before the actual layout of the cell interconnects and of the on-chip supply system is implemented.
For this purpose, a library is characterized that provides parameters describing the dynamic behavior of the various cells in terms of switching current waveform characteristics and
signal transition timing information.
For an efficient determination of the switching activities of the respective
cells, a combined approach of a pattern-based simulation and an interpretation of random activities is presented. Since gate-level netlists provide no
information about on-chip interconnects, methods for approximating the parasitic effects of interconnects
are discussed as well.
Contents
1. Introduction
1.1. Motivation
1.2. State of the Art
1.3. Goals of this Work
1.4. Organization of the Thesis
2. Digital Integrated Circuit Basics
2.1. MOS Transistor
2.1.1. Structure and Operation
2.1.2. MOS Transistor Capacitances
2.2. CMOS Devices
2.2.1. Static Behavior
2.2.2. Transient Characteristics
2.2.3. Power and Energy Consumption
2.3. Cell-Based Design Methodology
2.3.1. Standard Cells
2.3.2. Macro Cells
2.4. Deep Submicron Interconnects
2.4.1. Interconnect Parameters
2.4.2. Wire Models
2.5. Power Distribution Networks
2.5.1. Voltage Drops
2.5.2. Decoupling Capacitances
3. Synchronous Sequential Digital Systems
3.1. General Principles
3.2. Digital System Clocking
3.2.1. System Clock Generation
3.2.2. Clock Distribution
3.2.3. Clock Gating
3.3. Combinational Logic Cells
3.4. Clocked Storage Devices
3.4.1. Latches
3.4.2. Flip-Flops
3.4.3. Memories
4. Standard Cell Characterization
4.1. Outline of the Methodology
4.2. Dynamic Cell Behavior Considerations
4.3. Equivalent Inverter
4.4. Single Cell Simulations
4.4.1. Simulation Environment
4.4.2. Parameter Variation
4.4.3. Simulation Stimuli
4.5. Characteristic Parameter Extraction
4.5.1. Timing Characteristics
4.5.2. Current Profiles
4.6. Macro Cell Characterization
5. Netlist Based Current Modeling
5.1. Current Profile Calculation
5.1.1. Cell Environment Identification
5.1.2. Single Event Characteristics Determination
5.1.3. Current Profile Composition
5.2. Circuit Simulation Methods
5.2.1. Pattern Based Simulation
5.2.2. Random Activity Interpretation
5.3. Modeling of Complex Modules
5.3.1. Module Partitioning
5.3.2. Clock Subsystem
5.3.3. Clocked Storage Elements
5.3.4. Combinational Logic
5.3.5. Profile Composition
5.3.6. Multiple Clock Domains
6. Parasitic Effected Current Modeling
6.1. General Principles
6.2. Cell Interconnects
6.3. Power Distribution Networks
7. Implementation and Verification
7.1. Library of Characterized Cells
7.1.1. Static Properties
7.1.2. Dynamic Characteristics
7.2. Circuit Description Interpretation
7.2.1. Verilog Gate-Level Netlist Import
7.2.2. Path Categorization
7.3. Pattern Based Simulation
7.3.1. Software Implementation
7.3.2. Simulation Results
7.4. Complex Module Modeling
7.4.1. Partitions Initialization
7.4.2. Current Profile Determination
7.4.3. Simulation Results
7.5. Profile Post-Processing
7.6. Chip-Level Current Consumption Models
8. Conclusion and Outlook
8.1. Conclusion
8.2. Outlook
A. Fourier Transform Characteristics
A.1. Definitions
A.2. Fourier Transform Properties
A.3. Transform Pairs
1. Introduction
Today’s electronic equipment has to meet demanding requirements in terms of complexity and reliability, but also in terms of electromagnetic compatibility, which is
especially challenging in the automotive and avionics sectors [1]. Electronic control
units manage or support many critical functions such as steering and braking, but
are also responsible for passenger comfort and entertainment. Ensuring reliable functionality is essential, but each component must also
not affect the operation of other devices. Systems are therefore usually verified by
measurements and subsequently certified.
As the cost of fixing defects or compatibility issues increases significantly as
the development and fabrication process progresses, simulating the system
behavior is a fundamental task within the design flow of electronic
devices. Simulation models of the respective components are therefore required to
identify potential issues, and they are most effective in early design phases.
1.1. Motivation
Besides switching power transistors, complex digital integrated circuits have been
identified as one of the main contributors to system-level interference. Since most
digital devices are triggered by a system clock signal, their internal
functions are active almost simultaneously and may cause significant current peaks at equidistant points in time. As the power supply lines of
a system always have a certain impedance, particularly high current peaks cause
temporary drops of the supply voltage and may consequently lead to system malfunctions. The dimensions of the on-chip power supply network, but also the design
of the printed circuit board used, for instance, to interconnect the components of a control unit, are therefore of great importance. Given that voltage drops may
affect the power integrity of other components on the board and may also
cause electromagnetic compatibility (EMC) issues, adequate measures to stabilize
the power supply system have to be established.
Measures to reduce the noise on the supply lines can be implemented at board level, but can also be integrated into the particular components [2]. To avoid
redesign cycles, it is beneficial to know the characteristics of the noise as
early as possible, but at least before manufacturing. Hence, accurate simulation
models are needed to design economical and efficient measures that ensure reliable
operation of the particular devices and consequently of the entire system. Integrated
circuit emission models, which basically consist of the components shown in
Figure 1.1, are most commonly used to model the dynamic behavior
of semiconductor devices [3]. While the passive model elements are a kind of impedance model of
the on-chip wiring and the package characteristics, the active elements are current
sources representing the current consumption caused by the internal activity of the
respective circuit.
Figure 1.1.: Basic structure of an integrated circuit model (passive elements and active elements).
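The interplay of the two element types can be illustrated with a minimal sketch. It is not the ICEM format; it simply assumes that the passive part is a single series R-L supply path and that the active part is one triangular current pulse, and computes the resulting voltage drop v(t) = R·i(t) + L·di/dt. The pulse shape, the function names and all numerical values are hypothetical and chosen only for illustration.

```python
def triangular_pulse(t, t_start, t_peak, t_end, i_peak):
    """Hypothetical triangular supply-current pulse: 0 -> i_peak -> 0."""
    if t <= t_start or t >= t_end:
        return 0.0
    if t <= t_peak:
        return i_peak * (t - t_start) / (t_peak - t_start)
    return i_peak * (t_end - t) / (t_end - t_peak)


def supply_drop(ts, currents, r, l):
    """Voltage drop across a series R-L supply path, v = R*i + L*di/dt.

    di/dt is approximated with a backward difference on the sampled waveform.
    """
    drops = []
    for n, i in enumerate(currents):
        di_dt = 0.0 if n == 0 else (i - currents[n - 1]) / (ts[n] - ts[n - 1])
        drops.append(r * i + l * di_dt)
    return drops


dt = 1e-12                                  # 1 ps time step
ts = [n * dt for n in range(2001)]          # 0 ... 2 ns
# active element: 50 mA peak, 0.5 ns rise, 1 ns fall (hypothetical values)
currents = [triangular_pulse(t, 0.1e-9, 0.6e-9, 1.6e-9, 0.05) for t in ts]
# passive element: 0.1 ohm series resistance, 0.5 nH series inductance (hypothetical)
drops = supply_drop(ts, currents, r=0.1, l=0.5e-9)
print(f"peak current: {max(currents) * 1e3:.1f} mA")
print(f"maximum supply voltage drop: {max(drops) * 1e3:.1f} mV")
```

With these assumed values the inductive term L·di/dt (about 50 mV during the current rise) dominates the resistive term R·i (at most 5 mV), which is why fast current edges rather than the average current drive the supply noise.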
Complex devices may require several supply voltages, each of which
may in addition need to be applied at several pins. In such cases, emission models are
more complex as well and typically consist of several active and passive elements to account for the behavior at the respective pins as well as potential coupling
effects. Such models are usually provided by the semiconductor manufacturer and
enable component-level simulations to determine any noise-reducing
measures required on the printed circuit board [4]. Emission models can, however, also be used to
verify conformance with the requirements at chip level as a sign-off criterion before
manufacturing. The passive model element parameters are usually extracted from
the layout data, while the current sources are determined by simulations of the
respective on-chip functional blocks.
Given that measures to eliminate the generated noise are most efficient when they
are located close to the source, it is beneficial to analyze separately the behavior
of the particular on-chip modules, such as the processor core and the peripheral
controller units of a microcontroller. As the implementation of properly matched
measures becomes increasingly expensive in later phases of the design flow, it
is important to have methods available that can predict the characteristics of the
generated noise in early design phases.
1.2. State of the Art
The complexity of integrated circuits has increased significantly in recent years.
As the dynamic behavior and the interference with other system components became
an important issue, standards for electromagnetic compatibility (EMC) modeling have been introduced [5]. Particularly important in this context are the I/O
Buffer Information Specification (IBIS) [6] and the Integrated Circuit Electrical
Model (ICEM) [7] standards. IBIS models are generally accepted for signal integrity analysis, but ICEMs are more comprehensive in terms of representing
the generated noise and its propagation over the power supply lines.
ICEMs are primarily used to provide behavioral models of ICs for efficient simulations at board or system level, but are also intended to be suitable for
sign-off verification of an IC before manufacturing. Depending on the intended
use, a model can be more or less complex. While the models provided to a customer are typically simplified, the models for chip-level simulations usually include
several design-specific details. As mentioned before, the implementation of noise-reducing measures is most efficient in early design phases. Hence, there are primarily
two approaches to model the chip behavior:
Layout-based models are generated by extracting the respective parameters for
the passive model elements from the layout data. Commercial tools such as XcitePI [8] are already capable of reasonably
predicting the behavior of a chip. The parameters of such models are usually
aligned to measurement results of the final product before adapted ICEMs are generated
for system-level simulations. These models can nevertheless
be considered reasonably accurate for sign-off verification simulations
before manufacturing.
Design studies are intended to predict the dynamic behavior of a device as
early as possible in the design process. Due to the late availability of the chip layout,
design studies are often based on empirical values. In particular, the passive
ICEM elements consequently represent the expected wiring properties of the
chosen design option. Tools supporting such approaches
are primarily the subject of research work [9, 10] or proprietary solutions of IC
manufacturers, such as EXPO [11] developed by Infineon Technologies.
The passive model parameters can therefore be reasonably extracted or predicted
from the available design information. For the active elements representing the generated noise, however, the above-mentioned tools for early design studies are based on different
approaches. The software introduced in [9], for instance, requires a set of parameters
describing the circuit properties in terms of the gate count, the number of supply
voltage pins, the clock frequency, the gate activity percentage, and the chip size.
The noise models are therefore determined by an estimation of the current waveform based on several assumptions, and are consequently not directly related to the
actual circuit design data. Other approaches are based on replacing
the cell instances with parameterized behavioral models in conjunction with a
transformation of the circuit structure [12]. The approach
introduced in [10], on the other hand, relies on functional simulations of the circuit and
extracts the particular switching activities. In this case, the noise sources are subsequently generated from such switching event lists and a library providing
the respective switching event noise currents.
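The principle of combining a switching event list with a library of per-event currents can be illustrated with a minimal sketch: every event contributes a library waveform shifted to its switching time, and the supply current is the superposition of all contributions. The triangular pulse shape, the library keys, the cell names and all numbers are hypothetical assumptions for illustration; they are not the data format of [10] or of the library characterized in this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventCurrent:
    """Hypothetical library entry: triangular supply-current pulse of one switching event."""
    peak: float    # peak current in A
    t_rise: float  # time from the event to the current peak in s
    t_fall: float  # time from the peak back to zero in s

    def at(self, t):
        """Current at time t relative to the switching event."""
        if t <= 0.0 or t >= self.t_rise + self.t_fall:
            return 0.0
        if t <= self.t_rise:
            return self.peak * t / self.t_rise
        return self.peak * (self.t_rise + self.t_fall - t) / self.t_fall


def current_profile(events, library, ts):
    """Superpose the library pulse of every (time, cell, transition) event at each sample time."""
    return [sum(library[(cell, edge)].at(t - t0) for t0, cell, edge in events) for t in ts]


library = {
    ("INV", "rise"): EventCurrent(peak=100e-6, t_rise=20e-12, t_fall=60e-12),
    ("INV", "fall"): EventCurrent(peak=40e-6, t_rise=20e-12, t_fall=40e-12),
    ("DFF", "rise"): EventCurrent(peak=300e-6, t_rise=30e-12, t_fall=120e-12),
}
# switching event list: (event time in s, cell type, output transition)
events = [(0.0, "DFF", "rise"), (50e-12, "INV", "rise"), (60e-12, "INV", "fall")]
ts = [n * 5e-12 for n in range(60)]  # 5 ps sampling over 300 ps
profile = current_profile(events, library, ts)
```

Superposition keeps the cost linear in the number of events, which is what makes event-based approaches attractive for large netlists; effects of the supply network on the individual pulses are ignored in this simplified picture.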
1.3. Goals of this Work
At the time this thesis was started, some approaches to model
the on-chip noise sources already existed. Each of them has its benefits, but also drawbacks. Estimations of the current waveforms based on several parameters that are not related to
the actual design allow early design studies, but with numerous uncertainties. On
the other hand, models based on functional simulations promise accurate
results, but require detailed knowledge of the circuit functionality and probably a
high simulation effort. In this case, there are probably also limitations in terms
of the flexibility to compare different design options and to predict the effects of
potential noise-reducing measures.
Hence, the primary goal of this work is to find methods capable of
efficiently modeling the switching currents of digital integrated circuits that are
based on the actual circuit design, but with the ability to predict the effects of
possibly advantageous design variations. The model generation is also required to
be highly efficient in terms of computational effort, so that several design options
can be investigated within a short time. A high-level model generation
based on gate-level circuit descriptions has therefore been pursued.
As digital gate-level standard cell libraries typically do not provide any analog
properties, i.e. switching current waveform characteristics and timing models, such a
library additionally has to be characterized. At the time this
thesis was started, integrated circuits for automotive applications were commonly
fabricated in 130 nm technologies. The characterization methods therefore need to
be feasible for this technology, but should also be applicable to 90 nm structures.
Furthermore, since cell interconnect wires have a significant effect on the transient
behavior of the particular on-chip devices, an approach to consider these parasitic
effects before the chip layout is available is introduced as well.
1.4. Organization of the Thesis
Chapter 2 briefly recapitulates the basics of digital integrated circuits. The structures and characteristics of the most important on-chip devices, as well as the parasitic effects of interconnect wires and power distribution networks, are introduced.
To determine the switching activities of digital circuits, Chapter 3
discusses the most important characteristics of digital systems, such as the basic
principles of synchronous sequential designs, system clock generation methods,
the structures of on-chip clock distribution networks, and clock gating considerations. As they are important for an efficient standard cell library characterization,
the most important properties of clocked storage elements and combinational logic
cells are introduced as well. The methodology and considerations for the characterization of the standard cell library are discussed in Chapter 4. Methods based on
the pre-characterized library that allow the current consumption of digital modules to be modeled efficiently, as well as approaches for interpreting randomly assigned
activities, are introduced in Chapter 5. Given that at least the effects of the cell
interconnect wires need to be considered within the current source models, Chapter 6 discusses a method to post-process the profiles determined for ideal conditions
(i.e. a constant power supply voltage and lossless interconnect wires) in order to approximate the respective parasitic effects. Chapter 7 presents the considerations for the
implemented tool, which is capable of generating current consumption models based
on the methods introduced in the preceding chapters, and provides a verification of
the results for circuits with different complexities and characteristics.
Finally, a conclusion and outlook are given in Chapter 8.
2. Digital Integrated Circuit Basics
Most of today’s digital integrated circuits are realized in Metal Oxide Semiconductor (MOS) technologies. Logic functions are predominantly provided by standard
cells, which are implemented as Complementary MOS (CMOS) devices. This chapter discusses the structure and behavior of the primary devices as well as the effects of
interconnect wires and power distribution networks.
2.1. MOS Transistor
The basic device of integrated circuits is the MOS transistor. As it essentially operates as a switch, it is well suited to implement digital functions. In CMOS
technology, two types of transistors are present. While the n-type (or NMOS) transistor needs a positive control voltage to become conductive, the p-type (or PMOS)
transistor is switched on when a low voltage is applied to its gate.
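The switch behavior described above can be captured in a minimal sketch. It assumes an ideal switch model with a positive NMOS threshold V_Tn and a negative PMOS threshold V_Tp, and assumes that the NMOS source is tied to ground and the PMOS source to the supply, as is common in digital designs. The function names, the supply voltage and the threshold values are hypothetical.

```python
def nmos_conducts(v_gs, v_tn):
    """Ideal switch view of an NMOS transistor: on if V_GS > V_Tn (V_Tn > 0)."""
    return v_gs > v_tn


def pmos_conducts(v_gs, v_tp):
    """Ideal switch view of a PMOS transistor: on if V_GS < V_Tp (V_Tp < 0)."""
    return v_gs < v_tp


VDD = 1.5               # hypothetical supply voltage in V
V_TN, V_TP = 0.4, -0.4  # hypothetical threshold voltages in V

# NMOS source assumed at ground, PMOS source assumed at VDD
for v_gate in (0.0, VDD):
    n_on = nmos_conducts(v_gate - 0.0, V_TN)
    p_on = pmos_conducts(v_gate - VDD, V_TP)
    print(f"gate = {v_gate:.1f} V: NMOS on = {n_on}, PMOS on = {p_on}")
```

A high gate voltage turns the NMOS device on and the PMOS device off, and a low gate voltage does the opposite; this complementary behavior is the basis of CMOS logic.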
Figure 2.1 shows common schematic symbols for both types. The conductivity
between drain (D) and source (S) is controlled by the voltage applied at the gate
(G) terminal. As the bulk (B) terminal is usually tied to the source potential in
digital VLSI designs, MOS transistors are often shown as three-terminal
devices (Figure 2.1(b) and (d)).
Figure 2.1.: Common symbols for MOS transistors: (a) NMOS transistor as a 4-terminal device, (b) NMOS transistor as a 3-terminal device, (c) PMOS transistor as a 4-terminal device, (d) PMOS transistor as a 3-terminal device.
Figure 2.2.: MOS transistor structure (gate on an oxide of thickness tox, n+ source and drain regions separated by the channel length L, p-substrate with bulk contact).
2.1.1. Structure and Operation
The structure of NMOS transistors is schematically shown in Figure 2.2. It consists
of two heavily doped n+ regions embedded in a lightly doped p-type substrate. The
gate electrode is situated on the substrate between them, separated from it by a silicon dioxide layer. PMOS transistors have a similar structure; in this case, the drain and source
regions are heavily doped p+ regions embedded in an n-type substrate.
Given that transistors are typically symmetric, the source and drain terminals are
interchangeable, and the final distinction between source and drain
is consequently given by the applied signals and voltages. The source is the terminal from which
the charge carriers move towards the drain, and the carrier type differs between the two devices:
electrons carry the current in NMOS transistors, holes in
PMOS transistors. Hence, the PMOS source is the terminal connected towards the
positive supply voltage, while at the NMOS device the more positive node is defined
as the drain. Important dimensions for the following discussion of the operating
modes are the channel length L, the channel width W, and the oxide thickness tox.
Due to the similar operating principles of n-type and p-type transistors, the
following discussion concentrates on NMOS devices. All principles and formulas
are also valid for PMOS devices when the signs of voltages and currents are inverted.
If all transistor terminals are tied to ground, the path from drain to source shows
two pn+ junction diodes connected back to back, with the substrate as a common
p-region. Under this condition, both junctions can be considered "off", which
results in an extremely high resistance between drain and source. The gate and the substrate form the plates of a capacitor with the gate oxide as the dielectric. Applying
a positive voltage to the gate causes negative charges to accumulate under the
gate. If the gate voltage is sufficiently high, the accumulation of electrons under
the gate leads to an inversion of the p-type substrate into an n-type channel in the
region between drain and source. Provided that a positive voltage is applied at the
drain terminal, this conducting channel allows a current flow from drain to source.
The gate voltage required to form the conducting channel is termed the threshold voltage (VT). The
analytical derivation of VT, as well as the formulas describing the
operating behavior, are treated extensively in the literature [13, 14, 15, 16].
Figure 2.3.: MOS transistor operating regions: (a) subthreshold: VGS < VT, (b) resistive: VDS < VGS − VT, (c) saturation: VDS ≥ VGS − VT.
For the discussion of the transistor operation, some parameters are defined. One
of the most important parameters for the operating characteristics is the gate-oxide
capacitance per unit area, Cox, defined as
$$C_{ox} = \frac{\varepsilon_{ox}}{t_{ox}}, \tag{2.1}$$
where εox is the permittivity of the oxide and tox is the oxide thickness. The process
transconductance k' depends on the carrier mobility µn and is given by
$$k' = \mu_n C_{ox} = \frac{\mu_n\,\varepsilon_{ox}}{t_{ox}}. \tag{2.2}$$
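Equations 2.1 and 2.2 can be evaluated numerically with a minimal sketch. The relative permittivity of SiO2 (about 3.9) and the vacuum permittivity are standard constants; the oxide thickness of 2.5 nm and the electron mobility of 300 cm²/(V·s) are hypothetical, illustrative values and not parameters of the technology used in this work.

```python
EPS_0 = 8.8541878128e-12  # vacuum permittivity in F/m


def oxide_capacitance(eps_r_ox, t_ox):
    """Gate-oxide capacitance per unit area, Eq. (2.1): C_ox = eps_ox / t_ox, in F/m^2."""
    return eps_r_ox * EPS_0 / t_ox


def process_transconductance(mu_n, c_ox):
    """Process transconductance, Eq. (2.2): k' = mu_n * C_ox, in A/V^2."""
    return mu_n * c_ox


# Illustrative (hypothetical) values: SiO2 (eps_r ~ 3.9), t_ox = 2.5 nm
c_ox = oxide_capacitance(3.9, 2.5e-9)
k_prime = process_transconductance(0.03, c_ox)  # 300 cm^2/(V s) = 0.03 m^2/(V s)
# unit conversion: 1 F/m^2 = 1e3 fF/um^2 and 1 A/V^2 = 1e6 uA/V^2
print(f"C_ox = {c_ox * 1e3:.2f} fF/um^2, k' = {k_prime * 1e6:.0f} uA/V^2")
```

With these assumed values, Cox is roughly 13.8 fF/µm². Since both quantities are inversely proportional to tox, halving the oxide thickness doubles Cox and k'.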
The operation of MOS transistors is typically subdivided into three regions, which
are discussed in the following sections¹ and shown in Figure 2.3.
Subthreshold Region
If VGS < VT, no conducting channel is formed (Figure 2.3(a)). The transistor can be
considered switched off in this region, and the drain-source current IDS is therefore very small, but not zero. Compared to the current flow through a conductive
channel, it is small enough to be neglected in many cases, but it becomes increasingly
important for low-power applications in deep submicron technologies.
¹ Even though the discussed devices and circuits are designed in a deep submicron technology,
this chapter is intended to introduce the basic operating behavior and is therefore based on
the characteristics of long-channel devices.
Resistive Region
Applying a sufficiently high voltage between gate and source (VGS > VT) forms the
previously mentioned conductive channel (Figure 2.3(b)). A small voltage difference
between drain and source (VDS) causes a current IDS to flow from drain to source.
Due to the applied voltage VDS, the channel voltage V(x) at a point x along the channel
rises from 0 at the source to VDS at the drain, so that the gate-to-channel voltage VGS − V(x)
decreases from source to drain. The transistor is in the resistive
region when VGS − V(x) > VT holds all along the channel. The
voltage-current relation in this case is given by
$$I_{DS} = k'\,\frac{W}{L}\left[(V_{GS}-V_T)\,V_{DS} - \frac{V_{DS}^2}{2}\right]. \tag{2.3a}$$
Substituting the device transconductance parameter (also termed gain factor) k =
k'(W/L) results in
$$I_{DS} = k\left[(V_{GS}-V_T)\,V_{DS} - \frac{V_{DS}^2}{2}\right]. \tag{2.3b}$$
Given that the quadratic term in Equation 2.3 can be ignored for small values of VDS,
the voltage-current dependency is linear. This leads to the term resistive
or linear region.
Saturation Region
Increasing the drain-source voltage reduces the channel charge near the drain. If the condition VDS ≥ VGS − VT is met, the induced charge at the drain is zero and the conducting channel is pinched off (Figure 2.3(c)). In this case, the transistor is in the saturation region, and the voltage across the effective channel (from the source to the pinch-off point) remains fixed at VGS − VT. The current IDS therefore also saturates. Due to channel-length modulation (the effective channel length depends on the drain-source voltage), however, IDS nonetheless depends on VDS. This effect is accounted for by the channel-length modulation parameter λ, and the drain-source current in this region is given by:

$$ I_{DS} = \frac{k}{2}\,(V_{GS}-V_T)^2\,(1+\lambda V_{DS}) \tag{2.4} $$
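The three operating regions can be combined into a piecewise drain-current model. The sketch below is a minimal long-channel NMOS model built from Eqs. 2.3b and 2.4. It assumes VDS ≥ 0 and returns zero in the subthreshold region, so it deliberately neglects the small subthreshold current mentioned above. The function name and the parameter values are illustrative assumptions.

```python
def drain_current(v_gs: float, v_ds: float, k: float, v_t: float, lam: float = 0.0) -> float:
    """Long-channel NMOS drain current I_DS in A (Eqs. 2.3b and 2.4).

    k   -- device transconductance k = k' * W / L in A/V^2
    v_t -- threshold voltage in V
    lam -- channel-length modulation parameter lambda in 1/V
    Assumes v_ds >= 0; the subthreshold current is neglected.
    """
    v_ov = v_gs - v_t  # gate overdrive V_GS - V_T
    if v_ov <= 0.0:
        return 0.0  # subthreshold region: treated as off
    if v_ds < v_ov:
        # resistive region, Eq. 2.3b
        return k * (v_ov * v_ds - v_ds ** 2 / 2.0)
    # saturation region (V_DS >= V_GS - V_T), Eq. 2.4
    return k / 2.0 * v_ov ** 2 * (1.0 + lam * v_ds)


# Example sweep with assumed parameters: k = 1 mA/V^2, V_T = 0.4 V, V_GS = 1.2 V
for v_ds in (0.1, 0.4, 0.8, 1.2):
    print(v_ds, drain_current(1.2, v_ds, k=1e-3, v_t=0.4, lam=0.1))
```

For λ = 0, both expressions give (k/2)(VGS − VT)² at VDS = VGS − VT, so the model is continuous at the region boundary. With λ > 0, Eq. 2.4 introduces a small step there, which is a known limitation of this simple formulation.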
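Figure 2.4.: MOS transistor capacitances (overlap capacitances CGSO and CGDO, gate-to-channel capacitance CGC, and junction capacitances Cj between the n+ source/drain regions and the p-substrate).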
2.1.2. MOS Transistor Capacitances
The performance of digital circuits is primarily determined by the time it takes to charge and discharge the intrinsic capacitances of the MOS transistors. Besides the gate capacitance, a transistor has several additional parasitic capacitances that result from its structure. Most of them are nonlinear and voltage-dependent. Figure 2.4 gives an overview of the primary capacitances discussed in the following sections.
Overlap Capacitance
Ideally, the gate electrode and the gate oxide have exactly the dimensions of the channel. Due to lateral diffusion, however, the source and drain regions extend under the gate oxide by xd, which forms an overlap capacitance. It is linear and given by:

$$ C_{GSO} = C_{GDO} = C_{ox}\,x_d\,W = C_o\,W \tag{2.5} $$

where Cox is the capacitance per unit area (Eq. 2.1), W is the channel width, and Co = Cox xd is the overlap capacitance per unit width.
Channel Capacitance
The most significant parasitic element of a MOS transistor is the nonlinear and voltage-dependent gate-to-channel capacitance CGC. It is divided into three components: CGCS (gate-to-source), CGCD (gate-to-drain), and CGCB (gate-to-bulk). As the distribution of the total capacitance depends on the operating region, each region is considered separately.
cut-off: If the gate voltage is below the threshold, no channel exists, and the total capacitance CGC is located between gate and bulk.
resistive: In this region, the channel extends over the full distance between source and drain, and CGC is equally distributed between CGCS and CGCD. As the channel shields the bulk from the gate, CGCB is zero in this case.
saturation: Because the channel is pinched off in this region, CGCB and CGCD are negligible, and most of the capacitance is therefore located between gate and source. The value of CGCS in the saturation region is (2/3) Cox W L.
An overview of the distribution of CGC among its components is shown in Table 2.1. Since the transition from one region to another is a continuous process, an exact expression for the capacitance distribution is more involved. The most important fact for the following chapters, however, is that all of these capacitances are proportional to W L.
Table 2.1.: Channel capacitance in the different operating regions.

region        CGCB        CGCS          CGCD          CGC
cut-off       Cox W L     0             0             Cox W L
resistive     0           ½ Cox W L     ½ Cox W L     Cox W L
saturation    0           ⅔ Cox W L     0             ⅔ Cox W L
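Table 2.1 can be encoded directly as a lookup of fractions of Cox W L. The following Python sketch is one possible encoding. It assumes the idealized, piecewise distribution of the table and does not model the continuous transitions between regions mentioned above.

```python
from fractions import Fraction

# Shares of Cox*W*L per component and operating region, as listed in Table 2.1
CHANNEL_CAP_SHARES = {
    "cut-off": {"CGCB": Fraction(1), "CGCS": Fraction(0), "CGCD": Fraction(0)},
    "resistive": {"CGCB": Fraction(0), "CGCS": Fraction(1, 2), "CGCD": Fraction(1, 2)},
    "saturation": {"CGCB": Fraction(0), "CGCS": Fraction(2, 3), "CGCD": Fraction(0)},
}


def channel_capacitances(region: str, c_ox: float, w: float, l: float) -> dict:
    """Return CGCB, CGCS, CGCD and their sum CGC for the given operating region."""
    shares = CHANNEL_CAP_SHARES[region]
    caps = {name: float(share) * c_ox * w * l for name, share in shares.items()}
    caps["CGC"] = sum(caps.values())  # total gate-to-channel capacitance
    return caps


print(channel_capacitances("saturation", c_ox=1.0, w=3.0, l=1.0))
```

Every entry scales with W L, which is the property emphasized in the text.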
Junction Capacitances
Each of the reverse-biased pn-junctions at the source and drain regions forms a junction capacitance Cj, which can be divided into the following two components:
bottom plate: The capacitance contributed by the area at the bottom of the junction is given by Cbottom = C'j W Ls, where C'j is the junction capacitance per unit area and Ls is the length of the junction sidewall.
sidewalls: The sidewall capacitance is given by Csw = C'jsw xj (2Ls + W). Here the junction-to-channel capacitance is neglected, so the width W is considered only once. Since the sidewall height xj is a technology parameter, it can be combined with C'jsw into the capacitance per unit perimeter Cjsw = C'jsw xj.
Taking both contributions into account, the result is:

$$ C_j = C_{bottom} + C_{sw} = C_j'\,L_s W + C_{jsw}\,(2L_s + W) \tag{2.6} $$
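The overlap and junction capacitances of Eqs. 2.5 and 2.6 are simple geometric expressions. The sketch below evaluates both for one diffusion region. All numerical values in the example are hypothetical placeholders and are not data of a real process.

```python
def overlap_capacitance(c_ox: float, x_d: float, w: float) -> float:
    """C_GSO = C_GDO = Cox * x_d * W, Eq. 2.5."""
    return c_ox * x_d * w


def junction_capacitance(cj_area: float, cjsw: float, l_s: float, w: float) -> float:
    """Cj = C'j * Ls * W + Cjsw * (2 * Ls + W), Eq. 2.6.

    cj_area -- bottom-plate capacitance per unit area C'j in F/m^2
    cjsw    -- sidewall capacitance per unit perimeter Cjsw in F/m
    """
    c_bottom = cj_area * l_s * w
    # Three sidewalls: the one facing the channel is neglected, so W counts only once
    c_sw = cjsw * (2.0 * l_s + w)
    return c_bottom + c_sw


# Hypothetical values: C'j = 1 fF/um^2, Cjsw = 0.1 fF/um, Ls = 0.5 um, W = 1 um
print(junction_capacitance(1e-3, 1e-10, 0.5e-6, 1e-6))  # result in F
```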
2.2. CMOS Devices
Complementary MOS devices typically consist of both PMOS and NMOS transistors. The most basic CMOS device is the inverter shown in Figure 2.5, which consists of one transistor of each type. The input voltage Vin is applied to both gates, and the output voltage Vout is present at the interconnected drain terminals.
Figure 2.5.: CMOS inverter schematic and voltage-transfer characteristic: (a) internal circuit (VDD, VSS, Vin, Vout); (b) voltage-transfer characteristic, Vout [V] versus Vin [V], both axes from 0 to 1.5 V.
A low input voltage makes the PMOS transistor conductive, while the NMOS transistor is high-resistive, so the output voltage Vout is pulled up towards VDD. Conversely, a high input voltage makes the NMOS transistor conductive, while the PMOS transistor is high-resistive, and the output voltage is pulled down towards VSS. Abrupt switching from one state to the other is impossible in practice. The voltage-transfer characteristic of an exemplary device, plotted in Figure 2.5(b), is therefore continuous. Since the output voltage is pulled either towards VDD or towards VSS, the voltage swing of Vin and Vout is virtually equal to the supply voltage.
The operation of an inverter illustrates the general principle of logic gates: one transistor type is used to pull a node down towards VSS, and the other one to pull it up towards VDD. The corresponding terms are pull-up network (PUN) and pull-down network (PDN). Figure 2.6 shows the transistor-level schematics of the most basic combinational CMOS devices with at least two inputs: the NAND and NOR gates. The figure shows that the internal circuit directly reflects the gate function. A high input voltage makes an NMOS transistor pull a node down to a lower voltage level. As a result, these gates are inherently inverting, and their internal transistor-level circuits are typically complementary as well. To exclude states in which the PUN and the PDN actively pull a node towards opposite voltage levels, a general rule applies: transistors connected in parallel in the PUN are connected in series in the PDN, and vice versa².
² This rule holds for the basic combinational functions. For gates implementing more complex functions, optimizations may lead to different structures.
Figure 2.6.: Schematic of CMOS NAND (a) and NOR (b) gates with two inputs (inputs A and B, output Z; PUN between VDD and Z, PDN between Z and VSS).
For instance, the output Z of the NAND gate shown is low when both inputs (A and B) are high, since in this case all NMOS transistors are conductive and all PMOS transistors are high-resistive. All other input states lead to a high output: at least one low input voltage makes a PMOS transistor conductive, and the output is pulled towards VDD. At the same time, at least one NMOS transistor is switched off, resulting in a high resistance between VZ and VSS. The NOR gate is the exact dual in this respect. It consists of one p-type transistor per input connected in series and the same number of n-type transistors connected in parallel.
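The series/parallel duality of the PUN and PDN can be checked with a purely logical switch model. The following Python sketch abstracts each transistor as an ideal switch, an assumption that ignores all analog effects. A series chain conducts only if all its transistors conduct, and a parallel group conducts if any of them does. The gate output is high if the PUN conducts and low if the PDN conducts. The helper names are illustrative.

```python
from itertools import product


def nmos_on(gate: int) -> bool:
    return gate == 1  # NMOS conducts for a high gate voltage


def pmos_on(gate: int) -> bool:
    return gate == 0  # PMOS conducts for a low gate voltage


def resolve(pun: bool, pdn: bool) -> int:
    # Complementary networks: exactly one of them may conduct in a steady state
    if pun == pdn:
        raise ValueError("PUN and PDN must not conduct (or block) at the same time")
    return 1 if pun else 0


def nand2(a: int, b: int) -> int:
    pun = pmos_on(a) or pmos_on(b)    # PMOS transistors in parallel
    pdn = nmos_on(a) and nmos_on(b)   # NMOS transistors in series
    return resolve(pun, pdn)


def nor2(a: int, b: int) -> int:
    pun = pmos_on(a) and pmos_on(b)   # PMOS transistors in series
    pdn = nmos_on(a) or nmos_on(b)    # NMOS transistors in parallel
    return resolve(pun, pdn)


for a, b in product((0, 1), repeat=2):
    print(a, b, "NAND:", nand2(a, b), "NOR:", nor2(a, b))
```

If series and parallel were swapped in only one of the two networks, resolve would raise an error for some input combination. This mirrors the rule stated above.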
2.2.1. Static Behavior
As mentioned before, all input and output signal voltages are virtually equal to VDD or VSS in the steady states. The potential difference caused by the channel resistance of conductive transistors can be neglected in this case. Consequently, all transistors are either switched "on" or "off". The current consumption in this state is very small, but not zero. It can be neglected in many cases, but becomes increasingly important in deep submicron technologies, as technology scaling leads to significantly increasing leakage currents [17].
This static current flow is a consequence of the subthreshold conductance and the gate leakage. As an example, Figure 2.7 shows three consecutive inverters, where the static behavior of the embedded one (Pcell, Ncell) is analyzed. First, due to the subthreshold conductance, a high-resistive transistor allows a current to flow along the direct path from VDD to VSS.
Figure 2.7.: Transistor-level schematic of three consecutive inverters (driving inverter Pdrv/Ndrv, analyzed cell Pcell/Ncell, load inverter Pout/Nout; node voltages Vin and Vout, currents Iin and Iout).
Depending on the input state, the amount of this current is determined by the off-resistance of the corresponding PMOS or NMOS transistor. Second, the gate leakage causes a current flow from VDD through the PMOS transistor of one cell and the NMOS transistor of a cell connected to it. Suppose a low voltage is applied at the input of the driving inverter. Then Vin is pulled to a high potential, and the transistors Pdrv and Ncell are conductive. In this case, the oxide leakage of Ncell permits a current flow in the direction of Iin. As the output voltage Vout is low, Pout is also switched on, which causes an oxide leakage current in the direction opposite to Iout. Since the on-resistance of a driving transistor is negligible compared to the gate-oxide resistance, this leakage current can be considered independent of the driving cell strength. It is consequently determined by the gate characteristics of the transistors connected to the cell input.
As a result, the static current consumption of a circuit cell can be determined by analyzing the characteristics of that particular cell. The characteristics of the cells actually connected to it in a system have an insignificant effect.
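The static current of a cell depends essentially on its own input state. The static current of a whole block can therefore be estimated by summing per-cell values from a characterization table. The Python sketch below illustrates this bookkeeping. The table structure, the cell names, and in particular all current values are hypothetical placeholders, not characterization results.

```python
# Hypothetical static current per input state in nA (placeholders, not measured data)
STATIC_CURRENT_NA = {
    "INV": {(0,): 1.0, (1,): 1.5},
    "NAND2": {(0, 0): 0.8, (0, 1): 1.2, (1, 0): 1.1, (1, 1): 3.0},
}


def total_static_current(cells):
    """Sum the state-dependent static current of all cell instances.

    cells -- iterable of (cell_type, input_state) tuples
    """
    return sum(STATIC_CURRENT_NA[cell][state] for cell, state in cells)


design = [("INV", (1,)), ("NAND2", (1, 1)), ("NAND2", (0, 1))]
print(total_static_current(design), "nA")
```

The sum does not depend on how the cells are interconnected. This reflects the conclusion above that the connected cells have an insignificant effect.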
2.2.2. Transient Characteristics
In addition to the static current flow, the overall current consumption of CMOS devices is mainly determined by the dynamic switching currents. This dynamic part is caused by any circuit activity. It consists primarily of the current required for charging and discharging the parasitic capacitances, as well as a certain cross current. During a transition, there is a short time interval in which both complementary transistors are conducting, so this type of current can flow through a device along the direct path from VDD to VSS. The most important part, however, is the current caused by capacitive loads. Due to the limited carrier velocity, it takes some time to charge and discharge the parasitic elements, which limits the signal slew rates. This consequently leads to signal propagation delays from the inputs to the outputs of cells and limits the system performance according to the technology characteristics.
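Figure 2.8.: Timing parameter definitions (input Vin and output Vout between the levels VL and VH with 10%, 50%, and 90% points; input rise and fall times tr and tf, output transition times tLH and tHL, propagation delays tpLH and tpHL).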
Common definitions of the timing parameters are illustrated in Figure 2.8. It shows the input and output voltage waveforms of a gate, where a rising edge at the input causes a falling edge at the output. As mentioned before, the steady-state voltages are not exactly the supply voltages and are therefore labeled VL and VH. The signal rise and fall times are typically defined between the 10% and the 90% points of the total voltage swing. As the values are different for Vin and Vout, they are termed tr and tf for input transitions, and tLH (low-high) and tHL (high-low) at the output. Signal rise and fall times largely depend on the strength of the driving gate and the load presented to it. The output response times, or propagation delays, for low-high (tpLH) and high-low (tpHL) input events are defined between the 50% transition points of the input and output voltage swings.
Another important characteristic parameter of gates is the propagation delay tp. Because the response times for rising and falling input events may differ, it is defined as their average:

$$ t_p = \frac{t_{pLH} + t_{pHL}}{2} \tag{2.7} $$
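The definitions of Figure 2.8 and Eq. 2.7 translate directly into a measurement procedure for sampled waveforms, for example from a circuit simulation. The following Python sketch finds level crossings by linear interpolation between samples. It assumes an inverting gate (the output edge is opposite to the input edge), monotonic edges, and known levels VL and VH. The function names are illustrative.

```python
def crossing_time(t, v, level, rising):
    """First time at which v crosses `level` in the given direction (linear interpolation)."""
    for i in range(1, len(v)):
        v0, v1 = v[i - 1], v[i]
        if (rising and v0 < level <= v1) or (not rising and v0 > level >= v1):
            return t[i - 1] + (level - v0) * (t[i] - t[i - 1]) / (v1 - v0)
    raise ValueError("no crossing found")


def transition_time(t, v, v_l, v_h, rising):
    """Rise or fall time between the 10% and 90% points of the swing."""
    swing = v_h - v_l
    t10 = crossing_time(t, v, v_l + 0.1 * swing, rising)
    t90 = crossing_time(t, v, v_l + 0.9 * swing, rising)
    return abs(t90 - t10)


def propagation_delay(t, v_in, v_out, v_l, v_h, input_rising):
    """Delay between the 50% points of the input edge and the (inverted) output edge."""
    v50 = v_l + 0.5 * (v_h - v_l)
    return crossing_time(t, v_out, v50, not input_rising) - crossing_time(t, v_in, v50, input_rising)


# Idealized ramps in ns: rising input between 1 and 2 ns, falling output between 2 and 3 ns
t = [0.0, 1.0, 2.0, 3.0, 4.0]
v_in = [0.0, 0.0, 1.5, 1.5, 1.5]
v_out = [1.5, 1.5, 1.5, 0.0, 0.0]
t_p_lh = propagation_delay(t, v_in, v_out, 0.0, 1.5, input_rising=True)
t_hl = transition_time(t, v_out, 0.0, 1.5, rising=False)
print(t_p_lh, t_hl)
```

Following the convention of the text, tpLH here refers to a low-high input event. Measuring tpHL in the same way for a falling input and averaging both values gives tp according to Eq. 2.7.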
2.2.3. Power and Energy Consumption
Important properties of a system depend on the power and energy consumption. These include the power supply capacity, the battery lifetime, the dimensions of the chip-internal power distribution lines, and the packaging and cooling requirements. For instance, the peak power Ppeak is important for dimensioning the supply lines, while the average power dissipation Pav is decisive for the cooling requirements or the battery lifetime of mobile devices. The definitions are

$$ \begin{aligned} P_{peak} &= i_{peak}\,V_{peak} = \max[p(t)] \\ P_{av} &= \frac{1}{T}\int_0^T p(t)\,dt = \frac{V_{supply}}{T}\int_0^T i_{supply}(t)\,dt \end{aligned} \tag{2.8} $$

where p(t) is the instantaneous power, isupply is the current drawn from the supply with the voltage Vsupply over the interval t ∈ [0, T], and ipeak is the maximum of isupply. The components of this supply current are the previously mentioned static and dynamic currents. The static component is permanently present, even when the system is inactive. The dynamic component, in contrast, is proportional to the switching activity and the system clock frequency.
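For a sampled supply current, for example from a simulation, Eq. 2.8 can be evaluated numerically. The Python sketch below assumes a constant supply voltage, so that p(t) = Vsupply isupply(t), and approximates the integral with the trapezoidal rule. Both are assumptions of this sketch and are not statements about the measurement setup of this work.

```python
def peak_and_average_power(t, i_supply, v_supply):
    """Return (P_peak, P_av) according to Eq. 2.8 for a constant supply voltage.

    t        -- sample times in s (increasing)
    i_supply -- supply current samples in A
    """
    p = [v_supply * i for i in i_supply]  # instantaneous power p(t)
    p_peak = max(p)
    # Trapezoidal approximation of the integral of p(t) over the sampled interval
    energy = sum((t[k] - t[k - 1]) * (p[k] + p[k - 1]) / 2.0 for k in range(1, len(t)))
    p_av = energy / (t[-1] - t[0])
    return p_peak, p_av


# Assumed example: one triangular current pulse of 2 mA at 1.5 V within a 4 ns window
t = [0.0, 1e-9, 2e-9, 3e-9, 4e-9]
i_samples = [0.0, 2e-3, 0.0, 0.0, 0.0]
print(peak_and_average_power(t, i_samples, 1.5))
```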
2.3. Cell-Based Design Methodology
There are several different methods to design a circuit, but cell-based design is the established methodology for the development of the majority of today's digital integrated circuits [18]. Since this method enables a high degree of automation and tool support, short design cycles and moderate design costs can be achieved. A cell-based design is implemented by instantiating building blocks (cells), each of which provides a particular functionality [19]. These cells are typically connected by wires in several layers, the number of which depends on the manufacturing process. This design methodology is also called semi-custom. The number and type of available building blocks are given by a library, and only the composition of the system from instances of these cells is custom-designed. Systems such as microcontrollers or other complex systems typically consist of standard cells and macro cells, which are discussed in the following sections.
Figure 2.9.: Internal structure of exemplary standard cells: (a) inverter, (b) strong inverter, (c) 2-input NAND.
2.3.1. Standard Cells
As the basic devices of cell-based designs, standard cells implement the elementary functions needed for assembling a digital circuit. Such a function can be a basic Boolean logic function (e.g., AND, OR, XOR, inverter), an arithmetic function (e.g., half- or full-adder), or a storage element (e.g., latch, flip-flop). It can also be a more complex encoder, decoder, comparator, or multiplexer.
Standard cells are typically organized in a library, where each cell type (implemented function) is usually provided with different fan-in and fan-out characteristics. This matters for design optimization, since gates with larger transistor dimensions enhance the system performance through higher signal slew rates and therefore shorter propagation delays. On the other hand, chip area is a significant cost factor. Under certain conditions, it is therefore beneficial to instantiate cells with smaller driver strengths and accept longer propagation delays for noncritical signals. In addition to a possibly smaller chip area, this also reduces the power and energy consumption.
Such a limited library of cells enables the design of a circuit using a high-level hardware description language. A synthesis tool uses the descriptions in the cell library to transform this high-level circuit description into a technology-dependent netlist [20]. This netlist essentially contains a list of the instantiated components and the information on the interconnections of their ports. Based on this netlist, placing the cells and routing the wires on the chip can also be performed automatically by appropriate tools. Figure 2.9 shows the internal transistor-level structure of two inverter cells with different driver strengths and of a NAND gate with two inputs. An important property is that the cells of one library typically have a constant size in at least one dimension, which allows them to be lined up in rows on the chip. This reference size is usually the cell height, while the width varies according to the complexity of the cell function and the driver strength.
Since VDD and VSS are located at the top and bottom of the cell, continuous power supply rails are possible on the chip. At the same time, the transistors are located side by side, and relatively short interconnections are possible. The cell layouts also show that the port connectors are typically located centrally, since each port is usually connected to both NMOS and PMOS transistors.
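A gate-level netlist, as described above, is essentially a list of cell instances whose ports are mapped to nets. The following Python sketch shows one possible in-memory representation together with a simple logic evaluation. The two-cell library, the port name Z for outputs, and the requirement that instances be listed in topological order are assumptions of this sketch. They are not properties of any particular netlist format or tool.

```python
from dataclasses import dataclass

# Hypothetical mini library: cell type -> (input port names, Boolean function)
LIBRARY = {
    "INV": (("A",), lambda a: 1 - a),
    "NAND2": (("A", "B"), lambda a, b: 1 - (a & b)),
}


@dataclass
class Instance:
    name: str   # instance name, e.g. "U1"
    cell: str   # library cell type, e.g. "NAND2"
    pins: dict  # port name -> net name; the output port is assumed to be "Z"


def evaluate(netlist, inputs):
    """Propagate logic values through instances listed in topological order."""
    values = dict(inputs)  # net name -> logic value
    for inst in netlist:
        in_ports, func = LIBRARY[inst.cell]
        values[inst.pins["Z"]] = func(*(values[inst.pins[p]] for p in in_ports))
    return values


# AND function built from a NAND2 followed by an inverter
netlist = [
    Instance("U1", "NAND2", {"A": "a", "B": "b", "Z": "n1"}),
    Instance("U2", "INV", {"A": "n1", "Z": "y"}),
]
print(evaluate(netlist, {"a": 1, "b": 1})["y"])
```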
2.3.2. Macro Cells
The complexity of integrated circuits is constantly increasing. To reduce the development effort and time for new systems, it becomes increasingly important to reuse reasonably large blocks of existing systems. Such reusable components are called macro cells. They usually provide significantly higher complexity and functionality than the cells in a typical standard cell library. Common examples of macro cells are multipliers, memories, data paths, and even complete microprocessor (µP) cores or digital signal processor (DSP) entities.
As macro cells have a well-defined functionality, their internal structure is typically similar at each cell instance. In this case, it is often beneficial to additionally specify the physical design in terms of transistor locations and wiring. The advantage of such a hard macro is the possibility of economic optimization: it has to be developed only once, while the benefits are available at each instance. Important properties in this context are the predictable performance and power consumption. Possible disadvantages are the predefined dimensions on the chip and the fact that there are no options for customization.
A special case is cells with an almost regular internal structure, such as memory modules. These types of macros can be generated from parameterized models with slightly different properties. A specialized module compiler adapts the internal structure and layout of such regularly structured cells to meet the particular requirements of different applications.
Macros without any physical design implementation are called soft macro cells, for which only the functional description is defined. This can be a structural description in the form of a gate-level standard cell netlist, but also a high-level description in a hardware description language, such as VHDL or Verilog [21]. High-level descriptions of soft macros are typically highly parameterized to fulfill the requirements of various applications. This kind of soft macro is also called an intellectual property (IP) module and is often provided by third-party vendors.
2.4. Deep Submicron Interconnects
Technology scaling significantly reduces the feature sizes on a chip. This allows increasingly complex circuits on a given chip area, but it also reduces the cross-sections of the interconnect wires. At the same time, the wire lengths increase with the system complexity of a chip. As a result, the characteristics of cell interconnect wires have a significant effect on the performance of a system in deep submicron technologies. The most important parameters, as well as commonly used models for cell interconnects, are introduced in the following sections. A detailed discussion of interconnect wire effects can be found in [15, 19].
2.4.1. Interconnect Parameters
Wire Resistance
Shrinking the dimensions of the on-chip structures reduces at least the widths of the interconnect wires. As the wire lengths tend to increase with the growing circuit complexity, the wire resistance has become an important parameter. This resistance R is given by

$$ R = \frac{\rho L}{A} = \frac{\rho L}{T W} = R_{sq}\,\frac{L}{W} \tag{2.9} $$

where ρ is the resistivity of the material in Ωcm, L is the length, and A = T W is the area of the wire cross-section with thickness T and width W. As ρ is given by the material and T is almost constant for a given technology, the sheet resistance Rsq = ρ/T is an important parameter. The wire resistance is therefore given by the product of Rsq and the ratio L/W, also referred to as the number of squares of a wire.
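Equation 2.9 is easily evaluated by counting squares. In the Python sketch below, the resistivity (1.7 × 10⁻⁸ Ωm, roughly that of bulk copper) and the wire geometry are assumed example values. Note that SI units (Ωm) are used here rather than Ωcm.

```python
def sheet_resistance(rho: float, thickness: float) -> float:
    """Rsq = rho / T in ohm per square (rho in ohm*m, T in m)."""
    return rho / thickness


def wire_resistance(r_sq: float, length: float, width: float) -> float:
    """R = Rsq * L / W, Eq. 2.9; L / W is the number of squares."""
    return r_sq * (length / width)


# Assumed example: copper-like rho, T = 0.2 um, W = 0.2 um, L = 1 mm
r_sq = sheet_resistance(1.7e-8, 0.2e-6)
r = wire_resistance(r_sq, 1e-3, 0.2e-6)
print(f"Rsq = {r_sq:.3f} ohm/sq, squares = {1e-3 / 0.2e-6:.0f}, R = {r:.0f} ohm")
```

The 1 mm wire in this example corresponds to 5000 squares, which gives about 425 Ω.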
Coupling Capacitances
As mentioned in the previous sections, capacitances are among the primary parameters affecting the performance of a system. Compared to the parasitic MOS transistor capacitances, the effects caused by cell interconnect wires are considerably increased due to submicron effects.
Due to the potentially complex three-dimensional wire structure of state-of-the-art integrated circuits, accurate modeling of the wire capacitances is a nontrivial task. As the spacing between wires has decreased, coupling effects between the wires have become significant. The most important components of the total wire capacitance are shown in Figure 2.10. Because wires on different layers typically overlap, an area capacitance CA is formed between the layers, which is given by

$$ C_A = \varepsilon_{ox}\,\frac{W L}{H} \tag{2.10} $$

where H is the vertical distance between the overlapping wires.
Figure 2.10.: Components of the total wire capacitance (area capacitances A, lateral capacitances L, and fringing capacitances F).
The capacitance between lines in the same layer is called the lateral capacitance CL and is given by

$$ C_L = \varepsilon_{ox}\,\frac{T L}{S} \tag{2.11} $$

Due to the reduced wire spacing S in deep submicron technologies, this component has become the main coupling capacitance and is therefore largely responsible for delay and noise issues. The fringing capacitances CF are less significant, but can also contribute to coupling effects. They are actually present at all edges and surfaces. The figure shows only the fringing capacitances to the neighboring layers, as CL typically dominates when the spacing between the wires is sufficiently small. CF is approximately given by

$$ C_{Fa} = \varepsilon_{ox}\,\ln\left(1 + \frac{T L}{H}\right) \tag{2.12a} $$

or alternatively, for widely spaced wires, towards the neighboring wires on the same layer:

$$ C_{Fl} = \varepsilon_{ox}\,\ln\left(1 + \frac{W L}{S}\right) \tag{2.12b} $$

The total wire capacitance C can finally be summed up as

$$ C = 2C_A + 2C_L + 2C_{Fa} + 2C_{Fl} \tag{2.13} $$

where some of the components may be neglected depending on the actual wire spacing.
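To get a feeling for the relative magnitudes, the parallel-plate terms of Eqs. 2.10 and 2.11 can be evaluated for an assumed geometry. The Python sketch below keeps only the area and lateral components of Eq. 2.13 and neglects the fringing terms, which the text permits depending on the spacing. The value εox = 3.9 ε0 and all dimensions are assumed example values.

```python
EPS_OX = 3.9 * 8.854e-12  # assumed SiO2 permittivity in F/m


def area_capacitance(w: float, l: float, h: float) -> float:
    """C_A = eps_ox * W * L / H, Eq. 2.10."""
    return EPS_OX * w * l / h


def lateral_capacitance(t: float, l: float, s: float) -> float:
    """C_L = eps_ox * T * L / S, Eq. 2.11."""
    return EPS_OX * t * l / s


def wire_capacitance_without_fringing(w, t, l, h, s):
    """Eq. 2.13 with the fringing terms C_Fa and C_Fl neglected."""
    return 2 * area_capacitance(w, l, h) + 2 * lateral_capacitance(t, l, s)


# Assumed geometry: W = 0.2 um, T = 0.4 um, H = 0.3 um, S = 0.2 um, L = 100 um
c_a = area_capacitance(0.2e-6, 100e-6, 0.3e-6)
c_l = lateral_capacitance(0.4e-6, 100e-6, 0.2e-6)
print(c_a, c_l, wire_capacitance_without_fringing(0.2e-6, 0.4e-6, 100e-6, 0.3e-6, 0.2e-6))
```

With this aspect ratio (T > W and a small spacing S), CL is three times CA. This illustrates why the lateral component dominates at tight spacing.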
Interconnect Inductance
Although negligible in many cases, the interconnect inductance starts to play a role as a consequence of low-resistance interconnect materials and increased switching frequencies. Higher switching frequencies lead to larger current changes within short time periods, so inductances can generate a considerable voltage drop

$$ \Delta V = L\,\frac{di}{dt} \tag{2.14} $$

The wire inductance can be determined directly from its geometry and environment. Alternatively, it can be derived from the capacitance c and inductance l (per unit length) of a wire, which are related by

$$ c\,l = \varepsilon\,\mu \tag{2.15} $$

where ε is the permittivity and µ is the permeability of the dielectric. This relation holds for a wire embedded in a homogeneous dielectric.
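Equations 2.14 and 2.15 allow a rough estimate of the inductive voltage drop of a wire. The Python sketch below assumes a homogeneous SiO2-like dielectric (εr = 3.9, µr = 1). The capacitance per unit length, the wire length, and the current step are illustrative values only.

```python
import math

EPS_0 = 8.854e-12       # vacuum permittivity in F/m
MU_0 = 4e-7 * math.pi   # vacuum permeability in H/m


def inductance_per_length(c_per_length: float, eps_r: float = 3.9, mu_r: float = 1.0) -> float:
    """l = eps * mu / c, Eq. 2.15 (homogeneous dielectric assumed), in H/m."""
    return eps_r * EPS_0 * mu_r * MU_0 / c_per_length


def inductive_drop(inductance: float, di: float, dt: float) -> float:
    """Delta V = L * di / dt, Eq. 2.14, with di/dt approximated by a finite step."""
    return inductance * di / dt


# Assumed example: c = 200 pF/m, 1 mm wire, 10 mA current change within 100 ps
l_wire = inductance_per_length(200e-12) * 1e-3
print(f"L = {l_wire * 1e9:.3f} nH, dV = {inductive_drop(l_wire, 10e-3, 100e-12) * 1e3:.1f} mV")
```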
2.4.2. Wire Models
The characteristics of interconnect wires can be considered ideal in early design phases, where only the functional properties of a circuit are of interest. Likewise, very short wires at transistor level or between adjacent cells usually have no relevant effect on the system behavior. Interconnect wires spanning relevant distances, however, can significantly affect the switching performance of the driving transistors. Wire models are therefore necessary for a reasonable circuit analysis. They are also needed during the design phase to determine appropriate measures for optimization and reliability. Accurate wire models that include all the minor effects would be too complex. Therefore, different models of reasonable complexity are used to approximate the real interconnect behavior.
Lumped Models
Even though the previously mentioned parasitics are actually distributed along the wire, it is often feasible to lump them into a few elements. Commonly used lumped models are shown in Figure 2.11.
The most basic model consists of a single capacitance, shown in Figure 2.11(a). It represents the additional load on the driving cell and is applicable as long as the resistive component is insignificant and the switching frequencies are in the low and medium range. This is usually the case for more than 90% of the wires on a chip, which can therefore be modeled by a single lumped capacitor.
For wires of a certain length, the resistive component becomes significant. In this case, the model requires at least one resistor to account for the resulting RC effects. The L-model shown in Figure 2.11(b) is simple, but pessimistic in terms of the time constant, as it yields τ = RC. Since the actual value should rather be in the range of τ = RC/2, this model is inaccurate for long wires³. The T- and Π-models are more accurate in this respect. In these models, the resistance or the capacitance, respectively, is divided into two elements. Both models are suitable for calculations. The T-model, however, has an additional node, which may increase the number of calculations, so the Π-model is the most popular lumped RC model for long interconnect wires.
Figure 2.11.: Commonly used lumped wire models: (a) C only model, (b) L-model, (c) T-model, (d) Π-model.
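The time constants quoted above can be reproduced with the Elmore delay. For an unbranched RC chain observed at its far end, the Elmore delay is the sum over all capacitors of each capacitance multiplied by the resistance between the driver and that capacitor. The following Python sketch is an illustration that assumes an ideal step driver. It also shows that an N-section RC ladder approaches RC/2 as N grows. The model-building helpers are illustrative names.

```python
def elmore_far_end(segments):
    """Elmore delay at the far end of an unbranched RC chain driven by an ideal source.

    segments -- list of (series_resistance, capacitance_to_ground) pairs, ordered from the
                driver to the far end; each capacitance sits after its series resistance.
    """
    delay, upstream_r = 0.0, 0.0
    for r, c in segments:
        upstream_r += r          # resistance between the driver and this node
        delay += upstream_r * c  # this capacitance is charged through upstream_r
    return delay


def l_model(r, c):
    return [(r, c)]                   # R, then C at the far end


def t_model(r, c):
    return [(r / 2, c), (r / 2, 0.0)]  # R/2 - C - R/2


def pi_model(r, c):
    return [(0.0, c / 2), (r, c / 2)]  # C/2 - R - C/2


def ladder(r, c, n):
    return [(r / n, c / n)] * n        # n-section approximation of the distributed wire


R, C = 1.0, 1.0  # normalized values, so the results are in units of RC
models = [("L", l_model(R, C)), ("T", t_model(R, C)), ("Pi", pi_model(R, C)), ("ladder-100", ladder(R, C, 100))]
for name, model in models:
    print(name, elmore_far_end(model))
```

Under this metric, the T- and Π-models both give RC/2, whereas the L-model gives RC. This is consistent with the statement above.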
Distributed Models
Long interconnect wires show significant coupling effects, but lumped models do not take them properly into account. Distributed models lead to more accurate results in this case, but they are also more complex.
Figure 2.12 shows a simplified model of two nets coupled by a capacitance, which demonstrates the importance of considering coupling effects. If the aggressor net is not switching, the coupling capacitance Cc can be considered as connected to ground. The load capacitance CL in this case is

$$ C_L = C_{gnd} + C_c \tag{2.16} $$

If, on the other hand, both nets are switching in the same direction, then

$$ C_L = C_{gnd} \tag{2.17} $$

as the charge on the coupling capacitance does not change. If the nets are switching in opposite directions, however, the voltage across Cc is reversed, and Cc has to be considered twice:

$$ C_L = C_{gnd} + 2C_c \tag{2.18} $$

Figure 2.12.: Simplified model of two nets coupled by a capacitance Cc, each with a capacitance Cgnd to ground.
³ A derivation and discussion of the time constant τ using the Elmore delay can be found in the literature, e.g., [15].
This shows that including the coupling capacitances in a lumped model as capacitances to ground is only a first-order approximation. Moreover, a circuit typically contains many more coupled nets, so there are numerous possible combinations of switching activity. These can only be covered by distributed models with dedicated coupling capacitances.
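Equations 2.16 to 2.18 amount to scaling the coupling capacitance by a switching-dependent factor of 1, 0, or 2. The following minimal Python sketch assumes ideal, simultaneous transitions with equal slopes on both nets, which is the idealization behind these equations. The direction encoding is an illustrative choice.

```python
def effective_load(c_gnd: float, c_c: float, victim: int, aggressor: int) -> float:
    """Load capacitance seen by the victim driver, Eqs. 2.16-2.18.

    victim, aggressor -- +1 for a rising, -1 for a falling transition, 0 for a quiet net
    """
    if victim == 0:
        raise ValueError("the victim net must be switching")
    if aggressor == 0:
        return c_gnd + c_c        # Eq. 2.16: Cc behaves like a capacitance to ground
    if aggressor == victim:
        return c_gnd              # Eq. 2.17: the charge on Cc does not change
    return c_gnd + 2.0 * c_c      # Eq. 2.18: the voltage across Cc is reversed


for aggressor in (0, +1, -1):
    print(aggressor, effective_load(10e-15, 5e-15, victim=+1, aggressor=aggressor))
```

With many coupled neighbors, enumerating all such combinations quickly becomes impractical, which is the point made above.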
Since interconnects are typically part of a complex structure, the mentioned parameters actually vary along a wire. If a cell output is connected to several cells, the wire forms a tree structure and may furthermore be routed across different interconnect layers. As a result, the best accuracy would be achieved by modeling each section separately with the appropriate elements. Since this would again lead to excessively complex models, a compromise between accuracy and model complexity has to be found. This is typically done by introducing a threshold value that determines which capacitances are considered. This usually results in sufficiently accurate models of reasonable complexity.
2.5. Power Distribution Networks
Ideally, the power supply voltage is constant across the entire chip, but the power distribution network is itself an interconnect wire structure. As a consequence of the parasitic wire effects, the current consumption of the on-chip devices causes supply voltage fluctuations over time. These fluctuations may furthermore differ between chip regions. They depend on the power grid structure and dimensions, the circuit layout, and the amount of current consumption. Both the design of the power distribution network and the modeling of its behavior are therefore potentially complex tasks.
Figure 2.13.: Model of a power supply system considering the package inductance (Lpkg), the RLC characteristics of the on-chip power distribution network, as well as an instance of a decoupling capacitance (Decap).
Power supply systems are designed to provide as stable a voltage as possible at any point on a chip. However, the supply is distributed from an external source to all the individual components on the chip, so the parasitic wire characteristics cause voltage drops. Figure 2.13 shows a power system model considering the most important resistive, capacitive, and inductive components. A popular measure to reduce voltage drops is the instantiation of decoupling capacitances.
2.5.1. Voltage Drops
The primary causes of power supply voltage drops are referred to as IR drop and L di/dt. IR drops are caused by the current flow through the wire resistances and are most important for the on-chip power grid. Inductive effects, on the other hand, are primarily caused by the package interconnects. The total voltage drop V is given by

$$ V = I R + L\,\frac{di}{dt} \tag{2.19} $$
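For a sampled current waveform, Eq. 2.19 can be evaluated sample by sample. The Python sketch below approximates di/dt with backward differences. For the first sample, where no previous sample exists, it uses only the IR term. The resistance, inductance, and current values are assumed examples.

```python
def supply_drop(t, i, r, l):
    """V(t) = R * i(t) + L * di/dt (Eq. 2.19) for sampled current values.

    di/dt is approximated by backward differences; the first sample contains only IR.
    """
    drops = [r * i[0]]
    for k in range(1, len(t)):
        di_dt = (i[k] - i[k - 1]) / (t[k] - t[k - 1])
        drops.append(r * i[k] + l * di_dt)
    return drops


# Assumed example: 0.5 ohm resistance, 1 nH inductance, 100 mA current step within 1 ns
t = [0.0, 1e-9, 2e-9]
i_samples = [0.0, 0.1, 0.1]
print(supply_drop(t, i_samples, r=0.5, l=1e-9))
```

During the current step in this example, the inductive term (100 mV) exceeds the resistive one (50 mV).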
IR drop
With the increasing complexity of integrated circuits, the current consumption also increases due to the additional switching components. This leads to significant voltage drops across the resistance of the power supply lines. Such IR drops also occur in the ground grid, where they are also referred to as ground bounce.
L di/dt
A large number of simultaneously switching cells can demand high current peaks with extremely short rise times and therefore a high di/dt. In digital designs, these are typically the clock buffers and flip-flops. At least the package inductance may then contribute a significant portion of the overall voltage drop. In high-speed designs, even the small inductances of the power grid may cause a considerable voltage drop.
The most significant inductances arise from the bonding wires used to connect the chip I/O pads to the lead frame of a traditional package. The inductances of a ball-grid array (BGA) package are one order of magnitude lower than those of a dual in-line package (DIP). Nevertheless, the current consumption of the typically much more complex systems in a BGA package still causes a significant L di/dt for both VDD and VSS.
2.5.2. Decoupling Capacitances
Large voltage drops in the power distribution system may lead to supply voltages that temporarily fall below the minimum value required for proper operation of a system. On-chip decoupling capacitances are commonly used to keep the supply voltage within the noise budget. These capacitances are typically located near the power pins of components demanding high peak currents.
Decoupling capacitances hold a reservoir of charge. The current needed for the switching operations of nearby cells is first delivered by these capacitances, which are recharged later by the current flowing from the power supply. As a result, decoupling capacitances act as a kind of filter that reduces excessive di/dt rates, which are known to be responsible for a significant part of the interfering voltage drops.
As discussed in this chapter, the MOS transistors and the interconnect wires introduce several parasitic capacitances. All of them limit the performance of a system, but all capacitances charged to VDD are also effective as decoupling capacitances. Dedicated decoupling capacitances are usually implemented as NMOS transistors with the gate connected to VDD and the source and drain connected to VSS.
3. Synchronous Sequential Digital Systems
The basic principle of digital circuits is that signals ripple through paths of consecutively switching devices. To ensure correct values at a given node and time, the activity of a circuit must be coordinated. Digital systems are commonly classified as synchronous or asynchronous designs [22, 23].
Synchronous systems are timed by a periodic clock signal. This time reference is globally distributed to all memory elements, which are intended to update their values simultaneously. The internal values and the output values are calculated within the time interval between two synchronization events.
Asynchronous systems are not driven by a globally distributed time reference signal and are therefore also termed self-timed systems. Their operating sequences are coordinated by completion signals, which each function block generates once its calculations are done and the resulting values are stable. Since all subsequent operations that rely on the results of a previous function have to wait for this handshake signal, the correct logical order of the operating sequences is ensured.
As virtually all of today's digital systems are synchronous designs, this chapter focuses on synchronous circuits. The following sections discuss the general principles and the most important design considerations of synchronous sequential systems.
3.1. General Principles
Synchronous digital systems basically consist of clocked storage elements, which are triggered by the system clock, and combinational logic, which determines the internal values as well as the resulting output signals. At each clock event, the storage elements are updated with these internal values, which in turn triggers the combinational logic to determine the next results. This mode of operation corresponds to the logical model of a finite-state machine (FSM), as illustrated in Figure 3.1. The combinational logic takes the present state $S_n$ and the input values $X$ into account and determines the output values $Y$ and the next state $S_{n+1}$. Therefore, both $S_{n+1}$ and $Y$ depend on $S_n$ and $X$:
$$S_{n+1} = f(S_n, X), \qquad Y = Y(S_n, X).$$
[Figure: the inputs $X$ and the present state $S_n$ enter the combinational logic, which produces the outputs $Y = Y(X, S_n)$ and the next state $S_{n+1} = f(S_n, X)$; the clocked storage elements, triggered by the clock, hold the state.]
Figure 3.1.: General principle of a synchronous design as a finite-state machine.
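The FSM view can be made concrete with a small cycle-based sketch. In each clock cycle, the output $Y = Y(S_n, X)$ is evaluated combinationally, and at the clock edge the next state $S_{n+1} = f(S_n, X)$ is loaded. The 2-bit counter with an enable input is a hypothetical example, not a circuit from this work.

```python
from typing import Callable, List, Tuple

def run_fsm(next_state: Callable[[int, int], int],
            output: Callable[[int, int], int],
            s0: int, inputs: List[int]) -> List[Tuple[int, int]]:
    """Cycle-based FSM simulation; returns (present state S_n, output Y) per clock cycle."""
    trace = []
    s = s0
    for x in inputs:
        trace.append((s, output(s, x)))   # Y = Y(S_n, X), evaluated within the cycle
        s = next_state(s, x)              # S_{n+1} = f(S_n, X), loaded at the clock edge
    return trace

# Hypothetical example: 2-bit counter that counts while x == 1
# and signals the wrap-around at its output.
def f(s: int, x: int) -> int:
    return (s + x) % 4

def y(s: int, x: int) -> int:
    return int(s == 3 and x == 1)

print(run_fsm(f, y, 0, [1, 1, 0, 1, 1]))
```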
The state transition from $S_n$ to $S_{n+1}$ is initiated by the clock signal. This sequential procedure is schematically shown in Figure 3.2. Whether the state transition is triggered by the rising or the falling clock edge depends on the type of the instantiated storage elements. Following the most common option, the circuit in this example is sensitive to the rising clock edge. The figure also shows that the combinational logic has only a limited time to calculate the next state; conversely, the delay of the combinational logic limits the maximum possible clock frequency. As the paths in the combinational logic block have different lengths, the resulting values are not available simultaneously. The longest (slowest) path is termed the critical path, and its propagation delay is a fundamental performance parameter of a system.
[Figure: along the time axis, each rising clock edge advances the state from $S_n$ to $S_{n+1}$ to $S_{n+2}$; between two edges, the combinational logic computes the outputs $Y_t$ and the next state from the present state and the inputs $X_t$.]
Figure 3.2.: FSM state transitions related to the clock signal.
A simple FSM model is sufficient to demonstrate the basic principles of synchronous sequential systems, but the architecture of modern designs is much more complex, as discussed in the following sections.
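How the critical path bounds the clock frequency can be sketched by treating a combinational block as a directed acyclic graph with one delay per gate (hypothetical values). The longest arrival time, plus an assumed clock-to-output delay of the launching flip-flop and an assumed setup time of the capturing flip-flop, gives the minimum clock period. These register timing parameters are not discussed in this section and appear here only as labeled assumptions.

```python
from functools import lru_cache
from typing import Dict, List

def critical_path_delay(fanin: Dict[str, List[str]], delay: Dict[str, float]) -> float:
    """Longest path delay through an acyclic gate network.
    fanin[g] lists the gates driving g; gates with empty fan-in are driven by flip-flops."""
    @lru_cache(maxsize=None)
    def arrival(g: str) -> float:
        # arrival time at the output of g = own delay + latest arrival at its inputs
        return delay[g] + max((arrival(src) for src in fanin[g]), default=0.0)
    return max(arrival(g) for g in fanin)

# Hypothetical netlist, delays in ps.
fanin = {"g1": [], "g2": [], "g3": ["g1"], "g4": ["g1", "g2"], "g5": ["g3", "g4"]}
delay = {"g1": 20.0, "g2": 35.0, "g3": 40.0, "g4": 25.0, "g5": 30.0}

t_crit = critical_path_delay(fanin, delay)
t_clk_q, t_setup = 50.0, 30.0            # assumed flip-flop timing in ps
t_min = t_clk_q + t_crit + t_setup       # minimum clock period
print(f"t_crit = {t_crit} ps, T_min = {t_min} ps, f_max = {1e3 / t_min:.2f} GHz")
```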
3.2. Digital System Clocking
The clock signal is the time reference for all operations and therefore an essential part of a synchronous digital system. At the same time, the clock distribution network and the storage elements are responsible for a significant portion of the energy consumption: the clock subsystem of modern microprocessors can consume up to 40% of the entire chip power [24].
There are a number of methods to generate the clock signal and distribute it to the storage elements [25]. Since the clock subsystem directly affects the performance of a system, its structure is application-specific and carefully optimized. Therefore, when characterizing the behavior of a system, special care has to be taken regarding the characteristics of the implemented clocking scheme.
3.2.1. System Clock Generation
The internal system clock is typically derived from an external oscillator. While some systems are driven directly by the external clock, high-performance microprocessors in particular operate at higher frequencies than the external reference. Figure 3.3 shows the main blocks of a typical clock generation unit. An internal phase-locked loop (PLL) is commonly used to synchronize an on-chip oscillator to the preconditioned external reference. By dividing the internal frequency $f_{PLL}$ by a given factor before it is compared with the reference, the PLL additionally multiplies the reference frequency $f_{OSC}$ by this factor. Since a central processing unit (CPU) typically requires a higher clock frequency than the peripheral units, a configurable clock divider is used to generate multiple clock signals with different frequencies.
[Figure: blocks clock prescaler (fed by the external clock), PLL, clock divider, and configuration register; signals $f_{OSC}$, $f_{PLL}$, $f_{CPU}$, and $f_{SYS}$.]
Figure 3.3.: Basic blocks of a clock generator unit.
The configuration of the clock generation unit in terms of multiplier/divider factors and prescaler options is typically done by writing the appropriate values into a configuration register.
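Assuming that the prescaler divides the external reference by P, that the PLL multiplies $f_{OSC}$ by its feedback division factor N, and that the clock divider divides $f_{PLL}$ by K (P, N and K are hypothetical names for the configuration register fields), the resulting frequencies follow directly:

```python
def clock_frequencies(f_ext_hz: float, p: int, n: int, k: int) -> dict:
    """Frequencies of an assumed prescaler -> PLL -> divider chain.
    p: prescaler division, n: PLL feedback division (multiplication), k: output divider."""
    f_osc = f_ext_hz / p       # preconditioned reference
    f_pll = n * f_osc          # the PLL locks f_PLL / n to f_OSC
    f_div = f_pll / k          # divided clock, e.g. for peripheral units
    return {"f_osc": f_osc, "f_pll": f_pll, "f_div": f_div}

freqs = clock_frequencies(8e6, p=1, n=20, k=2)   # assumed 8 MHz external crystal
print({name: value / 1e6 for name, value in freqs.items()})  # in MHz
```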
For characterizing a system in terms of its transient current consumption, the configuration of the possibly different clock signals is important. Dividing a clock signal by a given factor k is often done by suppressing k − 1 periods of the faster signal between two passed periods. This changes the duty cycle of the resulting slower clock, i.e., the ratio of its high phase to its period, which is of significant importance for the current consumption modeling in Chapter 5.
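The effect on the duty cycle can be illustrated with sampled waveforms. The sketch assumes a fast clock with a 50% duty cycle in which only every k-th high phase is passed, so the divided clock has the period k·T but a high phase of only T/2, i.e., a duty cycle of 1/(2k).

```python
def gated_divided_clock(k: int, fast_periods: int, samples_per_period: int = 8) -> list:
    """Sampled waveform of a clock divided by k through gating:
    only every k-th high phase of the fast clock is passed, the other k-1 are suppressed."""
    half = samples_per_period // 2
    wave = []
    for p in range(fast_periods):
        high = 1 if p % k == 0 else 0
        wave += [high] * half + [0] * (samples_per_period - half)
    return wave

def duty_cycle(wave: list) -> float:
    """Fraction of samples at high level; wave must span whole periods of the slow clock."""
    return sum(wave) / len(wave)

for k in (1, 2, 4):
    print(k, duty_cycle(gated_divided_clock(k, fast_periods=4 * k)))
```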
3.2.2. Clock Distribution
The intention of synchronous designs is that all storage elements are triggered simultaneously by the global system clock. This requires a specifically designed distribution network that ensures the same clock signal propagation delay from the generation unit to all storage elements. Figure 3.4 schematically shows a basic approach for a balanced distribution network. The clock signal is routed to a central point on the chip and distributed over an H-tree structure to a number of functional sub-blocks. The local distribution to the individual storage elements is done via balanced paths as well. Such approaches are based on the assumption that the wire lengths from the central point to all leaf nodes are equal.
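The equal-length property of an ideal H-tree can be checked with a short recursive sketch. It assumes an idealized geometry in which each H has equal width and height and the H's of the next level are half the size; real clock trees deviate from this ideal.

```python
from typing import List, Tuple

def h_tree_leaves(x: float, y: float, size: float, levels: int,
                  path: float = 0.0) -> List[Tuple[float, float, float]]:
    """Leaf positions of an ideal H-tree together with the wire length from the root.
    Each H has width and height `size`; from its center, half the horizontal bar plus
    half a vertical bar (= size) lead to each of its four corners, where H's of half size start."""
    if levels == 0:
        return [(x, y, path)]
    half = size / 2
    leaves = []
    for dx in (-half, half):
        for dy in (-half, half):
            leaves += h_tree_leaves(x + dx, y + dy, size / 2, levels - 1, path + size)
    return leaves

leaves = h_tree_leaves(0.0, 0.0, 8.0, levels=3)
print(len(leaves), {p for _, _, p in leaves})
```

All 64 leaves are reached through the same wire length of 8 + 4 + 2 = 14 length units.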
As complex designs are typically irregular and use different metal layers with different wire dimensions, a strict H-tree is usually not directly applicable [26]. Modern electronic design automation (EDA) tools support the implementation of resistance-capacitance (RC) matched trees. This method considers the parasitic effects of the interconnect wires and allows the clock buffer strengths to be adjusted. Its crucial feature is the ability to compensate for differences in propagation delay, which enables irregularly routed but nevertheless balanced clock distribution networks.
A popular approach for implementing a balanced tree in an irregularly structured design is the clustering method. Here, the clocked storage elements are recursively grouped into clusters. Each cluster contains one buffer, which distributes the clock signal to the associated sub-segments. The tree is balanced by using buffers of appropriate strength and equal wire lengths within each cluster.
The absolute signal propagation delay of a clock distribution network is usually irrelevant, as long as the arrival times at the storage elements are as similar as possible. Since path propagation delays in real systems are never exactly the same, the maximum difference between the arrival times is an important reliability parameter of a system. It is usually termed clock skew and describes the time interval within which the clocked storage elements are triggered. With respect to system performance, the clock skew reduces the effective clock period and therefore shortens the time available for the calculations of the combinational logic. On the other hand, clock skew spreads the activity of the system and distributes the current consumption over a certain time interval. As this leads to a lower peak current, it may ease the dimensioning of the power distribution network. Additional advantages of distributed activity are lower voltage drops in the power supply system and a reduction of the electromagnetic emission. For this reason, a moderate amount of clock skew, also referred to as jitter in this context, is intentionally introduced in some cases [27, 28].
[Figure: (a) physical view and (b) tree view of an H-tree that distributes the clock from a central point to the leaf nodes.]
Figure 3.4.: H-tree clock distribution network.
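The effect of clock skew on the peak current can be illustrated by superimposing identical current pulses, one per flip-flop, at the respective clock arrival times. The triangular pulse shape (100 ps base width, unit peak), the linear superposition, and the arrival times are assumptions for illustration only.

```python
from typing import List

def triangle(t: float, t0: float, peak: float, width: float) -> float:
    """Triangular current pulse that starts at t0 and has the base width `width`."""
    half = width / 2
    if t0 <= t <= t0 + half:
        return peak * (t - t0) / half
    if t0 + half < t <= t0 + width:
        return peak * (t0 + width - t) / half
    return 0.0

def total_current(arrivals: List[float], t_end: float, peak: float = 1.0,
                  width: float = 100.0, dt: float = 1.0) -> List[float]:
    """Superposition of one pulse per flip-flop, sampled every dt (times in ps)."""
    steps = int(round(t_end / dt))
    return [sum(triangle(i * dt, t0, peak, width) for t0 in arrivals) for i in range(steps + 1)]

aligned = [0.0] * 10                       # zero skew
skewed = [20.0 * i for i in range(10)]     # assumed arrival times spread over 180 ps
for arrivals in (aligned, skewed):
    skew = max(arrivals) - min(arrivals)
    print(f"skew {skew} ps, peak {max(total_current(arrivals, 400.0)):.2f}")
```

With all ten pulses aligned, the peak is ten times the single-pulse peak; spreading the arrival times over 180 ps lowers it to about 2.6, at the cost of 180 ps of skew that is lost from the effective clock period.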
3.2.3. Clock Gating
The energy consumption of a system is of particular interest, especially for mobile devices. Distributing the clock signal to all storage elements, even if their data values remain unchanged, causes unnecessary switching activity and therefore unnecessary power dissipation.
A popular technique to avoid unintended circuit activity is clock gating [29, 30]. Gates inserted into the clock tree make it possible to stop the distribution of the clock signal to functional sub-blocks that are currently inactive. As a result, the idle parts of a circuit can be selectively deactivated. Complex systems such as microcontrollers provide a large number of functions and multiple peripheral units. In programmable systems, it ultimately depends on the implemented software which functional units are actually in use. Since a microcontroller may be almost idle while its clock subsystem would, without gating, still be fully active, clock gating is an efficient measure to reduce the power dissipation of a system. As the clock gates themselves, and particularly the additional control logic, occupy chip area and consume energy as well, the number of inserted gates is a trade-off.
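Benefit and cost of clock gating can be estimated with the well-known dynamic power relation $P = \alpha C V_{DD}^2 f$. The sketch assumes that the gated-off share of the clock load stops switching entirely and models the overhead of the gates and their control logic as additional switched capacitance; all parameter values are illustrative.

```python
def dynamic_power(c_f: float, vdd_v: float, f_hz: float, activity: float = 1.0) -> float:
    """Dynamic switching power P = activity * C * VDD^2 * f in watts."""
    return activity * c_f * vdd_v ** 2 * f_hz

# Illustrative (assumed) parameters of a clock subsystem.
C_CLK = 100e-12        # total switched clock capacitance in F
VDD = 1.5              # supply voltage in V
F_CLK = 100e6          # clock frequency in Hz (activity 1: the clock toggles every cycle)
IDLE_SHARE = 0.7       # share of the clock load that is idle and can be gated off
OVERHEAD = 0.1         # extra switched capacitance of clock gates and control logic

p_ungated = dynamic_power(C_CLK, VDD, F_CLK)
p_gated = dynamic_power(C_CLK * (1 - IDLE_SHARE + OVERHEAD), VDD, F_CLK)
print(f"ungated: {p_ungated * 1e3:.2f} mW, gated: {p_gated * 1e3:.2f} mW")
```

Raising OVERHEAD towards IDLE_SHARE shows where the trade-off tips: once the gating overhead exceeds the switched capacitance it removes, gating no longer pays off.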
Depending on the application, some functions of a microcontroller are used only optionally, rarely, or partially. In this context, a distinction is usually made between static and dynamic gating:
Static gating is usually applied in systems with functional units that are only optionally used. The enable signals are typically generated by a system control unit and cause entire units to enter a sleep mode. Since deactivating the clock of an entire module prevents the dynamic current flow but not the static leakage current, some systems additionally support switching off the power supply of particular units.
Dynamic gating is typically done by individually setting the enable signals of the clock gates in each clock cycle. An internal gating logic determines whether a particular function is required and enables or disables the appropriate clock gate(s). Due to the additional control logic, this concept is more expensive than static gating, but it allows a selective deactivation of local resources. At the lowest level, the clock signal distribution to particular registers, or even single flip-flops, can be restricted.
As statically gated functional units can be considered completely switched off, dynamic gating is the most challenging case for system activity simulations. Figure 3.5, for instance, shows a fragment of a clock tree with inserted gating cells. The clock signal CLK is distributed through a tree of buffers, and the enable signals are routed to the inserted clock gates. These signals can be generated by local control logic or, even in the case of dynamic gating, distributed globally to several submodules. The figure also shows that gating is done hierarchically, which makes it possible to disable entire modules, functional sub-blocks, or even single registers or flip-flops.
[Figure: clock tree fragment in which CLK passes through buffers and gating cells controlled by the enable signals EN1, EN2, EN2A, and EN2B.]
Figure 3.5.: Clock tree with inserted gating cells.
In the strict sense, the enable signal of flip-flops (discussed later in Section 3.4) is also a kind of gating. Commonly used terms in this context are global and local gating: local gating is integrated into the clocked storage elements, while global gating affects a certain number of elements.
Since peripheral units of a microcontroller may operate at a fraction of the system frequency, clock gating is often used to implement a local clock divider. Figure 3.6 shows an implementation of a divider that may be used to divide the system clock by a factor of two. Setting the signal DIV to low causes the flip-flop to toggle its output value at every rising edge of CLK. As the latch is transparent during the low phases of the system clock, the value of CLKen changes only during the low phases and is always stable during the high phases. This is important, as CLKi should remain synchronous to the system clock and free of glitches. A high level of DIV disables the toggle flip-flop, which leads to a constant value at the inverter and consequently a permanently "open" AND gate. Without the inverter, the AND gate would be permanently "locked" in this case.
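[Figure: schematic with the signals DIV, CLK, CLKen, and CLKi, a flip-flop (ports D, CP, QN), and a low-active latch (ports D, ENN, Q) whose output CLKen gates CLK into CLKi.]
Figure 3.6.: Schematic of a basic local clock divider.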
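The divide-by-two behavior described above can be checked with a half-period simulation. The sketch is behavioral and rests on assumptions: the flip-flop toggles on rising CLK edges while DIV is low, CLKen follows the flip-flop state during the low phase, and CLKen is forced high while DIV is high. The exact gate-level connection of DIV and the inverter in Figure 3.6 is not reproduced.

```python
def simulate_divider(div: int, half_periods: int) -> list:
    """Half-period simulation of the gated clock divider (behavioral sketch, assumed topology)."""
    q = 0         # toggle flip-flop state
    clk_en = 0    # CLKen, held by the latch while CLK is high
    clk_prev = 0
    clk_i = []
    for step in range(half_periods):
        clk = step % 2                        # CLK low in even, high in odd half-periods
        if clk == 1 and clk_prev == 0 and not div:
            q ^= 1                            # flip-flop toggles on the rising edge
        if clk == 0:
            clk_en = 1 if div else q          # latch is transparent in the low phase
        clk_i.append(clk & clk_en)            # AND gate: CLKi = CLK and CLKen
        clk_prev = clk
    return clk_i

print(simulate_divider(div=0, half_periods=8))
print(simulate_divider(div=1, half_periods=8))
```

With DIV low, CLKi passes every second high phase of CLK, and because CLKen changes only while CLK is low, each passed pulse is a complete high phase. Such a gated divider therefore produces the 25% duty cycle mentioned in Section 3.2.1 rather than 50%.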
3.3. Combinational Logic Cells
As mentioned in Section 3.1, the combinational part of a circuit determines the internal results and output values. At any point in time, the outputs of a combinational circuit are related to its current input signals by a Boolean expression. Consequently, it contains no storage elements and typically no combinational feedback loops.
The building blocks of combinational logic circuits are the combinational logic cells, also termed gates. These gates provide basic logic or arithmetic functions, which can be cascaded to form the intended circuit function. Figure 3.7 shows the symbol of a general logic cell, which may have an arbitrary number of inputs and outputs, depending on the implemented function. This can be a simple inverter or buffer with one input and one output, but also an arithmetic function with several inputs and possibly more than one output. In the logical model, all output values of a combinational cell are determined directly by its present inputs and provided at the respective output(s); in a real circuit, they become valid after the propagation delay of the cell.
[Figure: cell symbol with inputs $I_1, \ldots, I_M$ and outputs $O_1, \ldots, O_N$, where $O_n = f(I_1, \ldots, I_M)$.]
Figure 3.7.: Symbol and truth-table of a general combinational logic cell.
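In the logical model, a combinational cell is a pure function of its inputs, so its truth table can be enumerated exhaustively. The NAND2 and full-adder cells below are textbook examples chosen for illustration; propagation delays are ignored.

```python
from itertools import product
from typing import Callable, Dict, Tuple

def truth_table(cell: Callable[..., Tuple[int, ...]],
                n_inputs: int) -> Dict[Tuple[int, ...], Tuple[int, ...]]:
    """Enumerate all 2**n_inputs input combinations of a combinational cell."""
    return {bits: cell(*bits) for bits in product((0, 1), repeat=n_inputs)}

def nand2(a: int, b: int) -> Tuple[int]:
    """One output."""
    return (1 - (a & b),)

def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Two outputs: sum and carry."""
    return (a ^ b ^ cin, (a & b) | (cin & (a ^ b)))

for inputs, outputs in truth_table(nand2, 2).items():
    print(inputs, "->", outputs)
```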
3.4. Clocked Storage Devices
Storage elements are generally used to store the state of a system, as discussed in Section 3.1. The stored information can be intermediate results in flip-flops or registers, but also a set of data vectors in a memory. Both types of information are intended for subsequent processing and are therefore considered part of the system state. According to the definition of synchronous sequential circuits, clocked storage elements represent the leaf nodes of the clock distribution network.
A common characteristic of all clocked elements is that the point in time at which the input is captured is given by the clock signal. The types of elements differ in their sensitivity to the clock signal and in the amount of stored data. Level-sensitive devices, which capture the input continuously during the enabled phase, are commonly termed latches, whereas edge-sensitive devices, which are triggered by a clock signal transition, are called flip-flops. A group of flip-flops in parallel, usually intended to hold the value of a data bus, is typically called a register. The basic characteristic of a memory is its ability to store more than one data word (a number of parallel bits), where the location in the memory is selected by applying the appropriate address at the respective input.
As the functionality of clocked storage devices is essential for the operating behavior of synchronous designs, the different device types are briefly discussed in the following sections. Since the simulations discussed in the following chapters characterize the behavior of already verified circuits, the timing requirements can be assumed to be fulfilled. Therefore, and since the switching behavior is important for the characterization in Chapter 4, the focus here is on the functionality and special properties of the respective devices.
3.4.1. Latches
There are different types of latches, but the D-latch shown in Figure 3.8 is the most relevant for synchronous designs. Latches are generally level-sensitive. As shown in the truth table, the output Q directly follows the input D as long as the latch is enabled; an enabled latch is therefore often termed transparent. Figure 3.8 shows the symbol and the truth table of a high-active latch, which is enabled while EN is '1'. For low-active latches, such as the one instantiated in the clock divider in Figure 3.6, this port is usually labeled ENN or $\overline{EN}$. The transparent behavior of latches may be useful for several applications, such as the mentioned clock divider, but is problematic in synchronous designs. Since the input data signals are continuously propagated to the output, latches cannot be considered terminators of combinational paths. As a result, D-latches are instantiated only rarely, for special purposes, in synchronous designs.
[Figure: D-latch symbol with data input D, enable input EN, and output Q.]
D | EN | Q
– | 1 | D
– | 0 | $Q_{t-1}$
Figure 3.8.: D-latch symbol and truth-table.
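The truth table of the high-active D-latch translates directly into a sample-by-sample model, sketched here for a sequence of (D, EN) samples.

```python
from typing import List, Tuple

def d_latch(samples: List[Tuple[int, int]], q0: int = 0) -> List[int]:
    """High-active D-latch: Q follows D while EN = 1, otherwise Q keeps Q(t-1)."""
    q, out = q0, []
    for d, en in samples:
        if en:
            q = d          # transparent: the input propagates to the output
        out.append(q)      # disabled: the previous value is held
    return out

print(d_latch([(1, 1), (0, 1), (1, 0), (0, 0), (1, 1)]))
```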
3.4.2. Flip-Flops
Flip-flops are the most frequently instantiated clocked storage elements. These edge-sensitive devices capture the input data signal at discrete points in time and subsequently propagate it to the output. Flip-flops are most commonly sensitive to the rising edge of the clock signal (posedge-triggered), but modified types may also be sensitive to falling edges (negedge-triggered).
Basic versions typically have the ports CP (clock), D (data input), and Q (data output). As shown in Figure 3.9, the output is set to the input value at every rising edge of the clock signal.
[Figure: basic flip-flop symbol with data input D, clock input CP, and output Q.]
D | CP | Q
– | ↑ | D
Figure 3.9.: Basic flip-flop symbol and truth-table.
To provide additional features, flip-flops are often extended by one or more of the following functions:
enable: A frequently used feature is an additional EN (enable) port. It prevents the stored data from being updated, even if the triggering clock event occurs, by feeding the output value back to the input. As a result, a disabled flip-flop captures its current output value instead of the actual input data.
set/reset: In some cases, it is necessary to bring a system into a well-defined state, such as the power-on state. These two signals force the flip-flop output to either a high (set) or a low (reset) state. Since set/reset signals in ASIC designs are most commonly low-active (they become effective when set to '0'), the ports are typically named SN and RN, or $\overline{S}$ and $\overline{R}$. By contrast, S and R without a bar denote high-active set and reset ports.
scan test: After the fabrication of a chip, several tests are performed. One of them is the so-called scan test, in which the circuit is reconfigured to form scan chains. During this test, flip-flops operate in a different mode and therefore provide two additional ports: TE (test enable) and TI (test input). These ports are typically used only for the test procedure and are therefore less important during the normal operation of a circuit.
Figure 3.10 shows the symbol and the truth table of a flip-flop with an enable and a reset port. The truth table shows the relation of the output to the different input states. As the device is posedge-triggered, it captures the input value at every rising clock edge, except when it is disabled. The truth table also shows that the reset signal is typically asynchronous, i.e., it becomes effective immediately and is therefore independent of the clock signal.
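The behavior summarized in Figure 3.10 (rising-edge capture, enable feedback, asynchronous low-active reset) can be sketched as a small event-driven model. The class and method names are hypothetical, and timing parameters such as setup, hold, and clock-to-output delay are not modeled.

```python
class EnableResetFlipFlop:
    """Posedge-triggered D flip-flop with enable EN and asynchronous low-active reset RN."""

    def __init__(self) -> None:
        self.q = 0
        self._cp = 0   # previous clock level, used for edge detection

    def evaluate(self, d: int, en: int, cp: int, rn: int) -> int:
        if rn == 0:
            self.q = 0                      # reset acts immediately, regardless of CP
        elif cp == 1 and self._cp == 0:     # rising clock edge
            if en:
                self.q = d                  # capture D
            # en == 0: the output is fed back, so Q keeps Q(t-1)
        self._cp = cp
        return self.q

ff = EnableResetFlipFlop()
# (D, EN, CP, RN) samples
stimulus = [(1, 1, 0, 1), (1, 1, 1, 1), (0, 0, 0, 1), (0, 0, 1, 1), (0, 1, 0, 0), (1, 1, 1, 1)]
print([ff.evaluate(*s) for s in stimulus])
```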
3.4.3. Memories
Many digital designs, and in particular programmable devices such as microcontrollers, contain internal memories. In addition to a read-only memory (ROM) holding the basic software routines, several static random-access memories (SRAMs) may be instantiated. Since ROMs and SRAMs are typically sensitive to the rising clock edge, static memories are a common kind of posedge-triggered clocked storage device.
[Figure: flip-flop symbol with inputs D, EN, CP, and RN and output Q.]
D | EN | CP | RN | Q
– | 0 | ↑ | 1 | $Q_{t-1}$
– | 1 | ↑ | 1 | D
– | – | – | 0 | 0
Figure 3.10.: Extended flip-flop symbol and truth-table.
Figure 3.11 shows the symbol and the respective pin description, and Figure 3.12 the block diagram of the internal structure of an SRAM as commonly instantiated in VLSI designs. An internal memory array stores a given number of data words. The term random access refers to the property that data words can be read and written in any order in successive clock cycles. An address decoder converts the binary coded address vector of m bits into the internal signals that select the data word. The I/O buffer latches the input and output data vectors, each of which consists of n bits.
[Figure: memory symbol with inputs A, D, CLK, WEN, and CEN and output Q.]
Pin | Description
A[m-1:0] | Address
D[n-1:0] | Data Input
Q[n-1:0] | Data Output
CLK | Clock
CEN | Chip Enable
WEN | Write Enable
Figure 3.11.: Symbol and pin descriptions of a general memory instance.
The internal activity of the memory is coordinated by a control unit, which generates the internal control signals for the address decoder and the data buffer. As a kind of local clock gating, the chip enable signal controls the propagation of the internal clock signal. Since CEN is typically low-active, this type of SRAM performs a read or write operation in each clock cycle as long as the enable signal is low. A high CEN signal, on the other hand, causes the memory to ignore all incoming requests. The write enable signal distinguishes between the read and the write mode of the device. In the write mode, the memory stores the applied input data vector D at the given address A in the memory array and additionally propagates the data to the output Q. In the read mode, the addressed data word is read from the memory array and provided at the data output after a certain delay. Alternative SRAM implementations feature extended write enable signals in the form of a write mask, which makes it possible to selectively write a segment, or possibly even single bits, of the data input vector into the memory.
[Figure: block diagram with a control unit (CLK, CEN, WEN), an address decoder (A[m-1:0]), the memory array, and a data I/O buffer (D[n-1:0], Q[n-1:0]).]
Figure 3.12.: Basic block diagram of a static memory.
The timing diagram in Figure 3.13 shows the common dependencies of the SRAM
activity during read and write operations. Three clock cycles are shown, where the
following operations are triggered by the rising edges of the clock signal:
1. A low chip enable and a high write enable signal at the rising clock edge cause the memory to read the data stored at the address A1. The values are propagated to the output and are available after a certain delay. The applied input data vector D1 is ignored.
2. Given that a high chip enable signal disables the device, none of the applied
signals is recognized at the second rising clock edge. The output data remain
unchanged until the next read or write operation is executed.
3. As both CEN and WEN are low in the third clock cycle, a write operation is initiated at the clock edge. The data input values D2 are captured and stored in the memory array at the given address A2. After a certain delay, the data D2 are additionally propagated to the output.
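The three operations can be replayed with a behavioral sketch of the SRAM. It assumes 2^m words of n bits, low-active CEN and WEN, and write-through to Q; the output delay is not modeled, and the class name, the address values, and the data values are hypothetical.

```python
from typing import List, Optional

class SimpleSram:
    """Behavioral SRAM sketch: 2**m words of n bits, low-active CEN and WEN,
    operations triggered by rising CLK edges; the output delay is not modeled."""

    def __init__(self, m: int, n: int) -> None:
        self.m, self.n = m, n
        self.array: List[int] = [0] * (1 << m)
        self.q: Optional[int] = None          # output undefined before the first access

    def word_lines(self, a: int) -> List[int]:
        """Address decoder: m-bit address -> one-hot selection of 2**m word lines."""
        return [int(i == a) for i in range(1 << self.m)]

    def rising_edge(self, cen: int, wen: int, a: int, d: int = 0) -> Optional[int]:
        if cen:                               # CEN high: request ignored, Q unchanged
            return self.q
        row = self.word_lines(a).index(1)     # selected data word
        if not wen:                           # WEN low: write D (limited to n bits)
            self.array[row] = d & ((1 << self.n) - 1)
        self.q = self.array[row]              # read data, or written data propagated to Q
        return self.q

sram = SimpleSram(m=4, n=8)
sram.array[1] = 0x5A                                  # assume address A1 = 1 already holds 0x5A
q1 = sram.rising_edge(cen=0, wen=1, a=1, d=0x11)      # 1. read A1, D1 ignored
q_hold = sram.rising_edge(cen=1, wen=0, a=3, d=0x22)  # 2. disabled, Q unchanged
q2 = sram.rising_edge(cen=0, wen=0, a=2, d=0x33)      # 3. write D2 = 0x33 to A2
print(hex(q1), hex(q_hold), hex(q2))
```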
As mentioned in Chapter 2, SRAMs are typically instantiated as macro cells, which are usually generated by a so-called memory compiler. The basic structure of practically all memories in a design is therefore quite similar, while the data vector width (n) and the address width (m) are adapted to the respective application.
[Figure: waveforms of CLK, CEN, WEN, A (A1, A2), D (D1, D2), and Q, with Q1 = D(A1) after the read and Q2 = D2 after the write operation.]
Figure 3.13.: SRAM timing diagram for one read and one write operation.
In addition to SRAMs, digital systems sometimes also contain ROM instances. Even though the data in the memory array of a ROM cannot be changed, its basic structure and operating method are similar to those of an SRAM. As ROMs are read-only devices, they provide no data input and write enable signals. However, the timing characteristics shown for the read operation of an SRAM in Figure 3.13 also apply to ROMs.
4. Standard Cell Characterization
Modern digital integrated circuits are often quite complex and may consist of several million transistors. Simulating the behavior of such large designs at transistor level, if possible at all, demands excessive computational power and memory. This chapter introduces a novel method that significantly reduces the simulation effort by pre-characterizing the cells of a standard cell library. Based on an analysis of the static and dynamic behavior of CMOS devices, it discusses the extraction of important parameters that enable fast current profile calculations. The resulting library of pre-characterized cells is the basis of an efficient gate-level circuit analysis, such as the current profile determination discussed in Chapter 5.
4.1. Outline of the Methodology
The characterization procedure is based on simulations of single cells under different operating conditions, on the extraction of a set of characteristic parameters, and on storing the results in a specific library (Figure 4.1). The starting point is the transistor level description of each cell. Analog circuit simulations using BSIM (Berkeley Short-channel IGFET Model) transistor models are performed to determine the behavior of each single cell under different operating conditions. The simulation results are analyzed, and a set of parameters describing the timing characteristics as well as the current consumption is extracted. Storing these results in a specific library provides a description of the dynamic behavior for subsequent simulations. To enable gate-level simulations, additional parameters such as port definitions and the logic cell functions are provided in the form of static cell properties.
The characterization has to be done only once per technology library. As the standard cells can be treated as behavioral black boxes during the simulation of entire modules, this strategy provides an enormous speed-up compared to traditional transistor level simulations. A further advantage of this method is that most of the steps can be easily automated, which matters because standard cell libraries typically consist of several hundred different cells. The procedure has been applied to a 130 nm technology and has also been tested with a 90 nm library.
[Figure: flow blocks: transistor level cell descriptions, analog single cell circuit simulations, characteristic parameter extraction, static cell properties, and dynamic cell behavior comprising timing characteristics, current profile characteristics, and reference current profiles.]
Figure 4.1.: Flow of the standard cell characterization procedure.
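The library resulting from this flow could be represented by one record per cell that bundles the static properties (ports, logic function) with tables for the dynamic behavior. The field names and the choice of (input slew, output load) as table keys are assumptions for illustration, not the data format of this work; the tables are left empty because no characterization data is given here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical table key: (input slew in ps, output load in fF)
Condition = Tuple[float, float]

@dataclass
class CharacterizedCell:
    """Hypothetical library entry: static cell properties plus dynamic cell behavior."""
    name: str
    inputs: List[str]                      # static: port definitions
    outputs: List[str]
    function: Callable[..., int]           # static: logic function
    delays: Dict[Condition, float] = field(default_factory=dict)                     # timing characteristics
    current_params: Dict[Condition, Dict[str, float]] = field(default_factory=dict)  # current profile characteristics
    reference_profiles: Dict[str, List[float]] = field(default_factory=dict)         # sampled reference profiles

def nand(a: int, b: int) -> int:
    return 1 - (a & b)

nand2 = CharacterizedCell(name="NAND2", inputs=["A", "B"], outputs=["Z"], function=nand)
print(nand2.name, [nand2.function(a, b) for a in (0, 1) for b in (0, 1)])
```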
4.2. Dynamic Cell Behavior Considerations
To reconstruct timing characteristics and current profiles efficiently while minimizing both the characterization effort and the amount of required data, the behavior of standard cells has to be analyzed carefully.
At first glance, the current consumption depends primarily on the cell activity. It makes a significant difference whether an input event affects only internal nodes or whether an output state transition occurs as well. Figure 4.2 shows the transistor level schematic and the symbol of a NAND gate, together with the electrical quantities at its terminals. As the cells are characterized as single devices but interact with the surrounding cells in the final circuit, the behavior of each terminal is important. For the current consumption of an entire system, IDD and ISS must be equal. When cells are characterized as single devices, however, they can differ significantly.
[Figure: NAND2 symbol with inputs A, B and output Z, and its transistor level schematic between VDD and VSS; the terminal voltages VA, VB, VZ and the currents IA, IB, IZ, IDD, ISS are indicated.]
Figure 4.2.: Symbol and transistor level schematic of a NAND gate with 2 inputs.
The plots in Figure 4.3 show the current profiles of this NAND gate for all activities that may be triggered by an event at port A; the profiles for activities at port B are similar. The current consumption strongly depends on the actual port activity. Input events that affect an internal node while the output remains stable cause only minor current consumption, as plotted in Figure 4.3(a) and (c). In contrast, the current peaks caused by events with an output state change are significant, as shown in (b) and (d). The plots also show that IDD and ISS can differ considerably. While the current IZ at a low-high transition of the output (plot (d)) is primarily part of IDD, a high-low transition (plot (b)) directly affects ISS. Even though the current profiles of a single cell differ significantly, IDD and ISS of an entire system have to be equal. This is because IZ is also part of IDD and ISS of the cells connected to the output, and the input currents IA and IB appear in the same way at the driving cell.
[Figure: four panels showing IDD and ISS (current [A], scale ×10⁻⁴) over time [ps] from 0 to 400 ps: (a) port A: rising edge, port B: low; (b) port A: rising edge, port B: high; (c) port A: falling edge, port B: low; (d) port A: falling edge, port B: high.]
Figure 4.3.: Current profiles for different cell activities.
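The difference between single-cell and system-level supply currents can be illustrated with Kirchhoff's current law. The sketch assumes the sign convention that IDD and the input currents flow into a cell while ISS and IZ flow out. As a simplification, it also assumes that the load passes the current received at its input (gate) capacitance on to VSS. The sampled pulse values are illustrative.

```python
from typing import Dict, List

def kcl_residual(cell: Dict[str, List[float]]) -> List[float]:
    """Per-sample KCL check for one cell: I_DD + I_in - I_SS - I_out (should be 0)."""
    return [idd + iin - iss - iout for idd, iin, iss, iout
            in zip(cell["idd"], cell["i_in"], cell["iss"], cell["i_out"])]

pulse = [0.0, 0.2, 0.5, 0.2, 0.0]   # illustrative output current I_Z in mA
zero = [0.0] * len(pulse)

# Low-high output transition: the driver draws I_Z from VDD (I_SS = 0),
# the driven cell receives it at its input and passes it on to VSS (simplified).
driver = {"idd": pulse, "i_in": zero, "iss": zero, "i_out": pulse}
load = {"idd": zero, "i_in": pulse, "iss": pulse, "i_out": zero}

system_idd = [a + b for a, b in zip(driver["idd"], load["idd"])]
system_iss = [a + b for a, b in zip(driver["iss"], load["iss"])]
print(system_idd == system_iss, kcl_residual(driver), kcl_residual(load))
```

Each cell on its own has very different IDD and ISS profiles, yet the sums over both cells agree sample by sample.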
As the internal structure of a cell actually consists of a network of parasitic capacitances, further considerations are necessary. If the NAND gate shown in Figure 4.2 is triggered by a rising edge at port B while port A is high, the output potential is pulled towards VSS. As a consequence, the drain potential of both transistors connected to port A changes by the full output voltage swing, and the corresponding gate-drain capacitances have to be charged or discharged by the same voltage swing. The associated current IA is in this case not caused by an active event at port A and is therefore not yet considered part of the switching current of any actively driving cell.
A challenge in characterizing the dynamic behavior of a cell is that its performance strongly depends on the circuit environment. The dynamic behavior of a CMOS circuit results from the interaction of interconnected devices: signal slew rates are determined by the driver strengths and the driven loads (the input capacitances of the cells connected to the output). To characterize the switching behavior of standard cells, the important parameters are thus the input signal slope and the driven output load. Figure 4.4 shows the data signal voltages and the corresponding current profiles of a buffer (two consecutive inverters) that is triggered by a rising edge but drives different loads. The loads are represented by inverters with different transistor sizes. Higher loads lead to an increased current flow and decreased signal slew rates.
Figure 4.4.: Data signals and current profiles of a buffer for different loads. Higher loads cause a higher current flow and reduced output signal slew rates. (a) data signals VIN and VOUT [V] over time [ps]; (b) current profiles IDD [mA] over time [ps].
As different loads lead to different signal slopes, and the output signals of one cell are the input signals of the next one, different input signal slew rates have to be considered as well. This effect is most significant for single-stage cells, where transistor terminals are directly connected to both input and output. The plots in Figure 4.5 show these effects for an inverter that is triggered by falling edges with different signal slopes, while the output load is kept constant. Reduced input slew rates lead to reduced current profile slopes and, furthermore, to considerably lower peak values. Although the effect is minor, the output signals are also affected by the coupling through the internal parasitic capacitances.
Figure 4.5.: Data signals and current profiles for different input signal slopes. Reduced input signal slew rates lead to reduced peak current values. (a) data signals VIN and VOUT [V] over time [ps]; (b) current profiles IDD [mA] over time [ps].
4.3. Equivalent Inverter
Tools for placing and routing a design can connect almost any cell of a library to any other. This results in a virtually unmanageable number of combinations, so simulating each cell in all possible configurations is not feasible. A reduction to a moderate set of environmental conditions is necessary. This requires substituting the possible neighbor cells (driver, parallel cells, output load) by an equivalent circuit that covers all important conditions. One solution in this context is the introduction of parameterized equivalent inverters. Several publications deal with approaches to model the timing behavior of CMOS gates [31, 32, 33]. They demonstrate that the behavior can be modeled by an appropriately configured equivalent inverter. The gate structures, including the parasitic effects of the transistors, are analyzed and substituted by inverters with similar characteristics. These approaches promise to model the transient behavior of a cell with a deviation of a few percent, but do not account for the current consumption as a time-domain profile.
As discussed in Chapter 2, the gate capacitance of a transistor is actually nonlinear and depends on the operating region, but is approximately proportional to the channel dimensions. Assuming that the channel lengths of all transistors in one technology library are constant, the capacitances are directly proportional to the channel width. For modeling the capacitive load, it is consequently valid to replace two or more parallel transistors by one transistor with the corresponding total channel width. It is therefore possible to combine several parallel inverters into one equivalent inverter. Provided that all transistors have a similar gate oxide, the ratio of the widths of the PMOS (WP) and NMOS (WN) transistors is also insignificant. As a result, the channel widths of both equivalent transistors (WPe, WNe) can be simplified by setting them to their average value:
$$W_{Pe} = W_{Ne} = \sum \frac{W_P + W_N}{2} \tag{4.1}$$
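As an illustration, Eq. (4.1) can be written as a short Python function. This is a sketch under the assumptions stated above (constant channel length, similar gate oxide); the function name and the example widths in nanometres are hypothetical.

```python
def equivalent_inverter_width(inverters):
    """Combine parallel inverters into one equivalent inverter, Eq. (4.1).

    inverters: iterable of (w_p, w_n) channel widths in a common unit.
    Returns (w_pe, w_ne); both equivalent widths are set to the sum of the
    averaged PMOS/NMOS widths, since only the total gate area matters here.
    """
    w = sum((w_p + w_n) / 2 for w_p, w_n in inverters)
    return w, w


# Three hypothetical parallel inverters, widths in nm.
print(equivalent_inverter_width([(600, 400), (1200, 800), (300, 200)]))
# (1750.0, 1750.0)
```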
While the gate widths of parallel inverter structures can be safely added and averaged without affecting the transient behavior, the substitution of combinational gates such as NAND or NOR (Figure 4.6) requires a more detailed analysis. During a low-high or high-low transition of an inverter, the input and output voltages are pulled in opposite directions. This means that the transistor gate and drain potentials change, while the source terminals remain tied to VDD or VSS. Hence, the gate-source capacitances are charged/discharged by the input voltage swing (∆VGS = ∆Vin). As the output moves in the opposite direction of the input, the voltage change across the gate-drain capacitances is the sum of the input and output voltage swings (∆VGD = ∆Vin + ∆Vout), i.e. twice the swing for a full rail-to-rail transition. Where transistors of the same type are in series, as the PMOS transistors of a NOR and the NMOS transistors of a NAND gate, the source terminals of the outer transistors are tied to VDD or VSS, but the node between them (Vx) may be at an intermediate potential. On the other hand, if a transistor is switched on, the drain-source voltage (VDS) of all transistors in parallel to it is virtually zero. As a consequence, even if their gate voltage changes, their drain and source potentials remain almost unchanged. This results in a significantly different current consumption and also affects the output signal slope compared to the behavior of an inverter.

Figure 4.6.: Transistor level schematic view of NAND and NOR gates. (a) NAND; (b) NOR; inputs A and B, output Z, supplies VDD and VSS.
Assuming that the load of a given cell is a NAND gate, the plots in Figure 4.7 demonstrate that the output current depends on the activity of that load. The solid lines represent the cases where an output state transition of the NAND gate occurs. The behavior in this case is similar to the inverter discussed above, and the variation between the curves for triggered ports A and B is consequently minor. This is acceptable, as the total current is similar in both cases and the effects on the output signal slew rate stay within a tolerable range of a few picoseconds.
Figure 4.7.: Output current profiles for a NAND-type load. (a) rising edge: A↑, B high; A↑, B low; B↑, A high; B↑, A low. (b) falling edge: A↓, B high; A↓, B low; B↓, A high; B↓, A low. Each panel shows IZ [mA] over time [ps].
More detailed considerations are advisable if one of the inputs remains low (dashed lines). The output, and therefore all drain and source terminals of the parallel PMOS transistors, remain virtually at VDD. The drain-source voltage swing consequently is ∆VDSp = 0, which makes the transistors behave similarly to a MOS capacitor. The voltage swing of the node between the NMOS transistors is limited to ∆VDSn = VDD/2. Extrapolating these observations to a NAND structure with N ports shows that ∆VDSp = 0 holds in all cases where one port remains low. The voltage swing of inner nodes is then generally ∆VDSn = VDD/NL, where NL is the number of ports in the low state. A NOR gate is complementary in this respect. For an N-port NOR, the relations are simply reversed: ∆VDSp = VDD/NH, where NH is the number of high ports, and ∆VDSn = 0 if at least one port is high. If all inactive ports of a NOR gate are low, the behavior is analogous to a NAND gate whose inactive ports are all high, i.e. like an inverter with the corresponding transistor sizes.
Taking all these considerations into account, an equivalent inverter in combination with two MOS capacitors is sufficient to cover the behavior of any combinational CMOS gate. The capacitors are implemented like an inverter whose drain terminals are not connected together but tied to the respective source potential (Figure 4.8). These two equivalent devices make it possible to model the different circuit environments during the characterization procedure.
Figure 4.8.: Schematic view of equivalent devices. (a) inverter (input Vin, output Vout); (b) capacitors (input A); both connected to VDD and VSS.
4.4. Single Cell Simulations
As introduced at the beginning of this chapter, single cell simulations are performed to determine the dynamic behavior of all cells in a library. Each cell is simulated under different environmental conditions using a SPICE-based analog circuit simulator. This includes the relevant activities of each cell and the effects of applicable circuit environments. Characterizing a complete library can require considerable effort, as a library may consist of several hundred different cells. It is therefore advantageous to use a method that allows a high degree of automation and treats cells as black boxes, independent of their function.
4.4.1. Simulation Environment
Due to parasitic coupling effects, the behavior of a cell theoretically depends on the configuration of the entire circuit. Signal slopes result from the interaction of driver strengths and output loads. As mentioned in Section 4.3, load cells can be substituted by appropriately configured equivalent devices. For the input signal slope, the focus is on covering a reasonable range of slew rates. Figure 4.9, for instance, shows the final characterization simulation circuit for a combinational gate with two inputs and one output. The cell under characterization is embedded in a circuit of parameterizable equivalent devices, which allows the emulation of different environmental conditions.
Figure 4.9.: Characterization simulation circuit for a two-input NAND gate (NAND2 with stimulus sources VstmA and VstmB, equivalent inverters driverA, driverB, parA, and parB, output loads load1 and load2 or the equivalent capacitor load; measured currents IDD, ISS, IINA, IINB, and IOUT).
The circuit is stimulated by one voltage source per input (VstmA, VstmB). These sources trigger all relevant activities during the simulation. Because pulsed sources inevitably change their values abruptly, several buffer cells in series are used to form realistic signal slopes; otherwise, the sharp-edged voltage changes would cause unrealistic current peaks at the capacitive loads. The equivalent inverters driver and par set up the intended input signal characteristics. While larger driving transistors increase the signal slew rates, an appropriate parallel load in turn reduces them. Verifications of the approach of modeling the output load with equivalent devices have shown that the most efficient way is to perform two parallel simulations: one with equivalent capacitances and the other with two successive inverters (load1, load2). The latter accounts for the capacitive coupling of the gate to the drain (CGD) and source (CGS) terminals of load1, since the gate capacitance of load2 in turn affects the output behavior of the characterized cell as well.
4.4.2. Parameter Variation
As it is infeasible to simulate all possible circuit configurations, a representative set of parameters needs to be chosen. The simulations must cover a wide range of conditions and enable an accurate interpolation of the results. As mentioned before, the total current consumption and the timing characteristics are almost proportional to the transistor sizes over the period of a transition. This leads to the initial approach of considering each equivalent device (shown in Figure 4.9) in two different configurations. All other conditions can then be determined to a good approximation by linear interpolation of these simulation results. The number of simulations is then already manageable, and the accuracy is acceptable in most cases. Since the current peak values are not directly proportional to the load size, however, a third run per equivalent device is necessary. This enables an appropriate interpolation of the peak values and leads to more accurate results. On the other hand, the number of simulations
$$N_{sim} = (N_{driver} \cdot N_{parallel})^{N_{ports}} \cdot N_{load1} \cdot N_{load2} \tag{4.2}$$
is still very high, as it increases by a factor of Ndriver · Nparallel = 9 for each additional port. Therefore, it is reasonable to analyze the effect of the individual parameters in more detail and to treat them appropriately:
Output loads have the most diverse effects on the cell behavior. The directly connected equivalent device (inverter load1 or capacitor load) has to be modeled as described in the previous sections and varied over three configurations. The second inverter (load2) must also be considered, but because its effect is marginal, a linear interpolation between two scenarios is sufficient.
Driver strengths are used to adjust signal slew rates. Large transistors driving the inputs of a characterized cell lead to simulations with fast rising and falling edges. One approach to reduce the number of varied parameters is to set the drivers of all inputs to the same value. This is admissible because the simulations of all active events are already performed with all relevant circuit configurations. A current flow caused by charging/discharging internal nodes while the corresponding port state remains stable is virtually independent of the driver strength. These considerations are similar to those for the leakage current discussed in Chapter 2.
Parallel loads are used in combination with the fan-in of the characterized cell to
reduce the input signal slew rate. High parallel loads lead to slow rising/falling
edges, even if the driving cell strength is very high.
Since the effects of the driver strength and the parallel load are complementary, varying one of them is sufficient under normal circumstances. Applying a driver that is relatively strong compared to the fan-in of the characterized cell, while varying the parallel load, covers the conditions in real circuits well. The parallel cell is therefore varied over three scenarios: not present, similar to the default load of the given driver, and twice the default load. Taking all these considerations into account, the number of characterization simulations per cell can be reduced to:
$$N_{sim} = N_{driver} \cdot N_{parallel} \cdot N_{load1} \cdot N_{load2} \tag{4.3}$$
If only one driver strength is applied, this finally yields Nsim = 1 · 3 · 3 · 2 = 18 simulations for any kind of cell in the library.
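The effect of fixing the driver strength can be checked numerically with the following Python sketch of Eqs. (4.2) and (4.3). The configuration counts (three settings each for driver, parallel load, and load1, two for load2) follow the text; the function names are illustrative only.

```python
def n_sim_full(n_driver, n_parallel, n_ports, n_load1, n_load2):
    """Eq. (4.2): driver and parallel load varied independently at every port."""
    return (n_driver * n_parallel) ** n_ports * n_load1 * n_load2


def n_sim_reduced(n_driver, n_parallel, n_load1, n_load2):
    """Eq. (4.3): one common driver/parallel-load setting for all ports."""
    return n_driver * n_parallel * n_load1 * n_load2


# Full variation grows by a factor of 9 per additional port.
for ports in (1, 2, 3):
    print(ports, n_sim_full(3, 3, ports, 3, 2))  # prints 1 54, 2 486, 3 4374
# One driver strength, three parallel loads, three load1 and two load2 settings.
print(n_sim_reduced(1, 3, 3, 2))                 # 18
```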
4.4.3. Simulation Stimuli
For a comprehensive characterization of a cell, all relevant activities have to be triggered in the course of the simulations. While the behavior of combinational logic gates is sufficiently determined by executing all possible input signal transitions, the behavior of clocked storage elements, such as flip-flops or latches, additionally depends on the output state. This becomes evident when comparing a clock-triggered flip-flop whose input data value leads to an output transition with the case where the corresponding value is already stored. The activity, and therefore also the current consumption, differs significantly between these two situations. For optimization reasons, the different cell types are treated separately.
Combinational Gates
The number of possible activities of combinational gates Ncomb depends solely on the number of inputs. A gate with m inputs has $2^m$ different states. Considering that each state can be the result of a transition at any of the inputs, the number of possible activities is given by

$$N_{comb} = m \cdot 2^m \tag{4.4}$$
This is manageable for gates with low pin counts. Most cells in a library have two to four ports, but complex gates in particular have six or more ports. Without further considerations, the simulation effort and the resulting amount of data would be exceedingly high. On the other hand, symmetric structures, in which several ports behave similarly, allow a reduction to an acceptable number of analyzed transitions.
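A stimulus generator for the characterization runs has to enumerate these activities. The following Python sketch, written for this discussion and not taken from the original tool flow, lists all single-port transitions of an m-input gate according to Eq. (4.4); it does not apply the symmetry-based reduction mentioned above.

```python
from itertools import product


def combinational_activities(m):
    """Enumerate the activities of an m-input combinational gate, Eq. (4.4).

    Each activity is (port, previous_inputs, new_inputs): every one of the
    2**m input states can be reached by toggling any one of the m inputs.
    """
    activities = []
    for new in product((0, 1), repeat=m):
        for port in range(m):
            prev = list(new)
            prev[port] ^= 1  # input vector before the toggling port changed
            activities.append((port, tuple(prev), new))
    return activities


acts = combinational_activities(2)
print(len(acts))  # 8 = 2 * 2**2
```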
Latches
This cell type combines the features of combinational gates and storage cells. In transparent mode, it passes the input signal directly to the output; otherwise, it stores the output state. This means that the output activity additionally depends on the previous state. The number of possible different transitions Nlatch for m ports is given by

$$N_{latch} = m \cdot 2^{m+1} \tag{4.5}$$
Latches typically have two inputs (data in and enable), but versions with an additional reset port exist. Since the circuit simulations are limited to normal circuit operation, omitting the system reset condition, this port is permanently held at its inactive level. As a result, the number of simulated transitions is the same for all kinds of latches (Nlatch = 16 for m = 2).
Flip-Flops
The output state of a flip-flop changes only at a specified clock edge. Conversely, this means that a data value transition only affects internal nodes. However, due to the internal feedback of the output state, the dynamic behavior during a data value transition differs considerably depending on the stored state. Consequently, the number of transitions Nff for flip-flops with m inputs equals that of latches:

$$N_{ff} = m \cdot 2^{m+1} \tag{4.6}$$
Standard flip-flops typically have the ports data in, clock, and enable. Special versions additionally support a scan-test mode and have two additional ports (scan enable and scan input). Other commonly provided ports are the inputs for the set and reset features. As a result, a full-featured flip-flop may have up to seven inputs, which would result in $N_{ff} = 7 \cdot 2^8 = 1792$ transitions. By concentrating on normal circuit operation, all special ports can be held in an inactive state. As a result, the number of triggered events can be reduced significantly, to Nff = 16 for basic versions without enable ports.
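For storage elements, the stored output state enters the stimulus list as an additional dimension. The following Python sketch mirrors the counting behind Eqs. (4.5) and (4.6); it is an illustration only, and it does not check which combinations of inputs and stored state are actually reachable in a real cell.

```python
from itertools import product


def storage_activities(m):
    """Enumerate stimulus cases for a latch or flip-flop with m active inputs.

    Each case is (port, previous_inputs, new_inputs, stored_output). The stored
    output state doubles the combinational count: m * 2**(m + 1).
    """
    cases = []
    for new, stored in product(product((0, 1), repeat=m), (0, 1)):
        for port in range(m):
            prev = list(new)
            prev[port] ^= 1  # input vector before the toggling port changed
            cases.append((port, tuple(prev), new, stored))
    return cases


print(len(storage_activities(2)))  # 16: basic latch or flip-flop
print(len(storage_activities(7)))  # 1792 = 7 * 2**8: full-featured flip-flop
```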
4.5. Characteristic Parameter Extraction
The single cell simulations provide information on the timing characteristics and the current consumption of a cell. To determine the behavior of a cell in other circuit configurations, the simulated conditions have to be interpolated. Characteristic parameters are therefore extracted from these data, which can then be interpolated over a wide range of environmental conditions. In addition, this procedure significantly reduces the amount of data finally stored in the library of characterized cells.
4.5.1. Timing Characteristics
One basic prerequisite for determining the behavior of a circuit is the identification of the signal timing. Essential parameters in this context are the input and output transition characteristics, as well as the propagation delay. PMOS and NMOS transistors of standard cells are typically dimensioned to have almost identical timing properties for rising and falling edges. However, cells intended to distribute the system clock signal in particular are often optimized for a fast distribution of the rising clock edge. Hence, it is important to distinguish between the timing of rising (tr, tLH) and falling (tf, tHL) edges, as well as between the propagation delays tpLH and tpHL.
Due to the various dependencies of the cell behavior discussed in Section 4.2, all timing parameters have to be determined for all simulated conditions. The plots in Figure 4.10, for instance, compare several possible input and output signal characteristics of an inverter. For this comparison, all signals are aligned at the 50% point of the input voltage Vin. As the propagation delay is commonly defined as the time between the 50% points of the input and output voltage swings, it can be read directly along the line at ∆V/2.
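The extraction of the propagation delay from simulated waveforms can be sketched in Python as follows. The sketch assumes an inverting cell (as the inverter in Figure 4.10), time samples in ascending order, and linear interpolation between samples; the waveform values are hypothetical, and error handling for waveforms without a crossing is omitted.

```python
def crossing_time(t, v, level, rising):
    """First time the sampled waveform v(t) crosses `level` in the given
    direction, linearly interpolated between samples; None if it never does."""
    for k in range(1, len(t)):
        v0, v1 = v[k - 1], v[k]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            return t[k - 1] + (level - v0) * (t[k] - t[k - 1]) / (v1 - v0)
    return None


def propagation_delay(t, v_in, v_out, v_dd, input_rising):
    """Delay between the 50 % points of input and output of an inverting cell."""
    half = v_dd / 2
    t_in = crossing_time(t, v_in, half, rising=input_rising)
    t_out = crossing_time(t, v_out, half, rising=not input_rising)
    return t_out - t_in


t = [0, 10, 20, 30, 40, 50]              # ps
v_in = [0.0, 0.0, 0.6, 1.2, 1.2, 1.2]    # V, rising input
v_out = [1.2, 1.2, 1.2, 0.9, 0.3, 0.0]   # V, falling output
print(propagation_delay(t, v_in, v_out, 1.2, input_rising=True))  # approx. 15 ps
```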
Storing these parameters in the library of characterized cells makes it possible to determine the actual timing behavior of a cell for a given circuit configuration by a simple interpolation of these values.
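A minimal sketch of such a look-up in Python, restricted to a single parameter (the output load) for brevity. The table values, the load unit, and the linear extrapolation outside the simulated range are assumptions for illustration; interpolating over input slope and output load simultaneously would apply the same step along each axis.

```python
from bisect import bisect_right


def interpolate(table, x):
    """Piecewise-linear interpolation in a table of (condition, value) pairs
    sorted by condition; outside the table, the nearest segment is extended."""
    xs = [c for c, _ in table]
    k = min(max(bisect_right(xs, x), 1), len(table) - 1)
    (x0, y0), (x1, y1) = table[k - 1], table[k]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical stored tpHL values [ps] for three simulated output loads,
# given in multiples of a reference inverter input.
tp_hl = [(1, 10.0), (2, 14.0), (4, 20.0)]
print(interpolate(tp_hl, 3))    # 17.0
print(interpolate(tp_hl, 1.5))  # 12.0
```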
In addition to the timing characteristics, the plots in Figure 4.10 also give a good overview of the coupling effects on a signal transition. Each output signal shows a slight overshoot prior to its transition: since capacitances cannot be charged or discharged infinitely fast, the output initially follows the input voltage through the coupling capacitances. The overshoot amplitude increases with the signal slew rate. The last part of a transition, on the other hand, depends on the output load characteristics, as the subsequently triggered activity of a cell connected to the output affects the final transition progress. These effects are less important for the timing, but considerable for the current profiles discussed in the next section.
Figure 4.10.: Signal characteristics of an inverter for different conditions in terms of different input slopes and output loads. (a) rising edge; (b) falling edge. Each panel shows the data signals Vin and Vout [V] over time [ps], with the ∆V/2 level marked.
4.5.2. Current Profiles
The current consumption of CMOS devices consists of a static and a dynamic component. The single cell simulations are performed to analyze the dynamic behavior, but implicitly also include the static leakage current. A closer look at the current profile of any transition reveals different current values in the steady states before and after an event. Storing these values in the library subsequently enables the determination of the static current consumption of a given design.
The extraction of the dynamic part in the form of current profiles is less trivial and has to be analyzed in more detail. As discussed in the previous sections, the current consumption depends significantly on the activity, but shows similar characteristics for the respective events. The plots in Figure 4.11 show, as an example, several current profiles of a flip-flop that is triggered by rising edges with different slew rates at the clock input and connected to different output loads. The high peak values, in conjunction with the varying curves at the end of the profiles, indicate that the output also performs a rising edge. Flip-flops are much more complex than combinational gates and therefore consist of several internal stages. This leads to current profiles with more diverse shapes than the single peaks of the basic gates discussed in the first part of this chapter. On the other hand, this allows a better distinction between the effects caused by different input slopes and output loads.
Analogous to the signal overshoots mentioned in the discussion of the timing characteristics, the current profiles also show this effect in their very first part. This simplifies the detection of the starting point, as a profile consequently features a zero point prior to the actual transition. In such a complex cell, the slope of the subsequent part is directly related to the input signal characteristics. Similar to the overshoot amplitude, the following peak value depends on the slew rate. A particular property of complex cells is the central part, which is almost independent of the environmental conditions. This demonstrates that the capacitive coupling effects are limited to single stages. The final part of the profile predominantly depends on the size and structure of the output load; the spread of the falling part at the end is caused by different sizes of the second output load stage.

Figure 4.11.: Current profiles with marked characteristic values for a rising edge at the clock input of a flip-flop for different input slopes and output loads (IDD [mA] over time [ps]).
The most significant characteristic values of current profiles are the peak values and zero points. Because the last part of the profiles varies considerably, the point at which the current has decayed to half of the last peak value is additionally recorded. Since the final part of the profile resembles an exponential decay, the profile end is detected as the intersection with an appropriate threshold value.
Depending on the profile characteristics, all zero points and significant peak values are extracted and stored in the library. Interpolating these values defines the actual profile characteristics for a given environmental condition. Owing to the similarity of the profile shapes, a reference profile can then be aligned to these points. Since this alignment is not entirely accurate, the average current flow between the zero points is calculated and additionally stored in the library. Scaling the aligned profile to the correct amount of current efficiently compensates for this imprecision.
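The following Python sketch shows the kind of processing involved for a simple single-peak profile. It is a simplified illustration, not the extraction procedure used for the library: multi-stage profiles such as those in Figure 4.11 would require detecting several peaks, and the threshold value, the sampling, the example data, and all helper names are assumptions.

```python
def _cross(t, y, level, k):
    """Time at which y reaches `level` between samples k-1 and k (linear)."""
    return t[k - 1] + (level - y[k - 1]) * (t[k] - t[k - 1]) / (y[k] - y[k - 1])


def characteristic_points(t, i, threshold):
    """Start, peak, half-peak and end points of a single-peak current profile.

    start: last rising zero crossing before the peak (after the overshoot)
    peak:  (time, value) of the maximum
    half:  time at which the current has decayed to half the peak value
    end:   time at which the decaying current falls below `threshold`
    """
    kp = max(range(len(i)), key=lambda k: i[k])
    peak = i[kp]
    start = [_cross(t, i, 0.0, k) for k in range(1, kp + 1)
             if i[k - 1] <= 0.0 < i[k]][-1]
    half = next(_cross(t, i, peak / 2, k) for k in range(kp + 1, len(i))
                if i[k] <= peak / 2)
    end = next(_cross(t, i, threshold, k) for k in range(kp + 1, len(i))
               if i[k] <= threshold)
    return {"start": start, "peak": (t[kp], peak), "half": half, "end": end}


def _value_at(t, y, x):
    """Linearly interpolated value of y at time x (t[0] <= x assumed)."""
    for k in range(1, len(t)):
        if t[k] >= x:
            return y[k - 1] + (y[k] - y[k - 1]) * (x - t[k - 1]) / (t[k] - t[k - 1])
    return y[-1]


def average_current(t, i, t0, t1):
    """Mean current between t0 and t1 (trapezoidal rule)."""
    pts = [(t0, _value_at(t, i, t0))]
    pts += [(tk, ik) for tk, ik in zip(t, i) if t0 < tk < t1]
    pts.append((t1, _value_at(t, i, t1)))
    charge = sum((xb - xa) * (ya + yb) / 2 for (xa, ya), (xb, yb) in zip(pts, pts[1:]))
    return charge / (t1 - t0)


def scale_to_average(i_ref, avg_ref, avg_target):
    """Scale an aligned reference profile so that its mean equals the stored value."""
    factor = avg_target / avg_ref
    return [factor * v for v in i_ref]


t = [0, 10, 20, 30, 40, 50, 60, 70]               # ps
i = [0.0, -0.02, 0.1, 0.3, 0.2, 0.1, 0.04, 0.01]  # mA, hypothetical
p = characteristic_points(t, i, threshold=0.02)
avg = average_current(t, i, p["start"], p["end"])
print(p["peak"], p["half"])  # (30, 0.3) and approx. 45.0 ps
```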
4.6. Macro Cell Characterization
As discussed in the previous chapters, complex designs often contain macro cells. These components are usually memory instances, but may also be function blocks in the form of hard macros, or other types of reused sub-circuits.
Macro cells may have numerous ports. A characterization approach similar to the procedure for standard cells is consequently unreasonable, as the number of simulations and the amount of resulting data would become unmanageable. For general sub-circuits, such as custom function blocks, a characterization can likely be done by applying a modeling method such as the approaches introduced in Chapter 5. Including their timing characteristics and current profiles for reasonable conditions in the library allows such macro cells to be treated in the same way as standard cells.
Memory instances, on the other hand, have a very regular structure. This allows a characterization by simulating a certain number of significant transitions, similar to the procedure introduced for standard cells. The upper plot of Figure 4.12 shows the current profile of a memory write operation in which a varying percentage of the data outputs changes state. The contribution of the internal activity is significant, and the profile is only marginally affected by the different output activities. The lower plot shows that, owing to the regular structure, the profile differences are proportional to the number of output transitions. As a result, it is feasible to take the profile of the invariant part and add an appropriately scaled output current waveform to determine the current consumption profile for any given number of output transitions. Further analyses have shown that this is also valid for read operations.
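Under the proportionality observed in Figure 4.12, the profile for an arbitrary number of toggling outputs can be composed as in the following Python sketch. It assumes that both reference profiles are sampled on the same time grid; the sample values and the function name are hypothetical.

```python
def memory_profile(i_base, i_delta_full, n_toggling, n_outputs):
    """Current profile of a memory access with n_toggling of n_outputs data
    outputs changing state: invariant part plus linearly scaled output part.

    i_base:       profile for 0 % output activity (sampled on a common grid)
    i_delta_full: additional current at 100 % output activity
    """
    share = n_toggling / n_outputs
    return [b + share * d for b, d in zip(i_base, i_delta_full)]


# Hypothetical sampled profiles in mA.
base = [0.0, 50.0, 100.0, 40.0]
delta = [0.0, 10.0, 30.0, 5.0]
print(memory_profile(base, delta, 16, 32))  # [0.0, 55.0, 115.0, 42.5]
```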
Since all data inputs and outputs of a memory are typically buffered, only clock events can cause output transitions. Therefore, changes of the data input vector, as well as the response to an applied address, can generally be considered independent of the output load. As a result, the characterization of these ports is sufficiently covered by simulations with different input signal characteristics, analogous to the procedure discussed for flip-flop inputs. Moreover, the current profiles for data input events are again scalable, as the input buffers are typically a register of identical flip-flop instances.
The features provided by the memory control unit are also important. It typically provides a chip enable port, which can be used to disable most of the internal blocks and avoid unnecessary activity. As this is a form of clock gating, all input and output buffers, as well as the address decoder, may be forced to ignore all incoming events. In this case, the current consumption caused by stimulating the circuit parts directly connected to the inputs is reduced to a minimum. Even though the current consumption of a disabled memory is minor, and therefore negligible in most cases, this behavior can be determined by a few simulation runs with appropriate input signal characteristics.
As a result, macro cells can also be added to the library of characterized cells, provided that an applicable method exists to determine their timing characteristics and current profiles. For regularly structured cells such as memory instances, a procedure similar to that for standard cells has been shown to be applicable.
Figure 4.12.: Current profile of a memory performing a write operation for a varying percentage of changing output data values. (a) total current [mA] over time [ps] for 0%, 25%, 50%, 75%, and 100% changing outputs; (b) additional current compared to 0% activity.
5. Netlist Based Current Modeling
Estimating the current consumption of digital modules is most beneficial in very early design phases. Since placement and routing of the devices on a chip are among the last tasks in the design flow, first analyses must be based on gate-level netlists. This form of circuit description is basically a list of the instantiated cells, including a specification of the interconnected ports.
Since complex systems in particular are typically analyzed in various configurations and operating modes, the simulation effort required to determine the most critical features is an essential aspect. This chapter discusses approaches to efficiently model the current consumption profiles of digital systems. Since a traditional evaluation of stimulus vectors requires excessive computational effort for complex systems, an alternative method is introduced. Circuit parts with significantly different characteristics are analyzed separately, using an appropriate combination of a pattern-based approach and a random activity estimation. The current profile calculation is based on the library of pre-characterized cells introduced in Chapter 4.
Gate-level netlists provide no information on either the power distribution network or the placement and routing of the on-chip devices. The resulting current profiles therefore represent the ideal case of a constant power supply voltage and lossless cell interconnects. Approaches to estimate the parasitic effects of real power distribution networks and interconnect wires are discussed in Chapter 6.
5.1. Current Profile Calculation
Traditional SPICE-based circuit simulation tools rely on low-level nodal approaches. These methods deliver highly accurate results but have significant disadvantages when analyzing large circuits: if possible at all, the simulation of large modules demands excessive computing power and memory.
The method discussed in the following sections is based on the library of pre-characterized cells and calculates the current profiles sequentially for single switching events. To this end, the environmental conditions of each cell are identified and applied during the current profile determination. The profiles for the activity of entire circuits are finally composed by superposing the single-event results with the appropriate time offsets.
5. Netlist Based Current Modeling
5.1.1. Cell Environment Identification
As discussed in Chapter 4, the transient behavior of a cell primarily depends on its
input signal characteristics and the output load. Since these parameters are given
by the configuration and interconnection of the instantiated cells, the properties
of the surrounding circuit have to be identified. Because the dynamic cell behavior
is determined from the library, only the relevant parameters have to be extracted
from the circuit configuration. For the determination of the circuit environment,
the static properties in terms of the driving transistor strengths and the equivalent
inverter characteristics of the respective cell inputs are provided by the library of
pre-characterized cells.
Figure 5.1 shows an example of the relevant circuit part for a given cell. With respect
to the data sets provided by the library, the important parameters are
the strength of the driving cell, the fan-in of possible parallel cells, and the characteristics of the cells connected to the output. Assuming that port A
of the shown cell is triggered, the driver strength in the form of an equivalent inverter
$\mathrm{eINV}_{\mathrm{drv}}$ equals the output properties of drvA:

$$\mathrm{eINV}_{\mathrm{drv}} = \mathrm{eINV}_{\mathrm{out}}(\mathrm{drvA}) \qquad (5.1)$$
Given that the input characteristics of the parallel cells parA1 and parA2 are available in the form of equivalent inverter specifications for the respective port ($\mathrm{eINV}_{\mathrm{in}}$),
they can be combined into an equivalent parallel inverter $\mathrm{eINV}_{\mathrm{par}}$:

$$\mathrm{eINV}_{\mathrm{par}} = \sum_k \mathrm{eINV}_{\mathrm{in}}(k) \qquad (5.2)$$
With $\mathrm{eINV}_{\mathrm{drv}}$ and $\mathrm{eINV}_{\mathrm{par}}$, the input signal slew rate can be estimated. As the
actual slope additionally depends on the preceding cells, some parameter tuning
is necessary: depending on the parameter sets provided by the library,
$\mathrm{eINV}_{\mathrm{drv}}$ and $\mathrm{eINV}_{\mathrm{par}}$ are adapted so that the input signal characteristics
match the actually required rise/fall time. The target value in this case is the
output signal timing of the driving cell, which is calculated during the
determination of the dynamic switching behavior discussed in the next section.
Finally, the output load is determined in a similar way as the parallel load, by
combining the connected cell input characteristics $\mathrm{eINV}_{\mathrm{in}}$ into an equivalent load
inverter $\mathrm{eINV}_{\mathrm{load}}$:

$$\mathrm{eINV}_{\mathrm{load}} = \sum_k \mathrm{eINV}_{\mathrm{in}}(k). \qquad (5.3)$$
As the second output load stage is also relevant, it is determined analogously to
the primary load by combining the input characteristics of the respective cells into
an additional inverter.
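The summations in Eq. (5.2) and Eq. (5.3) can be sketched as follows. The sketch assumes, hypothetically, that an equivalent inverter is described by its NMOS and PMOS channel widths, so that combining parallel inputs amounts to adding widths; the class `EquivInverter`, the function `combine`, and all numbers are illustrative and not part of the library described here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class EquivInverter:
    """Hypothetical equivalent-inverter record: NMOS and PMOS widths in um."""
    wn: float
    wp: float

    def __add__(self, other: "EquivInverter") -> "EquivInverter":
        # Inputs in parallel load the same net, so their gate widths add up.
        return EquivInverter(self.wn + other.wn, self.wp + other.wp)


def combine(inputs: Iterable[EquivInverter]) -> EquivInverter:
    """Sum of equivalent inverters, as in Eq. (5.2) and Eq. (5.3)."""
    total = EquivInverter(0.0, 0.0)
    for inv in inputs:
        total = total + inv
    return total


# Inputs of parA1 and parA2 (made-up numbers) -> eINV_par
e_par = combine([EquivInverter(0.4, 0.8), EquivInverter(0.6, 1.2)])
# Inputs of three cells driven by the coi (made-up numbers) -> eINV_load
e_load = combine([EquivInverter(0.4, 0.8)] * 3)
print(e_par, e_load)
```

The same `combine` call serves both the parallel and the load inverter, which mirrors the identical form of the two equations.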
[Figure: schematic of the cell of interest (coi) with its driving cells drvA and drvB, the parallel cells parA1, parA2, and parB1, and the cells connected to its output]
Figure 5.1.: Relevant circuit part for the environment parameter extraction for a
given cell of interest (coi).
5.1.2. Single Event Characteristics Determination
The library of pre-characterized cells provides a set of parameters, which describe
the timing characteristics and current profiles for a given number of circuit configurations. Since the actual configurations differ from the characterized conditions,
the stored values have to be interpolated. As the dynamic behavior of a cell depends on multiple parameters, an appropriate multi-dimensional interpolation is
necessary. Applicable algorithms are discussed in [34].
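To illustrate such an interpolation, the sketch below performs bilinear interpolation of one characterized quantity (for example a peak current) over a two-dimensional grid of input transition time and output load. This is only one simple choice among the possible algorithms, not necessarily the one from [34]; the grid axes, units, values, and function names are assumptions.

```python
from bisect import bisect_right
from typing import Sequence


def _bracket(axis: Sequence[float], x: float) -> tuple[int, float]:
    """Index i and weight w such that x lies between axis[i] and axis[i + 1].
    Points outside the grid are extrapolated linearly from the outermost cell."""
    i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
    w = (x - axis[i]) / (axis[i + 1] - axis[i])
    return i, w


def bilinear(slews: Sequence[float], loads: Sequence[float],
             table: Sequence[Sequence[float]], slew: float, load: float) -> float:
    """Interpolate table[i][j], characterized at slews[i] and loads[j]."""
    i, u = _bracket(slews, slew)
    j, v = _bracket(loads, load)
    return ((1 - u) * (1 - v) * table[i][j] + u * (1 - v) * table[i + 1][j]
            + (1 - u) * v * table[i][j + 1] + u * v * table[i + 1][j + 1])


# Hypothetical grid: input transition time [ps] x output load [fF] -> peak current [uA]
slews = [20.0, 80.0]
loads = [1.0, 4.0]
peak = [[100.0, 160.0],
        [80.0, 130.0]]
print(bilinear(slews, loads, peak, 50.0, 2.5))
```

With more characterized dimensions, the same scheme extends to multilinear interpolation by bracketing each axis separately.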
The operations performed in determining the current profiles for
a given cell activity are demonstrated in Figure 5.2. As an example, the plots show
the behavior of an XOR gate with three inputs for an event that causes a low-high
transition at the output. The dots in the figure represent the characteristic
parameters that the library provides for different configurations. As the
requested condition deviates from the characterized cases in this example, an appropriate interpolation of the available values is necessary. Since the profile shapes
for a given event are similar, a reference profile (dashed line) is aligned to
the calculated vertices of the resulting profile (solid line). A three-port XOR gate is
relatively complex and therefore contains internal nodes between the inputs and
the output. As a result, this example makes it possible to distinguish between the effects of
the input signal characteristics and those of the output load. The first peak
of the resulting profile is lower, and the second one is higher, than in the reference profile. This indicates that the requested condition involves a relatively slow input
signal transition in conjunction with a high output load. Therefore, the amplitude
and the time instance of the first current peak have been determined by an interpolation between the reference condition and the values for very slow input signals. The
second peak, on the other hand, has been determined from the reference condition and the configuration for high loads. The last characteristic value marks the
time instance where the amplitude has dropped to half of the last peak value. In addition
to the mentioned coordinates, the zero points of the profile are determined. All
profiles show at least one zero point at the beginning of a transition, but some
cells and events show a more significant negative amplitude than the profile shown here
(see Figure 5.3, discussed later). The first relevant time instance of this
profile is the start of the input transition, but for some cells and events
the amplitude becomes negative for a short period before this point. After
all these points have been determined, a reference profile is aligned piece-wise (scaled in
time and amplitude) to cover the intended coordinates for the respective condition.

[Plot: current [µA] versus time [ps]; characteristic points, reference profile, and result]
Figure 5.2.: Current profile calculation for a single event. The characteristic parameters in the library are interpolated, and a reference profile is aligned
to the appropriate coordinates (time and amplitude).
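The piece-wise alignment of the reference profile can be sketched as a piecewise-linear time warp combined with a piecewise-linear amplitude scaling between the characteristic points. This is an assumed realization for illustration only. The sampled-profile representation, the per-point gain factors, and the names `interp` and `align_profile` are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def interp(x: float, xp: Sequence[float], fp: Sequence[float]) -> float:
    """Piecewise-linear interpolation on ascending xp, clamped at both ends."""
    if x <= xp[0]:
        return fp[0]
    if x >= xp[-1]:
        return fp[-1]
    k = bisect_right(xp, x) - 1
    w = (x - xp[k]) / (xp[k + 1] - xp[k])
    return fp[k] + w * (fp[k + 1] - fp[k])


def align_profile(ref_t, ref_i, ref_keys, new_keys, gains, t_out):
    """Map a sampled reference profile onto new characteristic points.

    ref_t, ref_i: sampled reference profile (time, current)
    ref_keys:     times of the characteristic points in the reference profile
    new_keys:     interpolated times of the same points for the requested condition
    gains:        amplitude scale factor at each characteristic point
    t_out:        time grid of the resulting profile
    """
    out = []
    for t in t_out:
        t_ref = interp(t, new_keys, ref_keys)  # time warp, segment by segment
        g = interp(t, new_keys, gains)         # amplitude scaling between points
        out.append(g * interp(t_ref, ref_t, ref_i))
    return out


# Triangular reference pulse, stretched by 20% in time and with a 20% higher peak
ref_t, ref_i = [0.0, 50.0, 100.0], [0.0, 100.0, 0.0]
result = align_profile(ref_t, ref_i, [0.0, 50.0, 100.0], [0.0, 60.0, 120.0],
                       [1.0, 1.2, 1.0], [0.0, 30.0, 60.0, 90.0, 120.0])
print(result)
```

Because each characteristic point is mapped exactly onto its new coordinate, zero points stay at zero regardless of the gain assigned to them.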
In addition to the current profiles, the timing parameters in terms of the propagation delay and the output signal slew rate are determined. This is done with a similar
procedure, in which the available timing parameters in the library are interpolated.
These parameters are important for the current profile composition of entire modules, as well as for the determination of further characteristics, since the output
signal of a cell is the input signal of the subsequent cells.
5.1.3. Current Profile Composition
Given that the current profiles are calculated sequentially for each event, they have to be composed into the profile for the entire circuit. Since the signal propagation
delays are available for all events, the single event profiles $i_k(t)$ can be superposed
to the overall profile $i(t)$ with the appropriate time offsets $\tau_k$:

$$i(t) = \sum_k i_k(t - \tau_k) \qquad (5.4)$$
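Eq. (5.4) translates directly into code. The sketch below assumes that each single-event profile is available as a function of time relative to its own trigger instant. The triangular pulse shape, the function names, and the numbers are hypothetical stand-ins for the profiles calculated from the library.

```python
from typing import Callable, Sequence

Profile = Callable[[float], float]


def superpose(events: Sequence[tuple[float, Profile]],
              t_grid: Sequence[float]) -> list[float]:
    """Eq. (5.4): i(t) = sum_k i_k(t - tau_k), evaluated on t_grid."""
    return [sum(p(t - tau) for tau, p in events) for t in t_grid]


def triangle(peak: float, width: float) -> Profile:
    """Hypothetical single-event profile: zero at 0 and width, peak at width / 2."""
    half = width / 2

    def p(t: float) -> float:
        if t <= 0 or t >= width:
            return 0.0
        return peak * (t / half if t <= half else (width - t) / half)

    return p


# Two events at tau_1 = 0 ps and tau_2 = 38.8 ps (currents in mA, made-up shapes)
events = [(0.0, triangle(0.30, 60.0)), (38.8, triangle(0.25, 50.0))]
total = superpose(events, [float(t) for t in range(0, 101, 10)])
print([round(v, 3) for v in total])
```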
Figure 5.3 demonstrates this procedure for a simple circuit with one buffer connected to the input, followed by two paths with two buffers in series each. All
instantiated cells have driver strength one (B1), except buf4, which has twice the
strength (B2). Since buffer cells simply propagate the input signal to the output,
and given that the circuit input is triggered by a rising edge, all cells perform a
low-high transition at all ports.
(a) circuit schematic and activity timing: buf1 (B1) drives buf2 (B1), which drives
buf3 (B1), and buf4 (B2), which drives buf5 (B1).

cell    event time [ps]    propagation delay [ps]
buf1     0.0               38.8
buf2    38.8               35.0
buf3    73.8               30.9
buf4    38.8               32.0
buf5    70.8               29.7

(b) current profiles of the single events and the entire circuit [plot: current [mA] versus time [ps]]
Figure 5.3.: Composition of the current profile for a given activity for the entire
circuit by superposing the single event profiles with the appropriate
time offset.
The table in Figure 5.3(a) shows the time instances when the input of each
buffer is triggered, and the corresponding signal propagation delays from the input to
the output. Given that buf1 is directly driven by the circuit input signal (event at
0 ps), it forms the first part of the profile in Figure 5.3(b). Since buf2 and buf4
are connected to the output of buf1, both are triggered at the same time (38.8 ps),
and their current profiles consequently start at this time instance. Given
that buf4 has a stronger driver, its current peak is higher, and its propagation
delay (32 ps) is shorter than the 35 ps of buf2. As a result, buf5 is triggered 3 ps
earlier than buf3 (70.8 ps versus 73.8 ps).
This example also shows that the transient behavior, and consequently the current consumption, strongly depends on the circuit configuration. As buf1 has a higher
output load, its propagation delay is about 130% of the delay of buf5 (38.8 ps versus 29.7 ps), and
the current profiles also differ considerably, even though the cell types are the same.
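The event times in the table of Figure 5.3(a) follow from accumulating the propagation delays along each path. The short sketch below reproduces them from the tabulated delays and also prints the delay ratio of buf1 and buf5. The dictionary-based netlist representation and the function name are assumptions made for illustration.

```python
# Propagation delays from the table in Figure 5.3(a), in ps
delay = {"buf1": 38.8, "buf2": 35.0, "buf3": 30.9, "buf4": 32.0, "buf5": 29.7}
# Fan-out of each buffer in the example circuit
fanout = {"buf1": ["buf2", "buf4"], "buf2": ["buf3"], "buf4": ["buf5"],
          "buf3": [], "buf5": []}


def event_times(start: str, t0: float = 0.0) -> dict[str, float]:
    """Trigger time of a cell = trigger time of its driver + driver delay.
    Every cell has exactly one driver here, so a simple traversal suffices."""
    times = {start: t0}
    stack = [start]
    while stack:
        cell = stack.pop()
        for nxt in fanout[cell]:
            times[nxt] = times[cell] + delay[cell]
            stack.append(nxt)
    return times


times = event_times("buf1")
for cell in sorted(times):
    print(f"{cell}: {times[cell]:5.1f} ps")
print(f"delay ratio buf1/buf5: {delay['buf1'] / delay['buf5']:.2f}")
```

These trigger times are exactly the offsets $\tau_k$ used in Eq. (5.4).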
5.2. Circuit Simulation Methods
Characterizing the transient behavior of a digital system is generally done by analyzing the switching activities of the instantiated cells. The efficiency of a simulation method is therefore determined by its strategy for determining the cell activity.
The most common method is based on the evaluation of given stimuli vectors,
which are applied at the module inputs. The internal activity of the
circuit is then determined by processing the signal transitions given by the
stimuli pattern and the respective cell functionalities. Since the determination
of suitable stimuli vectors is exceedingly difficult for complex designs, especially
if they contain storage elements such as memories, an alternative approach is introduced. Given that the activity of programmable systems ultimately depends on
the implemented software, the number of possible operating sequences is practically
unmanageable. Therefore, an approach based on a randomly distributed assignment of
switching events is more feasible. The merits of the pattern-based method and
the random activity interpretation approach are discussed in the following sections.
5.2.1. Pattern Based Simulation
The application and evaluation of stimuli patterns at the module inputs determines
the activity of a given circuit. As digital modules are basically organized in paths
of consecutively switching cells, the transient events are propagated from one cell
to the next. The activity of a particular cell is determined by processing its input events
with the respective cell functionality, and the resulting output transitions
are propagated to the cells connected to its output. In this way, the intended transitions are
successively evaluated and propagated to the involved cells; therefore, this method
is also called an event-triggered simulation procedure. Since the signal timing characteristics are determined for the actual transitions as well, all required parameters are
available for the library-based current profile calculation. As mentioned repeatedly,
this method is the most accurate, but it is impractical for complex designs.
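The event-triggered procedure can be sketched with a time-ordered event queue. The netlist format, the fixed per-cell delays, and the names below are assumptions. In the method described here, the delays would come from the library-based timing calculation rather than from constants, and each recorded transition would trigger a single-event current profile.

```python
import heapq
from typing import Callable

# Hypothetical netlist entry: (logic function, input nets, output net, delay in ps)
Cell = tuple[Callable[..., int], list[str], str, float]


def simulate(cells: dict[str, Cell], init: dict[str, int],
             stimuli: list[tuple[float, str, int]]) -> list[tuple[float, str, int]]:
    """Event-triggered simulation; returns every net transition (time, net, value).

    init must hold an initial value for every net."""
    values = dict(init)
    readers: dict[str, list[str]] = {}
    for name, (_, ins, _, _) in cells.items():
        for net in ins:
            readers.setdefault(net, []).append(name)
    queue = list(stimuli)
    heapq.heapify(queue)
    transitions = []
    while queue:
        t, net, val = heapq.heappop(queue)
        if values[net] == val:
            continue  # no transition, nothing to propagate
        values[net] = val
        transitions.append((t, net, val))
        for name in readers.get(net, []):
            func, ins, out, d = cells[name]
            new = func(*(values[n] for n in ins))
            heapq.heappush(queue, (t + d, out, new))  # dropped later if unchanged
    return transitions


def inv(a: int) -> int:
    return 1 - a


cells = {"inv1": (inv, ["a"], "n1", 10.0), "inv2": (inv, ["n1"], "y", 12.0)}
result = simulate(cells, {"a": 0, "n1": 1, "y": 0}, [(0.0, "a", 1)])
print(result)
```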
5.2.2. Random Activity Interpretation
As determining the actual activity of complex modules and programmable
systems is impractical in many cases, an approach based on the interpretation of randomly assigned events is more feasible. For this purpose, the parameter activity level is
introduced. It defines the percentage of active cells per clock period. According to
this parameter, an appropriate number of cells is randomly selected and assumed
to become active at a given point in time.
Considering that a fully random assignment of events may lead to unrealistic
conditions, since possibly all selected cells become active at almost
the same time, the activity distribution has to be regulated. Therefore, the logic
depths of the cells within a circuit are determined. The depth of a cell basically
represents the distance, in terms of the number of gates, between the input of the
enclosing module and the respective cell. Cells with the same depth form a group. As the random activity assignment
selects the active cells separately for each group according to the activity
level, an appropriate distribution of the switching events over time is achieved.
Since the current profile calculation discussed in Section 5.1 is done separately
for each transition, the randomly specified events can be processed directly. Because
this method determines the current consumption of the events in arbitrary order, it can be performed independently of the paths of consecutively switching
cells. Considering that the time instances of the events are important for the overall
profile composition, an appropriate timing estimation is carried out during the
initial analysis of the circuit.
As a result, the time instances at which the cell ports are supposed to become
active are determined, and the single-event profiles for the randomly estimated events
can be composed into a profile for the entire circuit with the appropriate time offsets.
The logic depth assignment, the random activity determination, and the timing estimation are discussed in more detail in the following sections.
Logic Depths Assignment
As mentioned before, the logic depth is an indicator of the number of transitions
before a given cell becomes active. Figure 5.4 shows part of the schematic
view of a module with three inputs and a fraction of the internal circuit. As the
module ports are assigned logic depth zero (LD=0), and given that the depth
is incremented by one at each cell from the input to the output, the output of the
shown cell1 is associated with LD=1. Consequently, the net at the output of cell2
represents LD=2.
Figure 5.4.: Assignment of the logic depths to the cell ports in a module.
The connections of cell3 immediately show that the depth assignment may be
ambiguous. While one input is associated with LD=1, the other one is
directly connected to a module port (LD=0). Since considering all possibilities is unmanageable for large circuits, a careful assignment policy is necessary
for such ambiguous situations.
The simplest method to handle such conditions is to consistently assign the lower
or the higher depth, but this leads to unrealistic results. While using the lower depths leads to an exceptionally high number of cells per group,
assigning the higher one results in a segmentation into many depths with only a few
cell references per group. Even though the logic depth is only a structural parameter and has no direct relation to the switching event timing, an appropriate
determination is expedient. Otherwise, the consistent assignment of the higher or lower depth may lead to an
unrealistic concentration of events in a short time period. The reason for this is
the weak relation to the structure: a high number of cells per group, or many
groups consisting of only a few cell references, leave a high degree of freedom
in the selection of active cells. As a result, an inappropriately high
number of cells with almost simultaneous activity may be selected.
An efficient approach to achieve a balance between the structural segmentation
and the number of cells per depth is a randomly distributed assignment. The
logic depth of the output ($LD_{\mathrm{out}}$) of a given cell with n inputs is determined
by incrementing the depth of a randomly selected input ($LD_{\mathrm{in}}$):

$$LD_{\mathrm{out}} = \mathrm{rand}\left(LD_{\mathrm{in}}\{1, \dots, n\}\right) + 1 \qquad (5.5)$$
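A minimal sketch of Eq. (5.5), assuming a purely combinational netlist given as a dictionary of cells with their input nets and output net. Cells are processed in topological order so that every input depth is known before it is used. The netlist format, the function name, and the example connectivity (which only loosely follows Figure 5.4) are hypothetical.

```python
import random


def assign_depths(cells: dict[str, tuple[list[str], str]],
                  module_inputs: list[str],
                  rng: random.Random) -> dict[str, int]:
    """Eq. (5.5): depth of a cell output = depth of one randomly chosen input + 1.

    cells maps a (combinational) cell to (input nets, output net).
    Module input nets have depth 0; a cell is processed once all its inputs
    have a depth, i.e. in topological order.
    """
    depth = {net: 0 for net in module_inputs}
    pending = dict(cells)
    while pending:
        ready = [c for c, (ins, _) in pending.items()
                 if all(n in depth for n in ins)]
        if not ready:
            raise ValueError("combinational loop or undriven net")
        for c in ready:
            ins, out = pending.pop(c)
            depth[out] = depth[rng.choice(ins)] + 1
    return depth


# Hypothetical connectivity loosely following Figure 5.4
cells = {"cell1": (["in1", "in2"], "n1"),
         "cell2": (["n1"], "n2"),
         "cell3": (["n1", "in3"], "n3")}
depth = assign_depths(cells, ["in1", "in2", "in3"], random.Random(1))
print(depth["n1"], depth["n2"], depth["n3"])
```

For cell3, the result is either LD=1 or LD=2 depending on the random draw, which is exactly the ambiguity the random policy resolves.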
Random Activity Determination
Since this method is unrelated to stimuli patterns applied at the module ports, the
internal activity of the circuit is assigned deliberately. As previously discussed,
the cells in a module are assigned to groups according to their logic depth. Under
the assumption that the probability of a triggering event is equal for all cells in one
group, the potentially active cells are selected randomly.

Figure 5.5.: Illustration of the random activity distribution method. Intentionally
active cells are selected according to the given activity level of 25%
for each depth. The outputs of these cells are supposed to perform a
low-high or high-low transition, which is consequently distributed to
the inputs of the connected cells.
Given that the activity of a module nevertheless depends on the functionality of
the circuit and on the events at the ports of the instantiated cells, the activity
level is used to control the amount of switching. It defines the percentage of active cells for a given
circuit. Depending on the application as well as the type of a given module, the
actual activity of most circuits usually lies within the range of 15%–35%. Exceptions
are typically processor cores, with considerably higher activity levels, and peripheral units,
with possibly lower ones.
Figure 5.5 shows the internal structure of a module and the groups of associated
cells for each logic depth. It illustrates the random selection of the potentially
active cells for a given activity level of 25%. Since the switching events should be
distributed over the entire module, the selection is done separately for each group.
This leads to one active element per depth in the shown example. In general, the
set of selected cell references R consists of subsets r of all cells C, where N references
are extracted for a given depth:

$$R = \{\, r \mid r \subseteq C,\ |r| = N \,\} \qquad (5.6)$$
As N generally differs between depths, it is calculated as $N = A \cdot N_C(LD)$,
where A is the activity level and $N_C(LD)$ is the number of cells associated with the
respective depth.
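The per-depth selection of Eq. (5.6) with $N = A \cdot N_C(LD)$ can be sketched as follows. Rounding N to the nearest integer is an assumption of this sketch, since the text does not specify how fractional values are handled. The function name and the example cell names (modeled on the labels in Figure 5.5) are illustrative.

```python
import random
from collections import defaultdict


def select_active(cell_depth: dict[str, int], activity: float,
                  rng: random.Random) -> set[str]:
    """Eq. (5.6): draw N = A * N_C(LD) cells from every logic-depth group.

    Rounding N to the nearest integer is an assumption of this sketch.
    """
    groups: dict[int, list[str]] = defaultdict(list)
    for cell, ld in cell_depth.items():
        groups[ld].append(cell)
    active: set[str] = set()
    for ld in sorted(groups):
        members = groups[ld]
        n = round(activity * len(members))
        active.update(rng.sample(members, n))
    return active


# Four depths with four cells each (names like "2c") and A = 25%
cell_depth = {f"{ld}{c}": ld for ld in range(1, 5) for c in "abcd"}
active = select_active(cell_depth, 0.25, random.Random(7))
print(sorted(active))
```

As in Figure 5.5, an activity level of 25% with four cells per depth yields exactly one active cell per depth.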
As the active cells are randomly distributed, there are no paths of consecutively
switching cells, and the corresponding events have to be determined randomly as well.
Therefore, the output of each cell assumed to be active is assigned a rising
or falling edge. These events are then propagated to the inputs of the
connected cells. For instance, since cell 2c is assumed to be active, its
output event is assigned to the respective inputs of the cells 2b, 3b, and 3c.
The current profile calculation is finally done for the cells that are triggered
by the propagated events at the respective input. Since the initial state of these
cells is also unknown, it is assigned randomly as well. The state S
is therefore determined for the given activity A and the active port P by a random selection among the m possible steady states:

$$S = S\left(\mathrm{rand}(\{1, \dots, m\}), A, P\right) \qquad (5.7)$$
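The random edge and state assignment can be sketched as below. Each active cell receives a random rising or falling output edge, which is forwarded to all inputs it drives, and every receiving cell gets a random initial state index in the spirit of Eq. (5.7). The data layout, port names, numbers of states, and the function name are assumptions for illustration.

```python
import random


def assign_events(active: set[str],
                  fanout: dict[str, list[tuple[str, str]]],
                  n_states: dict[str, int],
                  rng: random.Random) -> list[tuple[str, str, str, int]]:
    """Random output edges for active cells, propagated to the fan-out inputs.

    fanout maps a cell to its (receiving cell, input port) pairs; n_states
    gives m, the number of steady input states of each receiving cell.
    Returns (cell, port, edge, state index in 1..m) for every triggered input.
    """
    events = []
    for cell in sorted(active):
        edge = rng.choice(("rise", "fall"))
        for dst, port in fanout.get(cell, []):
            state = rng.randint(1, n_states[dst])  # random initial state, cf. Eq. (5.7)
            events.append((dst, port, edge, state))
    return events


# Cell 2c drives inputs of 2b, 3b and 3c as in Figure 5.5 (ports and m are assumed)
fanout = {"2c": [("2b", "B"), ("3b", "A"), ("3c", "A")]}
n_states = {"2b": 4, "3b": 4, "3c": 2}
events = assign_events({"2c"}, fanout, n_states, random.Random(3))
print(events)
```

Each tuple (cell, port, edge, state) identifies one library entry for which a single-event current profile is then calculated.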
Timing Parameter Estimation
As a consequence of the randomly distributed activity, the event timings cannot be
calculated from consecutive events. Therefore, the time instances at which a cell possibly
becomes active have to be estimated. Given that the timing characteristics of a cell
depend on the activity, but the actual events are unknown, the propagation delays
($t_{pd}$) and signal rise/fall times ($t_r$ and $t_f$) are determined for randomly estimated
transitions. As the signal slew rates are typically different for rising and falling
edges, both transitions have to be considered. Since the results depend
on the triggered port P as well as on the steady state S prior to the transition,
the timing parameters are determined by a two-dimensional random selection:

$$t_{r,f,pd} = t_{r,f,pd}\left[\mathrm{rand}\left(P\{1, \dots, n\},\ S\{1, \dots, m\}\right)\right] \qquad (5.8)$$
This procedure is similar to the pattern-based simulation, starting at the
module inputs and following the paths of consecutive cells. Therefore, all affecting parameters, in terms of the circuit environment and the output signal characteristics of
previous cells, are considered. The time instances at which a port of a given cell supposedly becomes active are calculated by accumulating the respective signal
propagation delays. As a result, all important timing characteristics are available
for the current profile calculation for randomly assigned activities.
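A minimal sketch of the timing estimation: for each cell, in topological order, a random (port, state) pair selects a propagation delay as in Eq. (5.8), and the delay is added to the estimated arrival time at the selected port. Several details are simplifying assumptions: only propagation delays are handled (rise/fall times would be selected the same way), ports and states are 0-based indices, drawing one pair from the full table stands in for the independent draws of P and S, and output nets carry the cell name.

```python
import random


def estimate_times(order: list[str],
                   inputs: dict[str, list[str]],
                   timing: dict[str, dict[tuple[int, int], float]],
                   module_inputs: list[str],
                   rng: random.Random) -> dict[str, float]:
    """Estimated switching time of every net (Eq. (5.8) plus delay accumulation).

    order:  cells in topological order, starting at the module inputs
    inputs: input nets of each cell; port p is inputs[cell][p] (0-based)
    timing: per cell, propagation delay in ps for each (port, state) pair
    The output net of a cell carries the cell's name (convention of this sketch).
    """
    t_net = {net: 0.0 for net in module_inputs}
    for cell in order:
        port, state = rng.choice(sorted(timing[cell]))  # random (P, S) pair
        t_net[cell] = t_net[inputs[cell][port]] + timing[cell][(port, state)]
    return t_net


# Hypothetical two-gate example with made-up delays in ps
inputs = {"g1": ["a", "b"], "g2": ["g1", "c"]}
timing = {"g1": {(0, 0): 20.0, (1, 0): 25.0},
          "g2": {(0, 0): 30.0, (1, 0): 30.0}}
t = estimate_times(["g1", "g2"], inputs, timing, ["a", "b", "c"], random.Random(5))
print(t)
```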
5.3. Modeling of Complex Modules
As mentioned repeatedly, the stimuli determination for a pattern-based simulation
is an expensive procedure for complex modules. On the other hand, a fully random
activity estimation provides many degrees of freedom and may lead to unrealistic
results under certain conditions. As discussed in Chapter 3, a synchronous sequential design basically consists of a clock subsystem and a combinational logic
part. Given that the characteristics of these two parts are significantly different, it
is expedient to handle them separately.
Since the clock signal characteristics are usually known, but the combinational
logic activity has to be estimated, a combination of the pattern-based
simulation method and the random activity approach is applied. This composite
approach enables a detailed analysis of the clock subsystem and an efficient characterization of the combinational logic behavior, with a significant reduction
of the simulation effort compared to the evaluation of multiple stimuli vectors.
As illustrated in Figure 5.6, the simulation flow consists of two branches in this
case, one for the clock paths and another one for the combinational logic. In both
branches the current profile calculation is based on the library of pre-characterized
cells, but the switching activity is determined by a pattern evaluation for the clock
paths, and a random activity estimation for the combinational logic.
[Flow diagram: the module netlist is split into the clock distribution network (clock pattern evaluation, then current profile calculation) and the combinational logic (random activity estimation, then current profile calculation); both calculations use the cell library obtained from the standard cell characterization; example result plotted as current [A] versus time [ns]]
Figure 5.6.: Simulation flow for large modules. Partitioning the module enables
the application of a pattern based simulation for the clock subsystem,
and the random activity estimation approach for the combinational
logic. The current profile calculation is based on the library of pre-characterized cells in both branches.
5.3.1. Module Partitioning
As the ports of a module provide different functionalities, the internal paths of consecutively switching cells starting at the module inputs distribute the
respective signals to the particular functional units. Partitioning a module means
in this case categorizing the path types, depending on the expected input
signal characteristics as well as on the mode of operation of the connected functional
unit. This makes it possible to employ efficient simulation methods for the
respective module parts. The following path categories are distinguished:
system clock: At least one input of synchronous sequential circuits is intended to
be connected to the system clock. This signal is typically distributed to the
storage elements by a clock tree. Therefore, these paths generally terminate
at the clock input of flip-flops and memories, or at the enable port of latches.
data input: Most input ports of a module are typically used to apply data values.
These ports are usually connected directly to the according functional unit.
set/reset: The set and reset signals are distributed from the respective module
ports to the clocked storage elements to bring the system into a specified
state. These signals are typically asynchronous to the system clock and only
active in exceptional cases, such as the system power-on procedure.
scan test: Special types of flip-flops have additional inputs that allow scan test
chains to be formed. Given that complex designs in particular typically
allow several chains to be formed, there are also several
scan enable and scan input ports.
Analogous to the standard cell characterization introduced in Chapter 4, set/reset and
scan test paths are assumed to be inactive during the normal operating mode of a
system. Therefore, these ports are initially set to an appropriate constant value to
disable the respective features. Since constant signals cannot trigger any switching
activity, these paths are disregarded during the calculation of the dynamic current
consumption and are not considered further in the following sections. As a result, only the clock distribution network, the
remaining possibly active combinational logic parts, and the storage elements have
to be analyzed to characterize the dynamic current consumption of synchronous
sequential system modules.
Figure 5.7 gives an overview of the module partitions that are finally considered. The
basic structure of the figure is similar to the state machines introduced in
Chapter 3. Typically the largest partition is the combinational logic, stimulated by
the data inputs and the storage element outputs. Its results are basically the input
values of the storage elements and the module outputs, but some of these signals
are also used to configure the clock tree. Due to the special mode of operation of
the clock tree, in terms of the usually well-known clock signal characteristics and
the effective clock gating, it is important to consider this partition separately.

Figure 5.7.: Illustration of the module partitioning procedure. It distinguishes
between the combinational logic part, the clock tree, and the clocked
storage elements.
Special care has to be taken with the clocked storage elements. Even though they
are considered as a separate partition, the behavior of these cells strongly depends
on the clock tree outputs, but also on the characteristics of the combinational logic
results. The clock inputs of the storage elements are consequently associated with the
clock subsystem, and the analysis of the data input and output behavior
is related to the combinational logic activity.
5.3.2. Clock Subsystem
As a result of the module partitioning, the clock subsystem is isolated from the
rest of the circuit. The primary purpose of the clock tree is the distribution
of the clock signal to the storage elements, and the system clock characteristics
are usually known. Therefore, a pattern-based simulation method is applicable
to this part of the circuit. The current profile calculation is consequently done by
evaluating the actual cell activities according to the clock signal specifications and
the particular cell functions. Clock trees usually consist of buffer cells, but may
also be implemented using inverters. Since part of the clock tree may therefore be
inverted, an evaluation of the individual cell functions is expedient.
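The evaluation of the individual cell functions in the clock tree can be sketched as a traversal that propagates a rising clock edge from the clock port. The edge polarity flips at every inverter, and the traversal stops at gating cells that are disabled. The tree representation, the cell-type labels, and the function name are hypothetical, and enabled gating cells are treated as non-inverting.

```python
def clock_edges(tree: dict[str, tuple[str, list[str]]], root: str,
                disabled: frozenset[str] = frozenset()) -> dict[str, str]:
    """Propagate a rising clock edge from root through buffers and inverters.

    tree maps a node to (cell type, child nodes); nodes missing from tree are
    leaves such as flip-flop clock pins. Children of nodes in `disabled`
    (e.g. closed gating cells) receive no edge. Returns the edge at each node.
    """
    edge = {root: "rise"}
    stack = [root]
    while stack:
        node = stack.pop()
        if node in disabled:
            continue
        kind, children = tree.get(node, ("leaf", []))
        out = edge[node]
        if kind == "inv":
            out = "fall" if out == "rise" else "rise"
        for child in children:
            edge[child] = out
            stack.append(child)
    return edge


tree = {"clk": ("port", ["b1", "g1"]),
        "b1": ("buf", ["i1", "ff1"]),
        "i1": ("inv", ["ff2"]),
        "g1": ("gate", ["ff3"])}
edges = clock_edges(tree, "clk", disabled=frozenset({"g1"}))
print(edges)
```

Storage elements that do not appear in the result (here ff3) lie in a deactivated subtree and therefore receive no clock edge.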
Since clock gating is a very popular and effective measure to reduce unnecessary
switching activity, it is essential to consider it when modeling the
current consumption of a system. In extensively optimized systems, the clock
signal distribution to virtually all storage elements may be controlled by gating
cells. Hence, the percentage of actually active clock subsystem cells can theoretically
range from 0% to 100%. The enable signals of the instantiated gating cells are
predominantly controlled internally by the combinational part of a module. As the
activity of the combinational part is randomly estimated, the actually "open" gates
have to be determined randomly as well. Therefore, the parameter gating level is
introduced. It defines the percentage of disabled storage elements. According to this
parameter, the control inputs of randomly selected gating cells are set appropriately.
This prevents the distribution of the clock signal to the affected
part of the clock subsystem. As a result, the storage elements in the subtrees of
deactivated gating cells are not triggered by the clock signal and consequently
cause no dynamic current flow.
Given that clock gating is potentially implemented hierarchically, the number
of active gating cells cannot be assumed to be proportional to the number of active
storage elements. Therefore, each gating cell is weighted by the number of storage
elements connected to it. The determination of the active gates is an iterative
procedure, in which gates are randomly selected until the given gating level is
achieved. Since the gating level determines, as its complement, the percentage of
triggered storage elements, this percentage is termed clock activity level in the following sections.
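The iterative, weighted selection of gating cells can be sketched as follows. In this sketch, each gating cell is weighted by the number of storage elements it drives, and randomly ordered gates are disabled until the target gating level is reached. Hierarchical nesting of gates is not modeled, and the achieved level may overshoot the target by at most the weight of the last selected gate. The function name and the weights are hypothetical.

```python
import random


def select_disabled_gates(weights: dict[str, int], gating_level: float,
                          rng: random.Random) -> tuple[list[str], float]:
    """Randomly disable gating cells until the gating level is reached.

    weights: gating cell -> number of storage elements in its subtree
    gating_level: target fraction of disabled storage elements (0..1)
    Returns the disabled gates and the fraction actually achieved.
    Nested (hierarchical) gates are not modelled in this sketch.
    """
    total = sum(weights.values())
    candidates = list(weights)
    rng.shuffle(candidates)
    disabled, off = [], 0
    for gate in candidates:
        if off >= gating_level * total:
            break
        disabled.append(gate)
        off += weights[gate]
    return disabled, off / total


weights = {"g1": 100, "g2": 50, "g3": 50, "g4": 200}
disabled, level = select_disabled_gates(weights, 0.5, random.Random(2))
print(disabled, level, "clock activity level:", 1 - level)
```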
The plots in Figure 5.8 show the current profiles for the clock subsystem of a
microcontroller core module. The analyzed module consists of more than 150,000
cells, of which approximately 14,000 are flip-flops, 700 clock buffers, and 100
gating cells. The simulated clock activity levels are 33%, 66%, and 100%, and
the simulation has been repeated several times for each of these levels. As the
initialization of the gating cells, and therefore the determination of deactivated
subtrees, has been done independently for each simulation run, the set of curves
for each activity level shows minor variations. To demonstrate the effects of gating
on the respective parts of the clock subsystem, the current profiles for the entire
clock subsystem are plotted in Figure 5.8(a), and the partial results for the clock tree and the
storage elements in (b) and (c). Lower activity levels significantly
reduce the current consumption of the flip-flops, and the amplitudes of the clock
tree profiles are proportionally reduced as well.
Clock trees are typically dimensioned to ensure a nearly identical clock signal
arrival time at all storage elements. Therefore, the number of buffers in each subtree, as well as the respective driver strengths, are almost equivalent. As a result,
Figure 5.8(b) shows that the variance of the results for the same activity level is
quite small, even though the gating cells have been re-initialized for each simulation
run, so that each profile shows the result for randomly selected clock paths.
[Plots: current [A] versus time [ps] for clock activity levels of 100%, 66%, and 33%; panels (a) entire clock subsystem, (b) clock tree, (c) storage elements]
Figure 5.8.: Current profiles for the clock subsystem of a module with approximately 14,000 flip-flops, 700 clock buffers, and 100 gating cells. The
plots show the results for different clock activity levels, where the simulations have been repeated several times for each configuration.
[Plot: current [A] versus time [ps] for output activity levels of 100%, 66%, 33%, and 0%]
Figure 5.9.: Current profiles for a given number of clocked storage elements with
different percentages of elements performing an output transition.
The profiles in Figure 5.8(c) additionally reveal the reason for the variations in the clock subsystem. Since randomly selected clock
paths are analyzed, different types of storage elements are triggered in each simulation run. The
module apparently contains flip-flops with different driver
strengths, as the variation of the clock tree profiles for each parameter set is insignificant, while the storage element results show observable differences. The effect
of such variations on the quality of the resulting current consumption model for an
entire system is discussed later.
5.3.3. Clocked Storage Elements
The analysis of the clocked storage element behavior is related to the characteristics of the other module partitions. As the clock inputs are directly connected to
the clock tree outputs, this part of the storage elements is associated with the clock
subsystem and has already been considered (see Section 5.3.2). The storage element data
outputs trigger the combinational logic cells, and the number of switching outputs
needs to be derived from the designated percentage of active logic gates. As it
cannot be assumed that clock gating is implemented in such a way that all storage
elements with non-switching outputs are disabled, only part of the triggered elements
possibly change their output state. Therefore, an activity level is determined that defines the percentage of elements performing an output transition as a consequence of a triggering
clock event. The value of this parameter is derived from
the gating level and the activity of the combinational logic cells.
Figure 5.9 shows the current profiles for a given number of storage elements with different output activity levels. The plotted profiles are the simulation results for all outputs active (100%), 66%, 33%, and, as a reference, the case where no element changes its output state (0%). Switching outputs cause a significantly higher current consumption than stable data values. The simulations have been repeated several times for each configuration, with the supposedly active cells selected randomly in each run. The variance of the results is insignificant, which shows that this approach is feasible.
The additional current consumption caused by storage element output transitions, relative to the profiles for stable outputs, is plotted in Figure 5.10. The resulting average current consumption values for the analyzed circuit are listed in Table 5.1. Clocked storage elements with output transitions can draw more than three times the current of inactive ones (398.3 mA versus 129.3 mA at 100% output activity). Therefore, it is important to accurately determine the percentage of active storage elements.
[Plot: current [A] over time [ps]; output activity levels 100%, 66%, 33%, and 0%]
Figure 5.10.: Current consumption caused by output transitions with different activity levels, compared to the profile for triggered clock inputs only.
Table 5.1.: Average current consumption distribution at clocked storage elements.

output activity | clock trigger | output transition | overall current
0%              | 129.3 mA      | 0.0 mA            | 129.3 mA
33%             | 129.3 mA      | 89.1 mA           | 218.4 mA
66%             | 129.3 mA      | 178.3 mA          | 307.6 mA
100%            | 129.3 mA      | 269.0 mA          | 398.3 mA
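The table suggests that the output transition share grows roughly linearly with the output activity. A minimal sketch of such a linear estimate follows; it assumes exactly this linear dependence and uses only the 0% and 100% rows of Table 5.1 as parameters (the function name is hypothetical).

```python
# Average currents of the analyzed circuit from Table 5.1, in mA
CLOCK_TRIGGER_MA = 129.3    # clock inputs triggered, outputs stable (0% activity)
FULL_TRANSITION_MA = 269.0  # additional current when all outputs toggle (100%)


def storage_element_current(activity: float) -> float:
    """Estimate the average storage element current in mA.

    Assumes (hypothetically) that the output transition share scales linearly
    with the output activity level, given as a fraction in [0, 1].
    """
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity must be a fraction between 0 and 1")
    return CLOCK_TRIGGER_MA + activity * FULL_TRANSITION_MA


for level in (0.0, 0.33, 0.66, 1.0):
    print(f"{level:4.0%} output activity: {storage_element_current(level):6.1f} mA")
```

For the 33% and 66% rows, this estimate stays within about 1 mA of the tabulated overall currents, which indicates that two parameters per circuit could be sufficient for intermediate activity levels.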
[Plot: current [A] over time [ps]; combinational activity levels 30%, 20%, and 10%]
Figure 5.11.: Current profiles for the combinational logic part of a given module. Simulation results are based on the random activity interpretation method and shown for different activity levels.
5.3.4. Combinational Logic
The largest partition of a module is typically the combinational logic block. Due to the size and complexity of this circuit part, the random activity interpretation introduced in Section 5.2.2 is a feasible method to determine the respective current profiles. Since the clock subsystem is modeled by evaluating the actual clock signal, the timing characteristics of all clocked storage element outputs have already been calculated. As a result, the starting points in time for each root node of the internal logic paths are available, and the time at which a particular combinational cell may become active is consequently determined from reliable values.
In addition to the clocked storage element outputs, the data input ports of a module also have to be considered as root nodes of combinational logic paths. As these ports may be stimulated by an external circuit whose activity timing cannot be determined from the analyzed module netlist alone, this timing has to be estimated or provided as a parameter.
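The random activity interpretation itself is described in Section 5.2.2; the sketch below only illustrates the preceding step, namely how known start times at the root nodes (storage element outputs and input ports) could be propagated through an acyclic combinational netlist to bound when each cell may switch. The netlist format, the fixed per-cell delays, and all names are hypothetical simplifications (real cell delays depend on input slew and output load). It requires Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter


def activity_windows(fanin, delay, root_times):
    """Earliest and latest possible switching time of every node, in ps.

    fanin:      {cell: [driving nodes]} for the combinational cells (acyclic)
    delay:      {cell: propagation delay in ps} (hypothetical fixed delays)
    root_times: {root: start time in ps} for storage outputs and input ports
    """
    windows = {root: (t, t) for root, t in root_times.items()}
    # static_order() yields every driver before the cells it drives
    for node in TopologicalSorter(fanin).static_order():
        if node in windows:
            continue  # root nodes keep their given start time
        if node not in fanin:
            raise ValueError(f"no start time given for root node {node!r}")
        inputs = [windows[d] for d in fanin[node]]
        earliest = min(w[0] for w in inputs) + delay[node]
        latest = max(w[1] for w in inputs) + delay[node]
        windows[node] = (earliest, latest)
    return windows


# Hypothetical netlist: two flip-flop outputs and one input port as root nodes
fanin = {"g1": ["ff1", "ff2"], "g2": ["g1", "in0"], "g3": ["g2", "ff1"]}
delay = {"g1": 20, "g2": 30, "g3": 25}
roots = {"ff1": 100, "ff2": 110, "in0": 150}  # in0 timing estimated externally
for cell, (t_min, t_max) in activity_windows(fanin, delay, roots).items():
    print(f"{cell}: may switch between {t_min} ps and {t_max} ps")
```

A missing start time for an input port is reported as an error rather than silently assumed, reflecting that this timing has to be estimated or provided as a parameter.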
Figure 5.11 shows the resulting current profiles for the combinational part of the microcontroller core unit mentioned before. Using the random activity interpretation method, the simulation has been repeated several times for activity levels of 10%, 20%, and 30%. Even though both the active cells and their particular activity are determined randomly, the variation of the results for each parameter set is insignificant. The characteristics of the current profiles shown are typical for this kind of circuit. As a consequence of the virtually simultaneous switching activity of the clocked storage elements, the first part of the profiles shows a relatively high peak with a considerably short rise time. The combinational logic cells that are directly connected to flip-flops or latches are stimulated at nearly the same time, and adding up the current profiles of this considerably high number of logic cells produces this peak. The remaining parts of the profiles show a continuously decreasing current consumption, since the results of the logic operations become available gradually. At approximately 1.2 ns, another noticeable increase in current flow appears because the analyzed circuit contains a memory block with a delay of approximately 1 ns; the availability of the memory output values triggers some additional logic activity in this case. The very last part of the profile shows the current consumption caused by the paths with the longest cell delays and/or the largest number of consecutively switching cells, also referred to as the critical paths.

[Plot: current [A] over time [ps] (0 to 2500 ps); clock distribution, storage elements, and combinational logic]
Figure 5.12.: Current profiles for the clock distribution network, the clocked storage elements, and the combinational logic part of a given module.
5.3.5. Profile Composition
As the current profiles for the different partitions of a module are determined separately, these partial results have to be combined into the profile of the entire module. Figure 5.12 shows these partial profiles for the analyzed module, which has been simulated with the following parameters: the clock gating cells are configured such that 40% of the storage element outputs are enabled, of which 50% actually change their output state. Due to the dependencies discussed above, this leads to a combinational logic activity level of 20%. The plotted profiles show the contributions of the consecutively triggered circuit parts during one clock period, starting with the rising clock edge, followed by the falling edge at approximately 2 ns in this example.
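A minimal sketch of this relation follows. It assumes, as the numbers of this example suggest, that the combinational activity level is simply the product of the fraction of enabled storage elements and the fraction of those that toggle; the function names are hypothetical, and real designs may deviate from this multiplicative model.

```python
def combinational_activity(gating_level: float, output_activity: float) -> float:
    """Fraction of active logic, assuming it equals enabled x toggling fraction."""
    for value in (gating_level, output_activity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("levels must be fractions between 0 and 1")
    return gating_level * output_activity


def output_activity(gating_level: float, logic_activity: float) -> float:
    """Inverse relation: toggling fraction of the enabled storage elements."""
    if not 0.0 < gating_level <= 1.0 or not 0.0 <= logic_activity <= gating_level:
        raise ValueError("logic activity must lie between 0 and the gating level")
    return logic_activity / gating_level


print(combinational_activity(0.4, 0.5))  # 40% enabled, 50% of them toggle
print(output_activity(0.4, 0.2))         # back to the output activity level
```

The inverse function corresponds to the situation in Section 5.3.3, where the storage element output activity has to be derived from a designated percentage of active logic gates.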
Since all clocked storage elements in this example are high-active, i.e. sensitive to the rising edge, the combinational logic cells are only active during the high phase, while the clock tree and the storage element clock inputs are triggered by both clock edges. The amplitudes of the partial profiles also reflect the ratio of the numbers of active cells per partition. As one partition output typically drives several cell inputs, each partial profile typically starts with a peak that is considerably higher than the corresponding profile of the driving partition.
The final current consumption profiles of the analyzed module, given by the sum of the partial profiles, are plotted in Figure 5.13. This method can be applied to determine the typical or average characteristics of a circuit, but also to analyze the profile variations that result from the random stimulation of possibly different active circuit parts. To demonstrate these variations, the results of several simulation runs with identical parameters are shown.
[Plot: current [A] over time [ps] (0 to 2500 ps) for the entire module, several simulation runs]
Figure 5.13.: Resulting current profiles for the analyzed module.
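Assuming that all partial profiles are sampled on the same time grid over one clock period, the composition is a sample-wise sum, and the run-to-run variation can be quantified per sample. A minimal sketch with made-up sample values (the function names are hypothetical):

```python
def compose(*partials):
    """Sample-wise sum of partial current profiles on a common time grid."""
    if len({len(p) for p in partials}) != 1:
        raise ValueError("all partial profiles must share the same time grid")
    return [sum(samples) for samples in zip(*partials)]


def max_spread(runs):
    """Largest per-sample difference between repeated runs of one configuration."""
    return max(max(samples) - min(samples) for samples in zip(*runs))


# Hypothetical 6-sample profiles in A (placeholders, not data from Figure 5.12)
clock_tree = [0.2, 0.5, 0.1, 0.0, 0.3, 0.1]
storage = [0.0, 0.4, 0.3, 0.0, 0.2, 0.1]
logic = [0.0, 0.1, 0.9, 0.6, 0.0, 0.0]
module = compose(clock_tree, storage, logic)
print([round(i, 2) for i in module])
```

Applied to the profiles of several runs with identical parameters, `max_spread` gives one number for the variation visible in Figure 5.13.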
For several reasons, the frequency spectrum is often more important for emission analysis than the time domain waveform. Therefore, the results of a Discrete Fourier Transform (DFT) are discussed and shown in the following paragraphs. For an introduction to the Fourier transform method, as well as the most important relations between time domain signal characteristics and the corresponding frequency spectra, see Appendix A.
Although the profile determination is based on random approaches, it has been shown that the resulting waveforms do not vary significantly for a given activity level. As the profiles are additionally assumed to be periodic, the discrete Fourier transform is actually used to approximate the Fourier series. The following figures therefore show the magnitudes of the complex Fourier coefficients, not the power density spectra that are usually shown for random signals.
As the complex Fourier series is defined as

$$x_p(t) = \sum_{k=-\infty}^{\infty} c_k \, e^{j 2\pi k f_0 t}, \qquad (5.9)$$
and given that exactly one period of the sampled time-domain signal is transformed using the DFT, the complex Fourier coefficients are approximated by

$$c_k \approx \frac{1}{N} X[k], \qquad (5.10)$$

where $X[k]$ is the DFT spectrum, and $N$ denotes the number of samples per period.
Since the discussed profiles are real signals, a Fourier decomposition in the form

$$x(t) = A_0 + \sum_{k=1}^{\infty} A_k \cos(2\pi k f_0 t + \alpha_k) \qquad (5.11)$$

is also possible, and the DC value $A_0$, as well as the amplitudes $A_k$ and the phases $\alpha_k$ of the harmonics, can be approximated as follows:

$$A_0 \approx \frac{1}{N} X[0], \qquad A_k \approx \frac{2}{N} \lvert X[k] \rvert, \qquad \alpha_k \approx \angle X[k]. \qquad (5.12)$$
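Equations (5.10) and (5.12) translate directly into code. The following minimal sketch uses a direct DFT (a practical implementation would call an FFT routine) and a synthetic test signal whose DC value, amplitude, and phase are known, so that the approximations can be checked; the function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct DFT X[k] = sum_n x[n] exp(-j 2 pi k n / N); O(N^2), fine for short profiles."""
    n_samples = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * n / n_samples) for n, v in enumerate(x))
            for k in range(n_samples)]


def fourier_series(x, n_harmonics):
    """Approximate A_0, A_k and alpha_k (Eq. 5.12) from exactly one sampled period x."""
    n_samples = len(x)
    if 2 * n_harmonics >= n_samples:
        raise ValueError("harmonics must stay below half the number of samples")
    X = dft(x)
    a0 = X[0].real / n_samples
    amplitudes = [2 * abs(X[k]) / n_samples for k in range(1, n_harmonics + 1)]
    phases = [cmath.phase(X[k]) for k in range(1, n_harmonics + 1)]
    return a0, amplitudes, phases


# Test signal: 0.5 A DC plus a third harmonic of 0.2 A with a phase of 0.4 rad
N = 64
x = [0.5 + 0.2 * math.cos(2 * math.pi * 3 * n / N + 0.4) for n in range(N)]
a0, amps, phases = fourier_series(x, 5)
print(f"A0 = {a0:.3f} A, A3 = {amps[2]:.3f} A, alpha3 = {phases[2]:.3f} rad")
```

The factor 2 in $A_k$ collects the contributions of $c_k$ and $c_{-k}$, which have equal magnitude for real signals; it is therefore not applied to the DC value.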
Figure 5.14 shows the magnitude spectrum, in the form of the discrete Fourier coefficients, of the clocked partition (clock tree and storage elements) profiles and, in the second plot, the corresponding results for the combinational logic partition. The previously discussed profiles can be roughly approximated by triangle-shaped waveforms. As the spectrum of such a pulse is a squared sinc function, the basic shape of the FFT results shows the same characteristic.
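The squared-sinc shape can be checked numerically. For a periodic triangular pulse with peak value h, half-width T (base width 2T), and period T_0, the Fourier coefficient magnitudes are |c_k| = (h T / T_0) sinc²(k T / T_0) with sinc(u) = sin(πu)/(πu), so the first zero lies at the harmonic k = T_0 / T. The pulse parameters below are hypothetical and not taken from Figure 5.14.

```python
import cmath
import math


def fourier_coefficient(x, k):
    """c_k ~ X[k] / N for one sampled period x (Eq. 5.10), computed directly."""
    n_samples = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * n / n_samples)
               for n, v in enumerate(x)) / n_samples


def sinc(u):
    """Normalized sinc, sin(pi u) / (pi u)."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)


# Hypothetical pulse: 250 MHz clock (T0 = 4000 ps sampled in 1 ps steps), one
# triangular current pulse with peak h = 2 A and half-width T = 200 ps
T0, T, h = 4000, 200, 2.0
profile = [h * max(0.0, 1 - abs(n - T) / T) for n in range(T0)]

for k in range(1, 6):
    numeric = abs(fourier_coefficient(profile, k))
    model = h * T / T0 * sinc(k * T / T0) ** 2  # |c_k| of a periodic triangle
    print(f"{k * 250:4d} MHz: |c_k| = {numeric:.4f} A, squared-sinc model = {model:.4f} A")
```

A wider pulse moves the first zero of the envelope to a lower harmonic, which is why the shorter pulses of the clocked partition and the longer combinational logic profiles produce differently scaled envelopes.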
Compared to the frequency domain results of the combinational logic partition, where the squared sinc shape is relatively obvious, the spectrum of the clocked partition profile is more complex. The corresponding time domain profiles actually consist of two pulses with different pulse widths and peak values. As mentioned before, the most significant difference between the current consumption caused by the two clock events is the additional storage element output activity during the high phase of the clock signal. Hence, the two pulses can be decomposed into identical profiles for both clock edges plus an additional pulse for the difference between the rising and falling edge profiles. This results in one signal consisting of two identical pulses per clock period, and another one with a single pulse per clock period. As the time domain signals are assumed to be periodic anyway, the waveform with the two pulses can also be considered as one pulse with half the clock period.
The coefficients of the clocked partition profiles in Figure 5.14 illustrate this characteristic particularly at lower frequencies. They show a transformed periodic signal with a base frequency of 250 MHz, where the magnitudes at the multiples of 500 MHz are significantly higher than at the odd multiples of the base frequency. Because the cell timing characteristics actually differ for rising and falling edges, the profile peaks are not exactly equidistant. For this reason, the values at higher frequencies show a different characteristic.

[Plot panels: clocked partitions and combinational logic; current [mA] over frequency [GHz] (0 to 5 GHz)]
Figure 5.14.: Magnitudes of the complex Fourier coefficients of the current profiles for the clocked partitions and the combinational logic.

[Plot: current [mA] over frequency [GHz] (0 to 5 GHz) for the entire module, several simulation runs]
Figure 5.15.: Fourier coefficient magnitudes of the current profiles for the entire module.
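The decomposition into two identical pulses plus a difference pulse explains the alternating magnitudes. In the minimal sketch below (hypothetical triangular pulse shape, 250 MHz clock sampled in 10 ps steps), two identical pulses half a period apart cancel all odd harmonics, because each coefficient is multiplied by 1 + e^(-jπk); enlarging the rising-edge pulse brings the odd harmonics back, with a magnitude determined only by the difference pulse.

```python
import cmath
import math


def coefficient_magnitudes(x, n_harmonics):
    """|c_k| = |X[k]| / N for k = 0..n_harmonics of one sampled period x."""
    n_samples = len(x)
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * n / n_samples)
                    for n, v in enumerate(x))) / n_samples
            for k in range(n_harmonics + 1)]


def delayed(x, shift):
    """Circularly delay a periodic sampled profile by `shift` samples."""
    return x[-shift:] + x[:-shift] if shift else list(x)


N = 400  # one 4 ns clock period (250 MHz) in 10 ps steps
pulse = [max(0.0, 1 - abs(n - 30) / 30) for n in range(N)]  # hypothetical shape

# Identical pulses at the rising (t = 0) and falling (t = 2 ns) clock edge
symmetric = [a + b for a, b in zip(pulse, delayed(pulse, N // 2))]
# Rising-edge pulse 60% larger, e.g. due to storage element output activity
asymmetric = [1.6 * a + b for a, b in zip(pulse, delayed(pulse, N // 2))]

for k, (s, a) in enumerate(zip(coefficient_magnitudes(symmetric, 6),
                               coefficient_magnitudes(asymmetric, 6))):
    print(f"{k * 250:5d} MHz: symmetric {s:.5f}, asymmetric {a:.5f}")
```

If the falling-edge pulse is not placed exactly at half the period, as for real rising and falling cell delays, the cancellation is only partial, consistent with the deviations observed at higher frequencies.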
The Fourier coefficients of the current profiles for the entire module are shown in Figure 5.15. They essentially represent the sum of the previously shown partial results; only the magnitudes are plotted.
Since the results of several simulation runs are plotted, the figure shows that the variation of the magnitudes at each frequency is marginal. Consequently, the current profile characteristics depend only secondarily on which particular cells are actually triggered. Of primary importance are the structure of the module, the respective activity levels, and the clock signal characteristics.
5.3.6. Multiple Clock Domains
Complex designs usually consist of multiple functional units, which may operate at different clock frequencies. While core modules typically run on the fastest time base, peripheral functions are in many cases clocked at a fraction of the system clock rate.
As discussed in Chapter 3, one of the most important parameters for determining the current profile of a module with multiple clock domains is the duty cycle ratio. Given that different methods may be used to derive a slower time base from the fast system clock signal, the characteristics of the resulting signals are important. Figure 5.16 shows a common clock signal and three signals with half of this frequency, but different duty cycle ratios (25, 50, and 75%). These signals are potential results of a clock division by a factor of two, where the resulting ratios depend on the actual divider type. The simplest and therefore most commonly implemented method is a divider that disables every second clock period, resulting in a signal with a duty cycle ratio of 25% (plotted signal 2). Other methods may result in signals with equal lengths of the high and low phases (signal 3), or with high phases extended to 75% of the resulting clock period (signal 4).

[Plot panels: top, clock signals over time [ns]: (1) 200 MHz, 50% duty; (2) 100 MHz, 25% duty; (3) 100 MHz, 50% duty; (4) 100 MHz, 75% duty; bottom, current [A] over time [ns] for duty cycle ratios of 25%, 50%, and 75%]
Figure 5.16.: System clock signal and potential results of a clock divider by factor two, and the effects on the current profiles.
The second plot in Figure 5.16 shows the effects of these different ratios on the current profiles. As the system activities are aligned to the rising clock edge, and all instantiated storage elements are sensitive to this event, the main current peak occurs at the same time in all cases. The timing of the activity triggered by the falling clock edge, on the other hand, depends on the divider type and the system clock characteristics. With a system clock frequency of 200 MHz in this example, these current peaks appear at approximately 2.5, 5, and 7.5 ns, depending on the respective duty cycle ratio. These differently located current peaks in the time domain also have significant effects on the magnitudes of the corresponding Fourier coefficients (see Figure 5.17). While the basic shape of the results is nearly the same, the relative magnitudes of the individual values depend considerably on the actual clock characteristics.

[Plot: current [mA] over frequency [GHz] (0 to 3 GHz); duty cycle ratios of 25%, 50%, and 75%]
Figure 5.17.: Envelopes of the Fourier coefficient magnitudes of a current profile with different duty cycle ratios.
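The influence of the divider type can be reproduced with a minimal sketch: a rising-edge pulse at t = 0 and a smaller falling-edge pulse whose position depends on the duty cycle ratio of the divided 100 MHz clock. Pulse shapes and amplitudes are hypothetical. The DC value is identical for all three ratios, while the harmonic magnitudes change with the position of the falling-edge peak.

```python
import cmath
import math


def coefficient_magnitude(x, k):
    """|c_k| = |X[k]| / N of one sampled period x."""
    n_samples = len(x)
    return abs(sum(v * cmath.exp(-2j * math.pi * k * n / n_samples)
                   for n, v in enumerate(x))) / n_samples


def triangle(n_samples, width, peak, start):
    """Periodic profile with one triangular pulse (hypothetical shape) at `start`."""
    half = width / 2
    return [peak * max(0.0, 1 - abs((n - start) % n_samples - half) / half)
            for n in range(n_samples)]


def duty_profile(duty, n_samples=1000):
    """One 10 ns period of the divided 100 MHz clock in 10 ps steps.

    Rising edge at t = 0 (clock, storage, and logic activity); smaller falling
    edge pulse (clocked cells only) at duty * period. Values are made up.
    """
    rise = triangle(n_samples, 30, 2.0, 0)
    fall = triangle(n_samples, 10, 0.8, round(duty * n_samples))
    return [r + f for r, f in zip(rise, fall)]


for duty in (0.25, 0.50, 0.75):
    profile = duty_profile(duty)
    mags = [coefficient_magnitude(profile, k) for k in range(5)]
    print(f"{duty:.0%} duty: " + ", ".join(f"{m * 1e3:.1f} mA" for m in mags))
```

Because the rising- and falling-edge pulses have different shapes in this sketch, the 25% and 75% cases also produce slightly different magnitudes, even though their falling-edge peaks are placed symmetrically within the period.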
For a system consisting of a core module and a peripheral unit that is clocked at half the system clock frequency of 200 MHz, the profile composition and the corresponding Fourier coefficients are shown in Figure 5.18. The core module in this example is the circuit used to illustrate the partial profile determination in the previous sections. The additional unit is about twice this size, consisting of approximately 300,000 cell instances, but is considerably less active than the core module. The peripheral unit is active only during the first system clock period (25% duty cycle ratio), while the core module is triggered by all clock events. As the period of the profile for the entire system is given by the least common multiple of all clock periods, the frequency domain results in Figure 5.18(b) show discrete values at each multiple of 100 MHz.
The frequency domain result for the entire system shows values that alternate significantly between the even and odd multiples of the base frequency. Because of its doubled clock frequency, the core module contributes to the total spectrum only at multiples of 200 MHz. Consequently, at the odd multiples of 100 MHz the system result is equal to the peripheral result, while at the even multiples both modules contribute to the coefficients of the current profile for the entire system.
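A minimal sketch of this two-domain composition follows, with hypothetical triangular pulses standing in for the module profiles. The period of the combined profile is the least common multiple of the clock periods (`math.lcm`, Python 3.9 or later), and the 200 MHz core module contributes nothing at the odd multiples of 100 MHz, where the system coefficients therefore equal the peripheral ones.

```python
import cmath
import math


def coefficient_magnitudes(x, n_harmonics):
    """|c_k| = |X[k]| / N for k = 0..n_harmonics of one sampled period x."""
    n_samples = len(x)
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * n / n_samples)
                    for n, v in enumerate(x))) / n_samples
            for k in range(n_harmonics + 1)]


def triangle(n_samples, width, peak):
    """One period with a single triangular pulse at its start (hypothetical shape)."""
    half = width / 2
    return [peak * max(0.0, 1 - abs(n - half) / half) for n in range(n_samples)]


STEP_PS = 10
CORE_PS, PERIPHERAL_PS = 5000, 10000          # 200 MHz and 100 MHz clock periods
system_ps = math.lcm(CORE_PS, PERIPHERAL_PS)  # period of the combined profile
base_mhz = 1e6 / system_ps                    # spacing of the spectral lines

core = triangle(CORE_PS // STEP_PS, 40, 2.0) * (system_ps // CORE_PS)  # repeat
peripheral = triangle(PERIPHERAL_PS // STEP_PS, 60, 0.8)
system = [c + p for c, p in zip(core, peripheral)]

mags = [coefficient_magnitudes(x, 6) for x in (core, peripheral, system)]
for k, (c, p, s) in enumerate(zip(*mags)):
    print(f"{k * base_mhz:4.0f} MHz: core {c:.4f}, peripheral {p:.4f}, system {s:.4f}")
```

At the even multiples, the system magnitude is the magnitude of the complex sum of both contributions, so it is generally not the sum of the two plotted magnitudes.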
When analyzing very large and complex modules with multiple functional units, a clock divider may be embedded inside a module. In this case, parts of this module are triggered by clock signals with different characteristics. Given that gate-level netlists are in most cases flattened, i.e. no information concerning subcircuit hierarchies or the partitioning into functional units is available, an exact assignment of particular cells to a clock domain is not feasible.
As mentioned in the sections discussing the profile determination for the different module partitions (Sections 5.3.2–5.3.4), the current profile amplitudes are almost proportional to the activity level parameter. Hence, provided that the percentage of cells associated with each clock domain is available, the current profile can be distributed proportionally among the clock domains. Such a distributed profile for multiple clocks, $i_{mc}$, can be composed as

$$i_{mc}(t) = \sum_k A_k \, i_k(t), \qquad (5.13)$$

where $A_k < 1$ denotes the relative contribution of the cells associated with the particular clock domain $k$. As previously discussed and shown in the respective figures, each module profile $i_k$ is given by the superposition of the profiles for the rising and falling clock edges ($i_r$ and $i_f$) at the appropriate time instances ($\tau_r$ and $\tau_f$):
$$i(t) = i_r(t - \tau_r) + i_f(t - \tau_f) \qquad (5.14)$$
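Equations (5.13) and (5.14) can be sketched with circular shifts on a common sampling grid that spans the least common multiple of all clock periods. The sketch assumes that the cell fractions $A_k$ of all clock domains sum to one and uses hypothetical triangular edge profiles and fractions; it is an illustration, not the implementation used in this work.

```python
def delayed(x, shift):
    """Circularly delay a periodic sampled profile by `shift` samples."""
    shift %= len(x)
    return x[-shift:] + x[:-shift] if shift else list(x)


def module_profile(i_rise, i_fall, tau_r, tau_f):
    """Eq. (5.14): i(t) = i_r(t - tau_r) + i_f(t - tau_f) on one sampled period."""
    return [r + f for r, f in zip(delayed(i_rise, tau_r), delayed(i_fall, tau_f))]


def multi_clock_profile(fractions, profiles):
    """Eq. (5.13): i_mc(t) = sum_k A_k i_k(t) on a common grid (the LCM period)."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("cell fractions of all clock domains must sum to one")
    return [sum(a * i[n] for a, i in zip(fractions, profiles))
            for n in range(len(profiles[0]))]


def pulse(n_samples, width, peak):
    """Hypothetical triangular edge profile starting at sample 0."""
    half = width / 2
    return [peak * max(0.0, 1 - abs(n - half) / half) for n in range(n_samples)]


# 10 ns system period in 10 ps steps; 70% of the cells at 200 MHz (50% duty),
# 30% behind an embedded divider at 100 MHz (25% duty); all values are made up
fast = module_profile(pulse(500, 40, 2.0), pulse(500, 20, 0.8), 0, 250) * 2
slow = module_profile(pulse(1000, 40, 2.0), pulse(1000, 20, 0.8), 0, 250)
i_mc = multi_clock_profile([0.7, 0.3], [fast, slow])
print(f"peak {max(i_mc):.2f} A, mean {sum(i_mc) / len(i_mc) * 1e3:.1f} mA")
```

The faster domain's profile is repeated to fill the common period before weighting, so that all profiles share the same time grid.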
[Plot panels: (a) current [A] over time [ns], (b) current [mA] over frequency [GHz]; core module, peripheral unit, and system result]
Figure 5.18.: Current profiles (a) and Fourier coefficient magnitudes (b) of a system consisting of two modules stimulated by different clock frequency domains.
6. Parasitic Effected Current Modeling
The current profile determination discussed in the previous sections assumes ideal conditions in terms of lossless cell interconnections and a constant power supply voltage. As introduced in Chapter 2, real power distribution networks and cell interconnect wires have a significant effect on the system behavior. Given that wire loads in deep submicron technologies are almost as large as the cell input loads, it is essential to consider these effects. One effect of real interconnects is a considerably higher total current consumption due to the additional current required for charging and discharging the wire capacitances. These additional loads at the driving cell outputs also lead to reduced signal slew rates, and consequently to higher signal propagation delays.
Power distribution networks affect the system performance in a different manner, but their effects are similar to those of interconnect wires. Real power supply systems can be modeled as a network of resistances, capacitances, and optionally inductances. A current flow through such a network causes voltage drops at the supply pins of all on-chip devices. As a consequence, the individual cells temporarily operate at a lower supply voltage, which additionally degrades their switching performance.
Since gate-level netlists provide no information concerning the cell interconnect
wire characteristics and the power supply network, the mentioned effects have to
be estimated. Approaches for modeling these effects by post-processing the ideal
current profiles are discussed in the following sections.
6.1. General Principles
As mentioned before, interconnect wires and power supply systems have significant effects on the system performance and consequently modify the current consumption profile characteristics. While interconnects introduce additional loads to driving cells, power distribution networks cause voltage drops. Both, however, reduce the switching performance of the respective components, so their effects on the current consumption profile are similar.
Figure 6.1 shows the simulation results for an example circuit with lossless interconnects, compared to the analysis including an appropriate wire load model. Both waveforms in each plot are SPICE simulation results for the switching activity that is triggered by a rising clock edge. The ideal waveform is obtained without any wire model, while the other one includes the parasitic resistances and capacitances extracted from the corresponding layout data.
[Plot panels: (a) data signal transitions, signal voltage [V] over time [ns]; (b) current profiles, current [mA] over time [ns]; ideal and extracted]
Figure 6.1.: Simulation results showing the effects of the parasitic elements of interconnect wires on data signals (a) and current profiles (b).
Figure 6.1(a) shows that the signal slew rates are significantly reduced due to the additional loads at the cell output drivers. Since this also increases the signal propagation delays, the activity of the individual cells is delayed accordingly. As a result, the reduced performance of the switching on-chip devices leads to a considerably stretched current profile, plotted in Figure 6.1(b).
It can also be observed that, even though the mean current is increased, the maximum peak values are reduced because the partial profiles are spread out along the time axis. Since the analyzed module is relatively small, an additional effect becomes visible in the rounded peaks. As the resistances and capacitances of the applied wire models form a kind of low-pass filter, otherwise rapid switching activities are limited to moderate signal transitions. The resulting profiles therefore show considerably less pronounced current peaks.
The Fourier transform results of these profiles are shown in Figure 6.2. As the system clock period is 4 ns in this example, the fundamental frequency is 250 MHz. At first glance, the plots show major differences between the ideal and the extracted coefficient envelopes. But even though the amplitudes and the positions of the local maxima and minima along the frequency axis differ significantly, the basic characteristics are nevertheless similar.
[Plot: current [mA] over frequency [MHz] (0 to 7000 MHz); ideal and extracted]
Figure 6.2.: Envelope of the Fourier coefficients of the current profiles for the ideal circuit, compared to the result including the parasitic elements of a wire load model.
Note also that the DC value, which corresponds to the mean current consumption, is higher for the profiles containing the parasitic effects. This is simply explained by the additional current consumed by the parasitic elements, for example for charging and discharging the wire capacitances.
Both coefficient envelope waveforms have similar local maxima and minima, but at different frequencies. Note that stretching a signal in the time domain results in a compression of its frequency spectrum. Figure 6.3 therefore compares the simulation result for the circuit including the extracted parasitic elements with an initially ideal profile that has been stretched in the time domain and scaled in amplitude. The two spectra coincide up to 2 GHz. Today's integrated circuit emission models are usually required to be valid up to at least 1 GHz. Therefore, this simple post-processing method of stretching the current profile in time and scaling its amplitude could be a good approximation to account for the effects caused by cell interconnect wires and power distribution networks.
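A minimal sketch of this post-processing step follows: the ideal profile is stretched in time by linear interpolation, and its amplitude is scaled. The stretch and scale factors are free, hypothetical parameters here; how they are obtained for a given circuit is not part of this sketch.

```python
def stretch_and_scale(profile, stretch, scale):
    """Stretch a sampled current profile in time and scale its amplitude.

    Linear interpolation on the original time axis; the profile is assumed to
    be zero at its end, so the stretched pulse must still fit in the period.
    """
    if stretch < 1.0:
        raise ValueError("parasitics delay the activity, so stretch must be >= 1")
    padded = list(profile) + [0.0]  # guard sample for interpolation
    out = []
    for i in range(len(profile)):
        t = i / stretch   # corresponding position in the ideal profile
        j = int(t)
        frac = t - j
        out.append(scale * ((1 - frac) * padded[j] + frac * padded[j + 1]))
    return out


# Hypothetical ideal profile: 400 samples, triangular pulse of 1 A peak, 40 samples wide
ideal = [max(0.0, 1 - abs(n - 20) / 20) for n in range(400)]
processed = stretch_and_scale(ideal, stretch=1.5, scale=0.9)
print(f"charge ratio {sum(processed) / sum(ideal):.3f}, "
      f"peak at sample {processed.index(max(processed))}")
```

The charge per period, and hence the DC value, changes by the product of both factors (1.35 in this example), so the amplitude scaling also determines the increased mean current discussed above, while the stretching alone shifts the envelope minima to lower frequencies.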
[Plot: current [mA] over frequency [MHz] (0 to 7000 MHz); extracted and processed]
Figure 6.3.: Comparison of the Fourier coefficient envelopes of the simulation result including the parasitic effects and an initially processed ideal profile.
It can also be seen that the post-processed values beyond 2 GHz are higher than the extracted simulation result. These differences could be caused by the attenuating behavior of the RC structures, which, similar to a low-pass filter, becomes effective at higher frequencies. The variations at even higher frequencies above 5 GHz could be caused, among other things, by the presumably irregular wire structures, while the applied post-processing assumes the same effect on the switching behavior of all cells. Even though this method cannot cover such effects at high frequencies, it still provides a good approximation of the expected behavior with feasible accuracy within a reasonable frequency range.
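To illustrate the low-pass interpretation, the following minimal sketch passes an ideal profile through a first-order RC filter. The single lumped time constant is a hypothetical simplification, not a model of the extracted wire parasitics. The filter preserves the DC value (the charge per period) while attenuating the high-frequency coefficients, which is the kind of behavior that a pure stretch-and-scale post-processing cannot reproduce.

```python
import cmath
import math


def rc_lowpass_periodic(x, dt_ps, tau_ps, periods=5):
    """First-order RC low-pass applied to a periodic profile.

    Discrete form y[n] = y[n-1] + alpha * (x[n] - y[n-1]); the filter runs over
    several periods and the last (settled) period is returned.
    """
    alpha = dt_ps / (tau_ps + dt_ps)
    y, out = 0.0, []
    for _ in range(periods):
        out = []
        for v in x:
            y += alpha * (v - y)
            out.append(y)
    return out


def coefficient_magnitude(x, k):
    """|c_k| = |X[k]| / N of one sampled period x."""
    n_samples = len(x)
    return abs(sum(v * cmath.exp(-2j * math.pi * k * n / n_samples)
                   for n, v in enumerate(x))) / n_samples


# Hypothetical 4 ns period (250 MHz) in 10 ps steps, 200 ps wide pulse, tau = 50 ps
profile = [max(0.0, 1 - abs(n - 10) / 10) for n in range(400)]
filtered = rc_lowpass_periodic(profile, dt_ps=10, tau_ps=50)
for k in (0, 1, 20):  # DC, 250 MHz, 5 GHz
    ratio = coefficient_magnitude(filtered, k) / coefficient_magnitude(profile, k)
    print(f"{k * 250:5d} MHz: filtered / ideal = {ratio:.3f}")
```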
6.2. Cell Interconnects
As discussed in Chapter 2 and mentioned in the previous section, real interconnect wires cause lower signal slew rates and consequently higher propagation delays, but also an additional amount of current flow. Figure 6.4 shows the current profiles for the rising and falling clock edge of a significantly larger circuit than in the previous section. The plots compare the simulation results for ideal interconnects with those for an equivalent circuit that includes models representing the real wire properties. The corresponding parasitic elements are extracted from the circuit layout data. The storage elements are evidently sensitive to the rising edge, as this edge is also followed by some combinational activity, while the falling edge triggers only the clocked elements, so that considerably less current is consumed.
[Plot panels: (a) rising edge, (b) falling edge; current [mA] over time [ps]; ideal and extracted]
Figure 6.4.: Current profiles for a rising (a) and falling (b) clock edge for a given circuit. The plots show the profiles for ideal interconnects, compared to the simulation results including wire models.
The previously discussed approaches have been applied to approximate the parasitic-affected profiles for this circuit example. The additional cell delays are modeled by stretching the profile in time, and the additional currents for charging and discharging the wire capacitances are accounted for by an appropriate amplitude scaling.
Applying this procedure separately to the partial profiles for both clock edges results in the Fourier coefficient envelopes shown in Figure 6.5. As expected, both plots show significant differences between the ideal and the extracted circuit. Post-processing the ideal profiles, however, yields a good approximation of the extracted circuit waveforms, even though there are some deviations in the results for the rising edge. These are probably due mainly to the differences between the current profiles in the first part of Figure 6.4(a). Such differences in the waveforms are probably caused by nets with exceptionally large wire loads, which severely degrade the performance of the respective driving cells. Hence, the activity of such function blocks is delayed further, which affects the resulting profile shape. The clock subsystem, on the other hand, is typically well balanced. As a result, the profiles for the ideal and the extracted circuit are very similar in shape, and the deviations of the post-processed profiles are minor in this case (see Figure 6.5(b)).
[Plot panels: (a) rising edge, (b) falling edge; current [mA] over frequency [MHz] (0 to 2500 MHz); ideal, extracted, and processed]
Figure 6.5.: Post-processing result compared to the coefficients of the ideal and extracted circuit for the rising (a) and the falling (b) clock edge.
The final result for the complete period with both clock edges is plotted in Figure 6.6. Since, to a first approximation, the profile consists of two current peaks per period, the plot shows the characteristic alternation of the coefficient magnitudes. The processed values with the higher amplitudes almost match the simulation results up to several GHz. The sequence of lower magnitudes, on the other hand, shows considerable differences. One reason could be the instantiation of clock buffers that are optimized for rising edges. In this case, the absolute wire load is the same for both types of driving transistors, but its relative effect is considerably less significant for the p-type transistors, which are typically stronger in such buffers. The delays of the current peaks introduced by the wire loads would therefore differ between the rising and the falling edge, and consequently lead to such deviations in the frequency domain waveform. But since these current consumption models are typically used to identify the most dominant frequencies, the envelope of the higher values is what matters, and Figure 6.6 shows that these results are reasonably accurate for a frequency range up to several GHz.

[Plot: current [mA] over frequency [MHz] (0 to 5000 MHz); ideal, extracted, and processed]
Figure 6.6.: Comparison of the simulation result including the parasitic effects and an initially processed ideal profile.
6.3. Power Distribution Networks
Power distribution systems affect the system performance, and therefore also the current consumption profiles, in several ways. As introduced in Chapter 2, they contain at least resistances and capacitances that cause voltage drops, but there are also significant inductive effects, which become increasingly important as technologies shrink.
Analyses of test chips, such as the one introduced in [35], have shown that the presented method for post-processing ideal current profiles may also be applicable for estimating the effects of parasitic power distribution networks. The plots in Figure 6.7 show the simulation results of a test-chip module in an ideal environment, with the extracted interconnect wire parasitics, and with additionally modeled power lines. The amplitudes of the parasitic-affected current profiles are significantly reduced. Both voltage drops and decoupling capacitances play a significant role in this case. While voltage drops considerably reduce the switching performance of the on-chip devices, the decoupling capacitances supply part of their charge during the initial phase of the switching activities. This leads to the comparatively low amplitude, but also to a significantly long phase of current flow after the maximum peak value. During this time, the decoupling capacitances are recharged, and all other nodes in a high state are pulled up to the nominal power supply voltage.

[Plot: current [A] over time [ps] (0 to 2000 ps); ideal, interconnects, and power supply]
Figure 6.7.: Current profiles for ideal conditions, including wire models for the cell interconnects, and an additionally modeled power supply system.
Figure 6.8 shows the Fourier transforms of all these current profiles, including the result of the post-processed ideal profile. In this example, all on-chip effects, in terms of both interconnect wires and power distribution network parasitics, are taken into account to approximate the respective simulation result.
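The spectra compared in Figure 6.8 are Fourier coefficients of periodic current profiles. A minimal sketch of how such coefficients can be obtained from one sampled period follows. The function name, the single-sided scaling convention, and the use of a direct DFT instead of an FFT library are illustrative assumptions, not the implementation used to produce the figures; the envelope shown in the figure is not computed here.

```python
import cmath
import math

def fourier_coefficients(current, dt):
    """Single-sided Fourier coefficient magnitudes of one period of a current profile.

    current: samples (A) covering exactly one clock period, spaced dt seconds apart.
    Returns a list of (frequency in Hz, magnitude in A) for harmonics 0 .. n//2.
    A direct DFT is used for clarity; an FFT library would be used in practice.
    """
    n = len(current)
    period = n * dt
    result = []
    for k in range(n // 2 + 1):
        # c_k = (1/n) * sum_m i[m] * exp(-j 2 pi k m / n)
        c_k = sum(i_m * cmath.exp(-2j * math.pi * k * m / n)
                  for m, i_m in enumerate(current)) / n
        # Fold the negative-frequency twin into the positive harmonic,
        # except for DC and (for even n) the Nyquist bin, which have no twin.
        scale = 1.0 if k == 0 or (n % 2 == 0 and k == n // 2) else 2.0
        result.append((k / period, scale * abs(c_k)))
    return result
```

For a 10 ns clock period sampled every picosecond, the harmonics would be spaced 100 MHz apart.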
Given that appropriate parameters are available for the post-processing, the results are acceptable for the relatively regular structures on the analyzed test chips. For complex systems with several million transistors, however, which may consist of numerous subsystems and various functional units, the power grid is typically much more irregular. Such systems probably introduce additional effects that cannot be reliably covered by this method.
Tools such as EXPO [11] have been developed specifically to model the effects of complex power distribution networks. In such tools, the power grid is segmented to account for the possibly different structures of the respective system partitions, and it is modeled by an appropriate network of resistances, capacitances, and even inductances.
[Figure 6.8: current [mA] versus frequency [GHz], 0 to 16 GHz; traces: ideal, interconnects, power supply, processed]
Figure 6.8.: Envelope of the Fourier coefficients of the simulation results for ideal
conditions, including wire models for the cell interconnects, and an
additionally modeled power supply system, compared to the processed
ideal profile to approximate all parasitic effects.
7. Implementation and Verification
The introduced methods to model the current consumption of digital integrated circuits based on gate-level netlists have been implemented in software and verified by SPICE-based transistor-level simulations. The most important implementation considerations and the achieved quality of the current consumption profiles are discussed for some particular examples in this chapter.
Since gate-level circuit descriptions do not provide any information about the internal structure of the instantiated cells, a standard cell library has been characterized by applying the method introduced in Chapter 4. The considerations and implemented data structures of the generated library of pre-characterized cells, as well as the provided parameters, are introduced in Section 7.1.
The capabilities of the current profile modeling methods discussed in Chapter 5 are verified by evaluating the results for various circuits with different characteristics. The pattern-based simulation method has been applied to relatively small circuits, and the feasibility of the modeling approach for complex designs is discussed for appropriate modules, such as the core unit of a high-end microcontroller.
As chip-level current consumption models typically consist of current sources and several passive elements, the resulting profiles are converted to so-called equivalent current sources. These concluding tasks are discussed in the last sections of this chapter.
7.1. Library of Characterized Cells
A library of characterized cells consists of two parts: the static properties given by the internal cell structure, and the dynamic operating behavior in terms of timing characteristics and current consumption profiles.
7.1.1. Static Properties
As discussed in the previous chapters, the dynamic behavior of a cell depends significantly on the characteristics of the connected cells in the circuit environment. Therefore, the following parameters, which represent the most important characteristics of the internal cell structure, are extracted from the transistor-level cell descriptions and provided in the static cell properties part of the library:
cell name: Since every cell in a library has a unique name, all cell instances in a gate-level netlist can be identified by their cell name.
cell type: There are different types of cells, such as combinational gates, flip-flops, memories, and latches. The cell type is important for the circuit partitioning, as storage elements, for instance, are treated differently from combinational gates. The data structure of the dynamic characteristics also depends on the cell type: the internal activity of flip-flops, for instance, is affected by the output state, whereas the behavior of combinational gates depends only on the input signals.
As mentioned before, the switching capabilities of all on-chip devices are primarily determined by the respective driver strengths and output loads. However, some additional characteristics are required to specify the properties of a cell. The following attributes are therefore provided for the particular cell ports:
port name: Cell interconnections are specified in netlists by the association of the
respective port names (see Section 7.2.1).
direction: As gate-level netlists provide no information concerning the respective
port directions (input, output, or bidirectional), this parameter is necessary
to determine the actual circuit structure.
port properties: Among the most important parameters for the dynamic behavior of a cell are the transistor sizes of the driving cell and the respective output loads (see Section 4.4). The port properties are therefore converted to appropriate equivalent inverter parameters, representing the load characteristics of an input or, in the case of an output, the properties of the driving transistors.
function: As a cell may have more than one output, such as the sum and the carry of a full adder, a function in terms of a truth table is associated with each cell output. For a full adder with the data inputs A and B, the carry-in Ci, and the outputs sum S and carry-out Co, the truth table is shown in the following table. Because the structure of the table is predefined, the input states can be generated, and it is sufficient to store the respective output vectors.
A 0 1 0 1 0 1 0 1
B 0 0 1 1 0 0 1 1
Ci 0 0 0 0 1 1 1 1
S 0 1 1 0 1 0 0 1
Co 0 0 0 1 0 1 1 1
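The storage scheme just described, with a fixed ordering of the input states so that only the output vectors need to be kept, can be sketched as follows. The convention that the first listed input is the least significant bit of the column index is an assumption chosen to reproduce the table above, and all names are hypothetical.

```python
INPUTS = ("A", "B", "Ci")   # first input toggles fastest, as in the table above

# Only the output vectors are stored; entry j belongs to the j-th input state.
FULL_ADDER_OUTPUTS = {
    "S":  (0, 1, 1, 0, 1, 0, 0, 1),
    "Co": (0, 0, 0, 1, 0, 1, 1, 1),
}

def input_states(inputs):
    """Generate the input columns in table order (first input = least significant bit)."""
    n = len(inputs)
    return [{name: (j >> bit) & 1 for bit, name in enumerate(inputs)}
            for j in range(2 ** n)]

def evaluate(outputs, inputs, values, output):
    """Look up an output value: the input values form the column index."""
    index = sum(values[name] << bit for bit, name in enumerate(inputs))
    return outputs[output][index]
```

Storing only the output vectors keeps each function at 2^n bits per output, while the input part of the table can always be regenerated from the port order.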
The implemented data structure of the static cell properties is similar to an array
of cell prototypes with the mentioned parameters. As any cell can be identified by
its name, all the other properties are organized in an associated record that can be
copied and instantiated in the simulation environment.
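A possible Python rendering of these prototype records is sketched below. It uses a dictionary keyed by the unique cell name instead of a plain array, and the fields of the equivalent inverter parameters (wn, wp) are hypothetical placeholders; the actual record layout of the tool is not given in the text.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class Port:
    name: str
    direction: str                  # "input", "output", or "inout"
    equivalent_inverter: dict       # hypothetical keys, e.g. {"wn": ..., "wp": ...}
    function: tuple = ()            # truth-table output vector (outputs only)

@dataclass
class CellPrototype:
    name: str                       # unique library cell name, e.g. "R150_AND"
    cell_type: str                  # "combinational", "flip-flop", "latch", "memory"
    ports: dict = field(default_factory=dict)

# Prototypes are looked up by their unique cell name ...
LIBRARY = {}

def register(proto):
    LIBRARY[proto.name] = proto

def instantiate(cell_name):
    """... and deep-copied when a netlist instance of that cell is created."""
    return copy.deepcopy(LIBRARY[cell_name])
```

The deep copy ensures that per-instance data added during simulation (for example, environment-dependent parameters) never modifies the shared prototype.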
7.1.2. Dynamic Characteristics
This part of the library is intended to provide all required parameters to enable
a determination of the timing characteristics and the current consumption profiles
for a given event at any characterized cell. Therefore, all cells have been simulated with a reasonable set of transitions in different circuit environments. The
parameters, describing the respective timing characteristics and current profiles,
are subsequently extracted from these single cell simulation results, and stored in
an appropriate data structure.
Timing Characteristics
As discussed in Chapter 4, the behavior of a cell depends significantly on the input signal characteristics. Since the signal rise and fall times are typically different, both parameters are extracted. And since the output signals are at the same time the input signals of the connected cells, the output slew rates are important as well. The introduced method determines single-event current profiles that are subsequently superposed, so the delay between the input event and the output response is also required. These signal propagation delays are therefore extracted for all outputs. As a result, the library provides the following timing characteristics:
• input and output signal slew rates
• signal propagation delays
Current Profiles
It has been shown in Section 5.1 that the current profile shapes are quite similar for
a given event. An alignment of a reference profile to a set of characteristic points
is consequently a feasible approach to determine the actual waveform for various
environmental conditions. Hence, it is sufficient to provide the following data:
• time instances, when the current profile amplitude is zero, and the time and
amplitudes of the local profile minima and maxima
• value of the mean current consumption
• reference profiles
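One way to align a reference profile to the characteristic points is a piecewise-linear time warp, sketched below under stated assumptions: the characteristic times of the reference and of the target are matched index by index, and the amplitude is scaled uniformly to a target peak value. The text does not specify the scaling rule at this point, so the uniform peak scaling and all names are hypothetical.

```python
from bisect import bisect_right

def interp(x, xs, ys):
    """Piecewise-linear interpolation of (xs, ys) at x; xs ascending, clamped at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    w = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + w * (ys[i + 1] - ys[i])

def align_profile(ref_t, ref_i, ref_points, target_points, target_peak, out_t):
    """Warp a reference current profile onto target characteristic times.

    ref_t, ref_i:   sampled reference profile (time, current)
    ref_points:     characteristic times of the reference (zeros, extrema), ascending
    target_points:  the matching characteristic times for the actual environment
    target_peak:    desired maximum current (assumed uniform scaling)
    out_t:          time instances at which the aligned profile is evaluated
    """
    scale = target_peak / max(ref_i)
    aligned = []
    for t in out_t:
        # Map target time to reference time between matching characteristic points;
        # outside the first/last point the reference value there is used.
        r = interp(t, target_points, ref_points)
        aligned.append(scale * interp(r, ref_t, ref_i))
    return aligned
```

If the first and last characteristic points are zero crossings, the aligned profile is zero outside them, as expected for a single switching event.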
Library Implementation
All these parameters are provided for all relevant port events. As the characterization simulations can only be done for a limited set of environmental conditions, an efficient method to locate and interpolate the respective data sets is required. The implemented data structure is illustrated in Figure 7.1.
[Figure 7.1: list of characterized cells, each referencing its port transitions; each transition references a parameter tree (value nodes, parameters at the leaves) and reference current profiles]
Figure 7.1.: Data structure of the dynamic characteristics and profiles in the library
of characterized cells.
Each entry in the list of characterized cells references the available port transitions of that cell. The characteristic parameters and the reference profiles are organized in a tree structure and associated with the respective transitions. Figure 7.1, for instance, shows these references for the third cell in the list and its second transition. The node values at the different levels of the parameter tree each represent one of the cell environment parameters, such as the input signal characteristics and the output load. The transition timing and current profile parameters for the respective conditions are located at the leaf nodes. As there are only a few reference profiles per transition, these are stored in a separate branch of the tree.
One reason for this library structure is that it supports an efficient interpolation of the parameters. Provided that the node values are sorted at every level of the tree, a requested value can always be determined by interpolating between two neighboring values. The timing characteristics and profile parameters are therefore determined by a recursive interpolation of the data referenced by two neighboring nodes.
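The recursive interpolation over two neighboring nodes per tree level can be sketched as follows, assuming linear interpolation at each level and linear extrapolation beyond the outermost nodes (the handling of out-of-range values is not stated in the text). Each tree level is modeled as a sorted list of (value, child) pairs and each leaf as a dictionary of parameters; these names are hypothetical.

```python
from bisect import bisect_left

def interpolate_tree(node, query):
    """Recursively interpolate leaf parameters of a sorted parameter tree.

    node:  either a leaf dict {parameter name: value} or a list of
           (key, child) pairs sorted by key (one level per environment parameter)
    query: sequence of requested environment values, one per remaining level
    """
    if isinstance(node, dict):          # leaf: timing and profile parameters
        return dict(node)
    x, rest = query[0], query[1:]
    if len(node) == 1:                  # only one characterized value at this level
        return interpolate_tree(node[0][1], rest)
    keys = [key for key, _ in node]
    # Pick the two neighboring nodes that bracket x (or the outermost pair)
    i = min(max(bisect_left(keys, x), 1), len(keys) - 1)
    (x0, lo), (x1, hi) = node[i - 1], node[i]
    p0, p1 = interpolate_tree(lo, rest), interpolate_tree(hi, rest)
    w = (x - x0) / (x1 - x0)
    return {name: p0[name] + w * (p1[name] - p0[name]) for name in p0}
```

With two levels (for example input slew and output load) this reduces to bilinear interpolation; each additional environment parameter simply adds a recursion level.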
7.2. Circuit Description Interpretation
As the introduced modeling method is intended to be applied as early as possible in the design flow, it is based on circuit descriptions in the form of Verilog gate-level netlists. The structure and syntax of such gate-level descriptions are specified, for instance, in the respective standard [36]. After the netlist import and circuit interpretation, the processed data must be represented in an appropriate structure so that the applied simulation methods can run efficiently.
7.2.1. Verilog Gate-Level Netlist Import
Listing 7.1 shows, as an example, the gate-level netlist of a basic 3-bit counter in Verilog syntax. In this hardware description language, any circuit is called a module and is identified by a module name and a list of ports. The port directions are specified by the subsequent statements, which declare each port as an input, an output, or a bidirectional inout port. The internal circuit of a module is then described by a list of submodule or cell instances. Each instance is specified by the submodule name, which corresponds to the instantiated cell type, followed by the instance name and a list of the submodule or cell ports with the names of the connected nets.
Listing 7.1: Verilog netlist of a 3-bit counter.
module counter_3bit (clk, en, out_a, out_b, out_c);
  input  clk, en;
  output out_a, out_b, out_c;
  wire   in_a, in_b, in_c, and_a;   // internal nets between the cells

  // cell type, instance name, named port-to-net association
  R150_EXOR XOR0 (.A(out_a), .B(en),    .Z(in_a));
  R150_EXOR XOR1 (.A(out_b), .B(out_a), .Z(in_b));
  R150_EXOR XOR2 (.A(out_c), .B(and_a), .Z(in_c));
  R150_AND  AND0 (.A(out_a), .B(out_b), .Z(and_a));
  R150_DFF  DFF0 (.CP(clk),  .D(in_a),  .Q(out_a));
  R150_DFF  DFF1 (.CP(clk),  .D(in_b),  .Q(out_b));
  R150_DFF  DFF2 (.CP(clk),  .D(in_c),  .Q(out_c));
endmodule
Since both the order of the cell instances and the order of the entries in the port lists are arbitrary, the cell interconnections are described solely by the associated port and net names. Net names are unique, and all ports associated with a given net are interconnected. Hence, a gate-level netlist is a textual description of the schematic view of a circuit, as shown in Figure 7.2 for the example counter.
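Because connectivity is defined only by port and net names, a minimal import can collect, for each net, all (instance, port) pairs attached to it. The regular-expression sketch below assumes one instance statement per line, as in Listing 7.1, and named port association. Real netlists may split statements across lines, use escaped identifiers, or contain buses, none of which this sketch handles; the function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches "CELLTYPE INSTNAME ( .PORT(net), ... );" on a single line
INSTANCE = re.compile(r"^\s*(\w+)\s+(\w+)\s*\((.*)\)\s*;\s*$")
PORT = re.compile(r"\.\s*(\w+)\s*\(\s*(\w+)\s*\)")
KEYWORDS = {"module", "input", "output", "inout", "wire", "endmodule"}

def parse_instances(netlist):
    """Return a list of (cell type, instance name, {port: net}) for each cell instance."""
    instances = []
    for line in netlist.splitlines():
        line = line.split("//")[0]           # drop line comments
        match = INSTANCE.match(line)
        if not match or match.group(1) in KEYWORDS:
            continue                         # module header, declarations, etc.
        cell_type, name, ports = match.groups()
        instances.append((cell_type, name, dict(PORT.findall(ports))))
    return instances

def nets_to_ports(instances):
    """All ports associated with the same net name are interconnected."""
    nets = defaultdict(list)
    for _, name, ports in instances:
        for port, net in ports.items():
            nets[net].append((name, port))
    return dict(nets)
```

Port directions are deliberately not inferred here; as the following paragraphs explain, they come from the static part of the cell library.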
Figure 7.2.: Schematic view of a 3-bit counter.
The netlist is therefore imported line by line and temporarily organized in a linked list of cell instances, whose nodes are records holding the cell attributes provided by the circuit description. As the particular cells are instances of black boxes, the imported netlist provides no details on the port directions and cell characteristics. But because the cell type names are unique, additional parameters can be imported from the static properties part of the pre-characterized cell library. With this merged information, the cell attributes and port interconnections are sufficiently specified. As a result, the imported data can be reorganized into a tree structure that resembles the schematic view of the circuit. The interconnect lines are consequently implemented as references (pointers) to the respective cell data objects, as shown in Figure 7.3 for the discussed counter.
Figure 7.3 additionally shows that each output of a cell instance holds an array of references to the connected cells. The particular ports of the connected cells are also known to the driving cell (e.g., p(D) is a pointer to the port D of the referenced cell). As the module inputs are treated similarly, the paths of consecutively switching cells can be traversed by following the respective references.
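Linking outputs to the ports they drive requires the port directions from the static library part, since the netlist itself does not carry them. The sketch below builds such fan-out references and walks them breadth-first. The DIRECTIONS table is a hypothetical excerpt of the library, and a net without a driving cell output is assumed to be a module input.

```python
from collections import defaultdict, deque

# Hypothetical excerpt of the static library: port directions per cell type
DIRECTIONS = {
    "R150_EXOR": {"A": "input", "B": "input", "Z": "output"},
    "R150_AND":  {"A": "input", "B": "input", "Z": "output"},
    "R150_DFF":  {"CP": "input", "D": "input", "Q": "output"},
}

def build_fanout(instances):
    """Map each driver (instance name or module input) to the (instance, port) pairs it drives.

    instances: list of (cell type, instance name, {port: net}).
    A net driven by no cell output is treated as a module input, keyed by the net name.
    """
    driver_of = {}
    for cell_type, name, ports in instances:
        for port, net in ports.items():
            if DIRECTIONS[cell_type][port] == "output":
                driver_of[net] = name
    fanout = defaultdict(list)
    for cell_type, name, ports in instances:
        for port, net in ports.items():
            if DIRECTIONS[cell_type][port] == "input":
                fanout[driver_of.get(net, net)].append((name, port))
    return fanout

def reachable(fanout, start):
    """Breadth-first walk along the references, returning every (instance, port) reached."""
    seen, queue, visited = [], deque([start]), {start}
    while queue:
        for target in fanout.get(queue.popleft(), []):
            seen.append(target)
            if target[0] not in visited:       # feedback loops (e.g. via flip-flops) end here
                visited.add(target[0])
                queue.append(target[0])
    return seen
```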
7.2.2. Path Categorization
The modeling approach for complex modules has been discussed in Chapter 5. It has been shown that the introduced methods are most effective when a module is partitioned and appropriate simulation methods are applied to the respective circuit parts. The paths of successively connected cells are therefore categorized. This is done by evaluating the types and the connected ports of all cells that are reachable along the respective path.
[Figure 7.3: module inputs referencing the cell data objects XOR0:R150_XOR, DFF0:R150_DFF, XOR1:R150_XOR, DFF1:R150_DFF, AND0:R150_AND, XOR2:R150_XOR, DFF2:R150_DFF]
Figure 7.3.: Data structure representation of a 3-bit counter.
The following path types are determined and added to the properties of the respective root node (module input):
set/reset: The asynchronous system set and reset signals are generally distributed
to the respective storage element inputs in synchronous designs. There may
be a certain tree of buffers to distribute the load of possibly numerous ports
to a reasonable number of driving buffers. This kind of path can generally be
considered as terminated at storage element inputs.
scan test: A scan test is usually done by reconfiguring the paths of a circuit to
form so-called scan chains. As only special flip-flops support this feature, the
scan enable and test input signals are typically distributed from the respective
module inputs to such scan flip-flops.
clock: As mentioned in the previous chapters, the clock signal is typically distributed by a tree structure with possibly several gating cells to the clock
inputs of all storage elements.
combinational: All paths that cannot be definitely assigned to one of the above categories are considered general combinational logic paths.
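A simplified categorization is sketched below: starting at a module input, the walk passes through combinational cells such as buffers and gating cells and stops at storage-element inputs, whose port names decide the category. The port naming convention in PORT_CATEGORY is a hypothetical assumption, as is the rule that a path is assigned to a special category only if all of its terminals agree.

```python
from collections import deque

STORAGE = {"flip-flop", "latch"}
# Hypothetical port naming conventions of the storage cells in the library
PORT_CATEGORY = {
    "CP": "clock", "G": "clock",
    "SN": "set/reset", "RN": "set/reset", "CDN": "set/reset",
    "SE": "scan test", "SI": "scan test",
}

def categorize_input(start, fanout, cell_types):
    """Categorize the paths starting at a module input.

    fanout:     driver name -> list of (instance, port) it drives
    cell_types: instance name -> "combinational", "flip-flop", "latch", ...
    The walk follows combinational cells (buffers, gating cells) and stops at
    storage-element inputs, whose port names decide the category.
    """
    terminals, queue, visited = set(), deque([start]), {start}
    while queue:
        for inst, port in fanout.get(queue.popleft(), []):
            if cell_types[inst] in STORAGE:
                terminals.add(PORT_CATEGORY.get(port, "combinational"))
            elif inst not in visited:
                visited.add(inst)
                queue.append(inst)
    # A special category applies only if all reached terminals agree
    if len(terminals) == 1:
        return terminals.pop()
    return "combinational"
```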
Determining these path categories enables the application of suitable simulation methods to the respective path types. As the set/reset and scan test paths are considered inactive during normal system operation, these module input ports are set to the corresponding constant state to disable these features. The circuit is then configured by propagating these static values from the respective module inputs to the disabled cells, evaluating the functions of the cells along such a path.
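Propagating the static values amounts to constant propagation through the cell functions: a cell output becomes static when its function yields the same value for every combination of the still unknown inputs. A sketch under that assumption follows; the cell functions are given as Python callables here, whereas the library stores them as truth-table output vectors, and the function name is hypothetical.

```python
from itertools import product

def propagate_constants(cells, constants):
    """Propagate static input values through the cell functions.

    cells:     instance -> (function, [input nets], output net); function maps a tuple
               of input bits to an output bit (e.g. derived from the stored truth table)
    constants: net -> 0/1 for the disabled module inputs (e.g. scan enable = 0)
    Returns net -> constant value for every net that becomes static.
    """
    known = dict(constants)
    changed = True
    while changed:                              # repeat until no further net becomes static
        changed = False
        for function, inputs, output in cells.values():
            if output in known:
                continue
            free = [n for n in inputs if n not in known]
            results = set()
            for bits in product((0, 1), repeat=len(free)):
                assignment = {**known, **dict(zip(free, bits))}
                results.add(function(tuple(assignment[n] for n in inputs)))
            if len(results) == 1:               # output independent of the unknown inputs
                known[output] = results.pop()
                changed = True
    return known
```

The enumeration is exponential in the number of unknown inputs, which is acceptable for standard cells with only a few inputs each.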
As a result, these circuit parts are considered as inactive, and the possibly involved gating cells, multiplexers, and flip-flops are configured to operate in a normal
mode. In addition to the configuration, a path categorization is required for the
partitioning of the circuit into the clock distribution network, the storage elements,
and the combinational logic blocks (see Section 5.3.1).
7.3. Pattern Based Simulation
In this most common simulation method, stimulus vectors are applied at the module inputs and propagated along the paths of consecutively switching cells according to the internal circuit functions. The switching activities of the particular cells are consequently given by the input stimulus events and the cell functions.
7.3.1. Software Implementation
As mentioned before, the implemented data structures resemble the schematic view of the imported circuit (see Figures 7.2 and 7.3). Since the current profile calculation method applies to single events, the circuit is analyzed sequentially, one cell after the other. Starting at the module inputs, the triggered events are propagated along the paths of consecutively switching cells, and the current profiles and timing characteristics are determined for the respective events.
Since the environmental circuit conditions are required to address the appropriate parameters in the library, the attributes of the connected neighbor cells are analyzed. The output load properties are determined by evaluating the connected (referenced) cell inputs. The respective port properties are then combined into a set of equivalent inverter parameters, as introduced in Section 4.3. These parameters, describing the total output load, are provided to all referenced cells. Together with the distributed characteristics of the driving cell output, all relevant parameters describing the circuit environment are thus determined.
As a result, all required parameters for the addressing and interpolation of the
respective data, provided by the library of pre-characterized cells, are available. The
determination of the current profiles and the timing characteristics can consequently
be done according to the introduced method.
[Figure 7.4: schematic with input in, buffer cells BUF00 to BUF90, BUF11 to BUF13, BUF31 to BUF33, BUF41 to BUF43, BUF61 to BUF63, and outputs out13, out33, out43, out63, out90]
Figure 7.4.: Schematic view of a simple circuit consisting of several buffer cells.
Since the switching behavior of a cell depends strongly on the signal slew rates, while the actual transition timing can only be estimated approximately from the properties of the directly connected cells, the calculated output signal characteristics are additionally passed on to the triggered cells. This allows the respective parameters to be adapted to the actual input characteristics and consequently leads to more accurate results.
The current profiles for the particular events are finally superposed. The time instances at which the profiles are superposed are determined by accumulating the respective signal propagation delays.
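The propagation and superposition steps can be sketched with an event queue. The sketch assumes an acyclic structure of cells that all switch when triggered, such as the buffer circuit discussed next; logic functions, slew-rate passing, and storage elements are omitted, and the fixed per-cell delays and profiles stand in for values interpolated from the library. All names are hypothetical.

```python
import heapq

def simulate_events(fanout, delays, profiles, start_events, dt, n_samples):
    """Propagate events and superpose single-event current profiles.

    fanout:       instance or module input -> list of triggered instances
    delays:       instance -> propagation delay (s), e.g. interpolated from the library
    profiles:     instance -> sampled single-event current profile (A), spacing dt
    start_events: list of (time, module input) stimulus events
    Each triggered cell starts its profile at the time of its input event; its
    output event follows after its delay, so delays accumulate along the path.
    Returns the superposed total current on a grid of n_samples points.
    """
    total = [0.0] * n_samples
    queue = list(start_events)
    heapq.heapify(queue)
    while queue:
        t, source = heapq.heappop(queue)
        for cell in fanout.get(source, []):
            offset = round(t / dt)                      # start index of this profile
            for k, value in enumerate(profiles[cell]):
                if 0 <= offset + k < n_samples:
                    total[offset + k] += value
            heapq.heappush(queue, (t + delays[cell], cell))
    return total
```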
7.3.2. Simulation Results
Figure 7.4 shows the schematic view of a simple circuit consisting of several buffer
cells. The profiles, determined by the application of the pattern-based method,
are compared to the respective SPICE simulation results in Figure 7.5, where the
results are plotted for both a rising and a falling input signal transition.
Although there are some minor local differences in the profile waveforms, the determined event timings and the current consumption per transition are almost congruent with the SPICE simulation results. A comparison of the profile lengths shows that the propagation delay uncertainties are negligible, at less than one picosecond.
As the analyzed circuit consists of buffer cell instances only, all cells are triggered to perform the same operation. In case of a rising input event, all cells are consequently forced to drive a low-high transition, where the PMOS transistors connected to the output become conductive and pull the respective net towards VDD. For high-low transitions, on the other hand, the NMOS transistors become active. Since both events are characterized with the same loads (see Chapter 4), but the dimensions of the complementary transistors in a cell usually differ (PMOS is typically larger than NMOS), the actual circuit properties match the characterized conditions to different degrees.

[Figure 7.5: current [mA] versus time [ps], 0 to 450 ps; traces: SPICE simulated, gate-level modeled; panels: (a) rising edge, (b) falling edge]
Figure 7.5.: Current consumption profiles for a structure consisting of several buffer cells that has been triggered by a rising and a falling input signal edge. The gate-level modeled profiles are compared to the respective SPICE-based simulation results.

The plots show that the PMOS dimensions of the instantiated buffers are close to the characterized conditions in the library, as the rising-edge profiles in this example show smaller uncertainties than the falling-edge profiles. The library parameters are consequently interpolated over a wider range in the case of falling input signal transitions. But since these uncertainties are limited to very short peaks, and the total current consumption is determined almost exactly, the results can be considered sufficiently accurate in both cases.
7.4. Complex Module Modeling
The current profile determination for large and complex modules is most efficient
when different methods are applied for the particular circuit partitions (see Section 5.3). Therefore, the imported circuit description and the already interpreted
data need to be initialized accordingly. The profile determination is subsequently
done by the application of an appropriate simulation method for the respective
circuit parts.
7.4.1. Partitions Initialization
As different methods are applied for the analysis of complex designs, an appropriate
initialization of the respective partitions is necessary. While the clock distribution
network is initialized by propagating the clock signal along the clock tree paths, the
particular events at the inputs of the cells that are associated to a combinational
logic block are randomly assigned. The storage elements are of special interest, as
their outputs represent the boundary between a clocked partition and a combinational logic block.
The switching activity and the time instances, when a particular cell becomes
active, are therefore initialized according to the considerations introduced in Section 5.2.2. All gating cells are configured as enabled, and the clock signal is propagated from the respective module inputs to all storage elements, which are also
forced to change their output state. Therefore, it is necessary that the module
ports are categorized and the clock inputs are properly identified (see Section 7.2.2).
As numerous combinational paths typically also start at the module inputs, these ports are initialized to become active at a similar time as the ports connected to flip-flop outputs. Since the actual event timings at these ports are unknown, the values are intentionally set within a certain range. Based on the determined time instances at which the root nodes of the combinational logic paths may become active, all other cells along these paths are initialized for the subsequently applied random activity based method.
7.4.2. Current Profile Determination
Provided that the clock signal pattern is known, the clocked partitions are modeled by applying the pattern-based simulation method discussed in Section 7.3. The gating cells that are supposed to be active are selected randomly according to the given parameter for the designated clock activity level. For an efficient configuration of the clock tree, an array holding the references to the gating cells is generated and made globally accessible. As mentioned before, the active clock paths are determined prior to each simulation run, and the current profiles are then determined for the events given by the clock pattern and the cell functions along the activated paths.
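The random configuration of the gating cells according to the clock activity level can be sketched as follows. Enabling an exact fraction of the gating cells, rather than enabling each one independently with that probability, is an assumption, as are the function and parameter names.

```python
import random

def configure_gating(gating_cells, activity, rng=None):
    """Randomly enable gating cells according to the designated clock activity level.

    gating_cells: globally accessible list of gating-cell references (here: names)
    activity:     fraction of gating cells that should pass the clock (0.0 to 1.0)
    Returns the set of enabled gating cells for one simulation run.
    """
    rng = rng if rng is not None else random.Random()
    count = round(activity * len(gating_cells))
    return set(rng.sample(gating_cells, count))
```

Passing a seeded random.Random makes individual runs reproducible, while repeated unseeded runs show the spread caused by the random selection.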
On the other hand, the current profiles for the combinational logic partitions
are modeled by applying the introduced random activity interpretation method.
A two-dimensional array with references to the particular cells associated to the
respective logic depths is therefore generated and also globally accessible. The
current profiles for the intentionally active cells are subsequently determined for
the randomly assigned switching events. The selection and initialization of the
active cells and events is also done prior to each simulation run.
Since the cell delays are evaluated during the initialization of the partitions, all partial profiles are associated with their respective event timings. As a result, the current profile of an entire module is composed by superposing all partial results at the given time instances.
7.4.3. Simulation Results
Figure 7.6 shows the current profiles for the clocked subsystem, representing the clock distribution paths with all the gating cells and the clock-triggered storage elements. The analyzed circuit consists of approximately 2,500 clock buffers and 16,000 storage elements (14,500 flip-flops and 1,500 latches). To show the accuracy of the introduced method, the modeled profiles for the clock tree, the storage elements, and the sum of both, representing the entire clock subsystem of the analyzed model, are compared to the respective SPICE simulation results. For this purpose, the circuit has been exported as a netlist in SPICE syntax including all required stimulus sources, such as the gating cell control signals as well as the data inputs and enable signals of the relevant flip-flops and latches.
It can be seen that the modeled profiles closely match the transistor-level SPICE simulation results; the clock tree waveforms in particular are practically congruent. Since the module consists of many more cells than the circuit discussed in Section 7.3, the minor uncertainties of the single-event profiles largely compensate each other. The partial results for the storage elements, on the other hand, show some differences, which are primarily caused by the fact that the activities stimulated in the SPICE simulation and the randomly determined events are probably different.

[Figure 7.6: current [A] versus time [ps], 0 to 600 ps; traces: SPICE simulated, gate-level modeled; panels: (a) rising edge, (b) falling edge]
Figure 7.6.: Comparison of the SPICE based transistor-level simulation results and the modeled profiles for a clock subsystem of a microcontroller core unit. The profiles are shown for the clock signal distribution paths (including numerous gating cells), the clocked storage elements, and the sum of both components.

The input stimuli of all flip-flops and latches are constrained to avoid any output activity in both cases, but the cell behavior also depends on the actual output state (see Chapter 4). A comparison of the results in Figure 7.6(a) and (b) shows that the current flow for the storage elements is modeled with a lower peak amplitude but with some delayed activity for the rising edge, while the falling-edge timings are determined almost exactly. Provided that these differences in the profiles are caused by the random flip-flop states mentioned above, such uncertainties are acceptable.
For an entire module that also includes the combinational logic parts, the time domain current profiles and the corresponding Fourier-transformed results are shown in Figure 7.7. As mentioned before, transistor-level simulations of large modules consisting of several hundred thousand cells demand excessive computational effort and memory. The modeled profiles are therefore verified against SPICE simulation results for designs of moderate complexity with several thousand cells. The cell counts of this module are: 237 clock buffers, 8 gating cells, 3,167 flip-flops, 35 latches, and 13,910 combinational cells. The simulated clock period is 10 ns (100 MHz) with equidistant rising and falling edges, but for a better view of the profile details, the respective partial results are plotted consecutively, and the time axis is truncated after the falling-edge profiles.
It can be seen that the parts of the current profile caused by the clock subsystem correlate well with the SPICE simulation result. Since the activity of the combinational logic blocks is determined randomly, these parts of the waveform show some minor differences. The results of several simulation runs are therefore plotted to show the uncertainties caused by the random selection of the triggered cells and events, which are particularly noticeable in such a relatively small design.
The Fourier coefficients plotted in Figure 7.7(b) show that there is also a good correlation between the modeled behavior and the SPICE simulation result in the frequency domain. This example consequently shows that all important properties of the analyzed circuit are considered, and that the introduced modeling approaches are suitable for approximating the current profiles with acceptable accuracy.
7.5. Profile Post-Processing
For the composition of the partial results to the final profiles of entire modules,
but primarily for the post-processing of the ideal current profiles, a tool with the
required capabilities has been implemented to consider the parasitic effects of the
interconnect wires. The graphical user interface of this tool, providing several
options to test the effects of some important parameters, is shown in Figure 7.8.
The list in the upper left region of the program window shows the available profiles found in the directory profile path. Pressing one of the buttons labeled add starts the routines that import the profiles from the respective file and plot the time domain waveform as well as the Fourier transform results in the embedded figures. In addition to the control elements for scaling the profile amplitude and stretching it in time, there are some options to test different clock signal characteristics. The base frequency of the clock (f), an optional divider factor (clkdiv), and the duty cycle ratio (duty) can be set individually for each profile via the respective elements.

[Figure 7.7: (a) time domain profiles, current [A] versus time [ps], 0 to 3000 ps; (b) frequency spectrum, current [mA] versus frequency [GHz], 0 to 5 GHz; traces: SPICE simulated, gate-level modeled]
Figure 7.7.: Time domain waveform and Fourier coefficients of the modeled current profiles, compared to the corresponding SPICE simulation result.

Figure 7.8.: Graphical user interface of the implemented tool that features the superposition of the current profiles for different circuit modules and the visualization of the effects of different post-processing parameters.
Because each profile is stored internally as two parts, one related to the rising and one to the falling clock edge, the reference system clock frequency can be changed at any time. The checkbox inv additionally shows the effects of an inverted clock (rising and falling edges swapped), which can be set individually for each profile.
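Because each profile is split into a rising-edge part and a falling-edge part, a periodic profile for any clock setting can be assembled by placing the two parts at the respective edge times. The following Python sketch illustrates this under stated assumptions: both parts are sampled with the same step dt, the effective clock period is clkdiv/f, the falling edge occurs at duty times that period, and inv swaps the two parts. The function name compose_period is hypothetical and not the tool's actual interface.

```python
import numpy as np

def compose_period(rise_part, fall_part, dt, f, clkdiv=1, duty=0.5, inv=False):
    """Assemble one clock period of a module current profile (hypothetical sketch).

    rise_part, fall_part: current samples [A] triggered by the rising and the
    falling clock edge, both sampled with the same step dt [s].
    """
    if inv:
        # inverted clock: rising and falling edges swap roles
        rise_part, fall_part = fall_part, rise_part
    period = clkdiv / f                       # effective period after clock division
    n = int(round(period / dt))               # samples per period
    profile = np.zeros(n)
    for part, t_edge in ((rise_part, 0.0), (fall_part, duty * period)):
        start = int(round(t_edge / dt))
        idx = (start + np.arange(len(part))) % n   # wrap into the periodic window
        np.add.at(profile, idx, part)              # accumulate, also for repeated indices
    return profile
```

Parts longer than one period wrap around through the modulo indexing, so activity that overlaps the next cycle is summed, as expected for a periodic steady state.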
Finally, the summed profile of all components marked as active can be exported as an Equivalent Current Source (ECS). The export format is selectable depending on the target application, and the interval between the values (sample rate) is adjustable as well.
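The available export formats are not listed here. As one plausible target, the sketch below writes the summed profile as a SPICE piecewise-linear (PWL) current source, resampled by linear interpolation to a user-chosen interval. The function name, the source name IECS, and the node names are hypothetical.

```python
import numpy as np

def export_ecs_pwl(profiles, active, dt, step, name="IECS", nodes=("vdd", "vss")):
    """Sum the active profiles and format them as a SPICE PWL current source (sketch).

    profiles: equally long current arrays [A] sampled with step dt [s]
    active:   flags marking which profiles contribute to the summary profile
    step:     output sample interval [s] chosen for the target application
    """
    selected = [np.asarray(p, dtype=float) for p, on in zip(profiles, active) if on]
    if not selected:
        raise ValueError("no active profile selected")
    total = np.sum(selected, axis=0)                    # superposition of active components
    t_in = np.arange(len(total)) * dt
    n_out = int(np.floor(t_in[-1] / step + 1e-9)) + 1   # small epsilon guards float rounding
    t_out = np.arange(n_out) * step
    i_out = np.interp(t_out, t_in, total)               # linear resampling to the new interval
    pairs = " ".join(f"{t:.6g} {i:.6g}" for t, i in zip(t_out, i_out))
    return f"{name} {nodes[0]} {nodes[1]} PWL({pairs})"
```

A coarser step shrinks the netlist but discards spectral content above half the new sample rate, so the interval should match the frequency range of interest.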
7.6. Chip-Level Current Consumption Models
Complex integrated circuits typically consist of several subsystems, such as the processor core and the various peripheral controller units of a microcontroller. A method that accounts for the cell interconnect wires by post-processing the current profiles determined for ideal conditions has been introduced in Chapter 6. It has also been discussed that power distribution networks cause similar effects, which can in principle be modeled with the same method. Current profiles have therefore been determined for a module of a testchip that includes an on-chip current and voltage sensor. The results, which consider both the interconnect wires and the power distribution network, are verified against the simulation results of a model that has been validated in [35]. Figure 7.9 compares the post-processed profile and the simulation result of the testchip model in the frequency domain. For this example, the introduced method approximates the characteristics of the spectrum well up to about 700 MHz. At higher frequencies, both spectra still follow a similar trend, but the spectrum of the post-processed waveform is more pessimistic, i.e. it predicts higher amplitudes.
Figure 7.9.: Comparison of Fourier coefficient envelopes of the modeled current profile for a testchip and the simulation result of an on-chip sensor model considering interconnect wire and power distribution network effects. [Plot: current [dBµA] vs. frequency [MHz], 0 to 7000 MHz; curves: SPICE simulated, post-processed.]
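Figure 7.9 plots spectral envelopes in dBµA, i.e. 20·log10(I / 1 µA). The sketch below converts one sampled period of a current profile into single-sided harmonic amplitudes in dBµA. The single-sided amplitude normalization is an assumption (the normalization behind the figure is not stated), and the function name is hypothetical.

```python
import numpy as np

def harmonic_amplitudes_dbua(i_period, dt):
    """Single-sided harmonic amplitudes of one period of a current waveform, in dBµA."""
    n = len(i_period)
    amp = np.abs(np.fft.rfft(i_period)) / n
    amp[1:] *= 2.0               # fold the negative-frequency half onto positive frequencies
    if n % 2 == 0:
        amp[-1] /= 2.0           # the Nyquist bin has no mirrored partner
    freqs = np.fft.rfftfreq(n, d=dt)
    with np.errstate(divide="ignore"):
        db = 20.0 * np.log10(amp / 1e-6)   # reference: 1 µA
    return freqs, db
```

A 1 mA cosine harmonic thus appears at 60 dBµA, and every factor of ten in current adds 20 dB.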
Since power supply networks are typically optimized for the subsystems they supply, the actual chip partitioning becomes increasingly important as the complexity of the power grid grows. Because these effects are, in addition, likely to be irregular, specialized tools with sophisticated power network models promise more accurate results. One such application is EXPO [11], mentioned earlier, which generates models based on a pre-layout estimation of the power grid characteristics. It divides the chip area into a scalable number of tiles and applies appropriate models to the individual segments, depending on the respective circuit type. A circuit analysis based on extracted layout parameters can be done, for instance, using XcitePI [8].
All of these tools, which are suitable for modeling the behavior of complex chips, essentially provide a network of passive elements that represents the parasitic effects of the on-chip wire structures. However, they are typically not able to generate the active components of a chip model, i.e. the equivalent current sources that represent the current consumption of the on-chip devices. The methods introduced in this thesis for modeling the current profiles caused by the internal activity of the individual circuit modules are therefore an essential component for generating comprehensive chip behavior models. The current profiles are modeled and exported as the required equivalent current sources (see Section 7.5) and can subsequently be included in chip-level emission models.
8. Conclusion and Outlook
8.1. Conclusion
This thesis presents methods to efficiently model the dynamic behavior of digital modules in terms of their transient current consumption waveforms. Since the introduced methods are based on circuit descriptions that are usually available in early design phases, the generated models are suitable for design studies that predict the effects of design variations early in the circuit design process.
Because the model generation is based on gate-level netlists, a method to characterize a standard cell library has been introduced. The characteristics of the different cell types are presented, and approaches to minimize both the effort required for the characterization process and the amount of data that has to be stored in the library are discussed. Since this procedure needs to be applied only once per technology and library, but to every individual cell, it has been largely automated by scripts for the single-cell simulations and the subsequent parameter extraction. The procedure has been applied to a 130 nm technology and has also been validated for a 90 nm library.
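The storage format of the characterized library is not detailed here. As a hypothetical illustration, assume that each cell and output transition is characterized by a delay and a triangular supply-current pulse (the triangle shape is an assumption, motivated by the waveforms discussed in Appendix A). A pattern-based simulation then yields (time, cell, transition) events whose pulses are superposed into the module profile. The names CellCurrent and add_events are hypothetical.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CellCurrent:
    """Hypothetical library entry: triangular supply-current pulse of one output transition."""
    delay: float    # input event to pulse start [s]
    peak: float     # peak current [A]
    t_rise: float   # pulse start to peak [s], must be > 0
    t_fall: float   # peak back to zero [s], must be > 0

def add_events(profile, dt, library, events):
    """Superpose the library pulses of all (time, cell, transition) events onto profile."""
    t = np.arange(len(profile)) * dt
    for t_event, cell, edge in events:
        c = library[(cell, edge)]
        t0 = t_event + c.delay
        tp = t0 + c.t_rise
        t1 = tp + c.t_fall
        # piecewise-linear triangle 0 -> peak -> 0, zero outside [t0, t1]
        profile += np.interp(t, [t0, tp, t1], [0.0, c.peak, 0.0], left=0.0, right=0.0)
    return profile
```

The charge of each pulse is peak·(t_rise + t_fall)/2, so the characterized charge per transition is preserved regardless of the sampling step, as long as dt resolves the pulse.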
Methods to generate the current profiles for modules of different sizes and complexities have been introduced. The profile determination is based on gate-level netlists and the library of pre-characterized cells mentioned above. Partitioning the analyzed modules into clocked circuit parts and combinational logic blocks also allows properly matched methods to be applied when characterizing the dynamic circuit behavior, in terms of both the timing characteristics and the transient current consumption waveforms. It has been shown that a pattern-based simulation method is typically the best choice for circuit partitions triggered by a specified system clock, whereas a random activity interpretation is most efficient for large combinational logic blocks. Since at least the effects of the cell interconnect wires need to be considered in the transient current models, approaches for post-processing the waveforms are introduced: the profiles determined for ideal conditions (i.e. a constant power supply voltage and lossless wires) are manipulated to approximate these effects.
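The concrete post-processing of Chapter 6 is not repeated here. As a stand-in that shows the general idea, the sketch below convolves the ideal profile with a unit-area rectangle of assumed width tau. This keeps the transported charge constant, lowers and widens the current peaks, and attenuates high-frequency content according to the sinc-shaped transfer function of the rectangle (see Appendix A). The function name and the rectangular kernel are assumptions, not the procedure from Chapter 6.

```python
import numpy as np

def smooth_profile(i_ideal, dt, tau):
    """Approximate wire low-pass effects by convolution with a unit-area rectangle of width tau."""
    m = max(1, int(round(tau / dt)))    # window length in samples
    window = np.ones(m) / m             # unit area: total charge is preserved
    return np.convolve(i_ideal, window, mode="full")
```

The "full" mode returns m − 1 extra samples so that no charge is cut off at the end of the profile.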
Considerations for the software implementation of a tool that determines the current profiles based on the introduced methods are also presented, together with a verification of the generated models for circuits of different complexities and characteristics. An additional tool provides several control elements for investigating alternative design parameters. Different clocking schemes can be tested, and the profiles can additionally be manipulated to show the effects of modified parameters, for instance the activity levels of the respective design partitions or submodules. Such a parameter variation is similar to traditional approaches based on intuitive manipulation of a given profile and therefore comes with several uncertainties, but it allows an almost instant prediction of the effects on the system behavior that is reasonable as a first approximation.
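A minimal sketch of such a what-if variation, assuming the profiles of the individual partitions are available as equally sampled arrays: each partition is scaled by a relative activity factor and the results are superposed. Only amplitudes change, not timing, which is one source of the uncertainties mentioned above. The function and partition names are hypothetical.

```python
import numpy as np

def vary_activity(partition_profiles, activity_scale):
    """Superpose partition profiles, each scaled by a relative activity factor (default 1.0)."""
    total = None
    for name, profile in partition_profiles.items():
        scaled = activity_scale.get(name, 1.0) * np.asarray(profile, dtype=float)
        total = scaled if total is None else total + scaled
    return total
```

For example, `vary_activity({"core": [1, 2], "periph": [1, 1]}, {"core": 0.5})` halves the contribution of the core partition and leaves the peripheral partition unchanged.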
The export of the modeled profiles as equivalent current sources finally enables
a subsequent generation of the mentioned noise models for a given circuit [37].
As stated in the introduction as goals of this work, the models can be generated before a layout is available, and they are based on the actual circuit so that the specific design characteristics are properly taken into account. To achieve good accuracy at a low computational effort, the models are determined at a reasonably high abstraction level. The introduced methods therefore allow a circuit to be characterized in less than one hour, even for complex modules consisting of several million transistors, on a standard desktop PC (e.g. 2 GHz Pentium IV, 1 GB RAM).
It has also been discussed that the parasitic effects of on-chip wires lead, among other things, to a shift of the spectral characteristics of the current profile toward lower frequencies. The introduced approaches, which use basic signal processing methods, are practical for generating models that are valid up to a few GHz for current technologies. The benefits of accurate emission models for component-level or even system-level simulations have been demonstrated in the course of the MISEA project [38], and the introduced methods have been optimized and validated for state-of-the-art technologies and circuits for automotive applications.
8.2. Outlook
Future technologies are expected to introduce additional effects that will probably have to be considered as well. With technology scaling to considerably smaller on-chip structures, the leakage current, for instance, is expected to increase significantly relative to the dynamic current consumption caused by switching activity. On the other hand, faster switching transistors lead to higher signal slew rates and shorter propagation delays. In this case, the inductance of the interconnect wires will probably become significant and can no longer be neglected [39]. Since the impact of on-chip wiring is expected to grow due to reduced structure sizes and increased switching performance, the model generation algorithms should be improved and adapted to these upcoming requirements. Another potentially important topic is an approach for an almost automatic determination of appropriate profile post-processing parameters.
A. Fourier Transform Characteristics
The frequency spectrum of a time domain signal or profile frequently needs to be analyzed. Since the Fourier transform is the most commonly used method for spectral analysis, this chapter gives a basic overview of it and summarizes the definitions, properties, and transform pairs that are most important for this work. Derivations of the formulas shown, as well as a more comprehensive discussion of their characteristics, can be found in the literature, for instance [40, 41].
A.1. Definitions
Fourier showed that a periodic signal $x_p(t)$ can, under mild conditions (e.g. the Dirichlet conditions), be represented as a linear combination of suitably chosen sine and cosine functions:
$$ x_p(t) = \frac{a_0}{2} + \sum_{k=1}^{\infty} \left[ a_k \cos(2\pi k f_0 t) + b_k \sin(2\pi k f_0 t) \right] \tag{A.1} $$
where the coefficients $a_k$ and $b_k$ are the Fourier coefficients and the parameter $f_0$ is the fundamental frequency, which equals the inverse of the period $T_0$:
$$ f_0 = \frac{1}{T_0} \tag{A.2} $$
By combining the sine and cosine terms of the same frequency, the Fourier series can also be written as:
$$ x_p(t) = A_0 + \sum_{k=1}^{\infty} A_k \cos(2\pi k f_0 t + \alpha_k) \tag{A.3} $$
Substituting the sine and cosine functions using Euler's formula leads to the complex notation of the Fourier series:
$$ x_p(t) = \sum_{k=-\infty}^{\infty} c_k \, e^{j2\pi k f_0 t}, \tag{A.4} $$
where the coefficients $c_k$ are the complex Fourier coefficients. This equation is also called the synthesis equation, whereas the analysis equation that determines the coefficients $c_k$ is given by
$$ c_k = \frac{1}{T_0} \int_{-T_0/2}^{T_0/2} x_p(t)\, e^{-j2\pi k f_0 t} \, dt. \tag{A.5} $$
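For real-valued signals, the coefficients of (A.1), (A.3) and (A.4) are related by the following standard identities, which follow from expanding $\cos(2\pi k f_0 t + \alpha_k)$ and applying Euler's formula:

$$ A_0 = \frac{a_0}{2}, \qquad A_k = \sqrt{a_k^2 + b_k^2}, \qquad \alpha_k = \operatorname{atan2}(-b_k,\, a_k) $$
$$ c_0 = \frac{a_0}{2}, \qquad c_k = \frac{a_k - j\,b_k}{2} = \frac{A_k}{2}\, e^{j\alpha_k}, \qquad c_{-k} = c_k^{*} \qquad (k \geq 1) $$

The magnitude of each complex coefficient is therefore half the amplitude of the corresponding cosine term, $|c_k| = A_k/2$, which is why single-sided amplitude spectra double the values at positive frequencies.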
The Fourier series is limited to periodic signals. For aperiodic signals, the Fourier transform is typically used:
$$ X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j2\pi f t}\, dt \tag{A.6a} $$
The inverse transform is given by
$$ x(t) = \int_{-\infty}^{\infty} X(f)\, e^{j2\pi f t}\, df, \tag{A.6b} $$
where $X(f)$ is the Fourier transform of $x(t)$, and $x(t)$ is the inverse Fourier transform of $X(f)$. This relation is often written as:
$$ x(t) \longleftrightarrow X(f) \tag{A.6c} $$
Since signal processing is often performed on sampled values, the discrete Fourier transform (DFT) is important. It is derived from the Fourier transform for time-continuous signals, allows the spectral analysis of time-discrete waveforms, and is given by:
$$ X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-jkn\frac{2\pi}{N}}, \qquad k = 0, 1, 2, \dots, N-1 \tag{A.7a} $$
$$ x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{jkn\frac{2\pi}{N}}, \qquad n = 0, 1, 2, \dots, N-1 \tag{A.7b} $$
where $X[k]$ is the discrete transform of $x[n]$, and $x[n]$ the inverse discrete transform of $X[k]$, which can again be written as the relation:
$$ x[n] \longleftrightarrow X[k] \tag{A.7c} $$
The variables $n$ and $k$ are the indices of the time-domain and the frequency-domain vectors, and $N$ is the number of sampled values, which is equal in both domains.
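Equations (A.7a) and (A.7b) can be evaluated directly as matrix-vector products. The sketch below does this with NumPy; the function names are illustrative. NumPy's `numpy.fft.fft` uses the same sign convention as (A.7a) and places the 1/N factor in the inverse transform, so it yields identical results with O(N log N) instead of O(N²) operations.

```python
import numpy as np

def dft(x):
    """Direct O(N^2) evaluation of (A.7a): X[k] = sum_n x[n] exp(-j*2*pi*k*n/N)."""
    x = np.asarray(x, dtype=complex)
    n_samples = len(x)
    n = np.arange(n_samples)
    k = n.reshape(-1, 1)                    # column vector: one row per frequency index
    return np.exp(-2j * np.pi * k * n / n_samples) @ x

def idft(X):
    """Direct evaluation of (A.7b): x[n] = (1/N) sum_k X[k] exp(+j*2*pi*k*n/N)."""
    X = np.asarray(X, dtype=complex)
    n_samples = len(X)
    k = np.arange(n_samples)
    n = k.reshape(-1, 1)                    # column vector: one row per time index
    return np.exp(2j * np.pi * n * k / n_samples) @ X / n_samples
```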
A.2. Fourier Transform Properties
Several properties of the Fourier transform are also important for some approaches in this work. Multiplying the time-domain amplitudes by a given factor, for instance, is equivalent to multiplying the spectrum by the same factor. Scaling a signal in time, on the other hand, scales its spectrum inversely on the frequency axis: stretching the signal in time compresses its spectrum, and vice versa. The most important properties are summarized as follows:

Linearity
$$ k_1 x_1(t) + k_2 x_2(t) \longleftrightarrow k_1 X_1(f) + k_2 X_2(f) \tag{A.8} $$
Time and frequency scaling
$$ x(kt) \longleftrightarrow \frac{1}{|k|}\, X\!\left(\frac{f}{k}\right) \tag{A.9} $$
Shifting in time domain
$$ x(t - t_0) \longleftrightarrow X(f)\, e^{-j2\pi f t_0} \tag{A.10} $$
Shifting in frequency domain
$$ x(t)\, e^{j2\pi f_0 t} \longleftrightarrow X(f - f_0) \tag{A.11} $$
Convolution
$$ h(t) * x(t) \longleftrightarrow H(f) \cdot X(f) \tag{A.12a} $$
$$ h(t) \cdot x(t) \longleftrightarrow H(f) * X(f) \tag{A.12b} $$
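These properties can be checked numerically with the DFT, with one caveat: for sampled, finite-length signals the shifts and the convolution are circular, i.e. indices wrap modulo N. The sketch below uses NumPy's FFT, which follows the sign convention of (A.7a), to check the discrete counterparts of (A.8), (A.10), (A.11) and (A.12a). The time-scaling property (A.9) has no equally simple DFT counterpart and is omitted. The function name is illustrative.

```python
import numpy as np

def check_dft_properties(N=32, seed=1):
    """Numerically check circular DFT counterparts of (A.8), (A.10), (A.11), (A.12a)."""
    rng = np.random.default_rng(seed)
    x, h = rng.standard_normal(N), rng.standard_normal(N)
    X, H = np.fft.fft(x), np.fft.fft(h)
    k = np.arange(N)
    n = np.arange(N)
    n0, k0 = 5, 3
    # circular convolution: (h ⊛ x)[i] = sum_m h[m] * x[(i - m) mod N]
    circ = np.array([np.sum(h * x[(i - n) % N]) for i in range(N)])
    return {
        "linearity": np.allclose(np.fft.fft(2.0 * x + 3.0 * h), 2.0 * X + 3.0 * H),
        # x[n - n0]  <->  X[k] exp(-j 2 pi k n0 / N)   (np.roll shifts to the right)
        "time_shift": np.allclose(np.fft.fft(np.roll(x, n0)),
                                  X * np.exp(-2j * np.pi * k * n0 / N)),
        # x[n] exp(j 2 pi k0 n / N)  <->  X[k - k0]
        "freq_shift": np.allclose(np.fft.fft(x * np.exp(2j * np.pi * k0 * n / N)),
                                  np.roll(X, k0)),
        "convolution": np.allclose(np.fft.fft(circ), H * X),
    }
```

Calling `check_dft_properties()` should report True for every entry. Without zero padding, a linear convolution of two sequences would not match the product of their DFTs, which is why the circular form is used here.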
A.3. Transform Pairs
While a transformed sinusoidal wave shows exactly one value at the given frequency in the spectrum¹, other waveforms are decomposed into their respective frequency components. Some of these waveforms have characteristic counterparts in the frequency domain. The delta function, for instance, is a single impulse in the time domain and results in a spectrum in which all frequencies are present with a constant magnitude:
$$ x(t) = \delta(t) \longleftrightarrow X(f) = 1 \tag{A.13} $$
Conversely, a constant time-domain signal results in a single value at frequency zero:
$$ x(t) = 1 \longleftrightarrow X(f) = \delta(f) \tag{A.14} $$
The characteristics of the rectangle and triangle waveforms (see Figure A.1) are of primary importance for this work. A rectangle signal is transformed into a sinc function (also referred to as sin(x)/x). Since a triangle waveform is essentially the result of convolving two rectangle signals, and since convolution in the time domain is equivalent to multiplication in the frequency domain, its spectrum is a squared sinc function.
¹ Strictly, there is a second value with the same magnitude at the corresponding negative frequency, but only positive frequencies are considered here.
Figure A.1.: Time domain waveform (a) and frequency spectrum (b) of a rectangle and a triangle signal. [Plot: (a) amplitude vs. time [s], −1 s to 1 s; (b) magnitude vs. frequency [Hz], 0 to 6 Hz; curves: rectangle, triangle.]
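The relations illustrated in Figure A.1 can be reproduced numerically. The sketch below assumes a unit-height rectangle of width 1 s (the exact signal parameters behind Figure A.1 are not stated), approximates (A.6a) by a zero-padded FFT, builds the triangle by convolving the rectangle with itself, and confirms that the triangle's magnitude spectrum equals the square of the rectangle's. `np.sinc` is NumPy's normalized sinc, sin(πx)/(πx).

```python
import numpy as np

def ct_spectrum(x, dt, n_fft):
    """Approximate |X(f)| of a time-limited signal via a zero-padded FFT (Riemann sum of (A.6a))."""
    X = dt * np.fft.rfft(x, n=n_fft)   # the magnitude does not depend on the time origin
    f = np.fft.rfftfreq(n_fft, d=dt)
    return f, np.abs(X)

dt = 1e-3
t = np.arange(-1.0, 1.0, dt)
rect = (np.abs(t) < 0.5).astype(float)   # unit-height rectangle, width about 1 s
tri = np.convolve(rect, rect) * dt       # rect * rect: triangle with peak about 1, base about 2 s

f, R = ct_spectrum(rect, dt, 1 << 16)    # approximately |sinc(f)|
_, T = ct_spectrum(tri, dt, 1 << 16)     # approximately sinc(f)^2
```

Because the FFT length exceeds the length of the linear convolution, `T` equals `R**2` up to rounding, which is the discrete form of the convolution property (A.12a).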
Bibliography
[1] D. E. C. Moehr. The legal situation on EMC in the European Union. In Proc. International Symposium on Electromagnetic Compatibility, pages 702–705, Tokyo, Japan, May 1999.
[2] T. Steinecke. Design-In for EMC on CMOS Large-Scale Integrated Circuits. In Proc. IEEE Int. Symposium on Electromagnetic Compatibility, pages 910–915, Montreal, Canada, August 2001.
[3] C. Lochot and J.-L. Levant. ICEM: A new standard for EMC of IC, Definition and Examples. In Proc. IEEE Int. Symposium on Electromagnetic Compatibility, volume 2, pages 892–897, August 2003.
[4] J.-L. Levant, M. Ramdani, and R. Perdriau. Solving Board-Level EMC Issues with the ICEM Model. In Proc. 5th Int. Workshop on Electromagnetic Compatibility of Integrated Circuits, Munich, Germany, November 2005.
[5] M. Coenen and R. de Jager. Standardization for EMC IC Modeling. In Proc. IEEE Int. Symposium on Electromagnetic Compatibility, pages 892–897, Istanbul, Turkey, August 2003.
[6] ANSI/EIA 656-B. I/O Buffer Information Specification (IBIS), Version 4.1. GEIA, 2004.
[7] IEC 62014-3. Integrated Circuits Electrical Model (ICEM). IEC, 2002.
[8] E. Miersch, T. Steinecke, and M. Goekcen. Power Integrity Analysis of a Microcontroller (µC) plus its Chip Package (BGA). In Proc. 5th Int. Workshop on Electromagnetic Compatibility of Integrated Circuits, pages 211–214, Munich, Germany, November 2005.
[9] E. Sicard and G. Peres. A Novel Software Environment for Predicting the Parasitic Emission of Integrated Circuits. In Proc. 5th Int. Workshop on Electromagnetic Compatibility of Integrated Circuits, Munich, Germany, November 2005.
[10] M. Badaroglu, G. Van der Plas, P. Wambacq, S. Donnay, G. G. E. Gielen, and H. J. De Man. SWAN: High-Level Simulation Methodology for Digital Substrate Noise Generation. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, volume 14, pages 23–33, January 2006.
[11] D. Hesidenz and T. Steinecke. Chip-Package EMI Modeling and Simulation Tool "EXPO". In Proc. 5th Int. Workshop on Electromagnetic Compatibility of Integrated Circuits, pages 231–234, Munich, Germany, November 2005.
[12] T. Steinecke, H. Koehne, and M. Schmidt. Behavioral EMI Models of Complex Digital VLSI Circuits. In Proc. IEEE Int. Symposium on Electromagnetic Compatibility, pages 848–851, May 2003.
[13] Y. Tsividis. Operation and Modeling of The MOS Transistor. Oxford University Press, second edition, 1999.
[14] J. M. Rabaey, A. Chandrakasan, and B. Nikolic. Digital Integrated Circuits: A Design Perspective. Prentice Hall, second edition, 2003.
[15] D. A. Hodges, H. G. Jackson, and R. A. Saleh. Analysis and Design of Digital Integrated Circuits. McGraw-Hill, third edition, 2003.
[16] H. J. M. Veendrick. MOS ICs: From Basics to ASICs. VCH, 1992.
[17] N. S. Kim, T. Austin, D. Blaauw, T. Mudge, K. Flautner, J. S. Hu, M. J. Irwin, M. Kandemir, and V. Narayanan. Leakage Current: Moore's Law Meets Static Power. IEEE Computer, pages 68–75, 2003.
[18] R. J. Baker, H. W. Li, and D. E. Boyce. CMOS: Circuit Design, Layout, and Simulation. Wiley-IEEE Press, 1998.
[19] N. H. E. Weste and K. Eshraghian. Principles of CMOS VLSI Design: A Systems Perspective. Weste, second edition, 1993.
[20] P. J. Ashenden. The Designer's Guide to VHDL. Morgan Kaufmann, second edition, 2002.
[21] D. J. Smith. VHDL & Verilog Compared & Contrasted. In Proc. 33rd Design Automation Conference, pages 771–776, Las Vegas, USA, June 1996.
[22] C. Mead and L. Conway. Introduction to VLSI Systems. Addison-Wesley, 1980.
[23] D. G. Messerschmitt. Synchronization in Digital System Design. IEEE Journal on Selected Areas in Communications, 8(8):1404–1419, October 1990.
[24] P. E. Gronowski, R. P. Preston, W. J. Bowhill, M. K. Gowan, and R. L. Allmon. High-Performance Microprocessor Design. IEEE Journal of Solid-State Circuits, 33(5):676–686, May 1998.
[25] V. G. Oklobdzija, V. M. Stojanovic, D. M. Markovic, and N. M. Nedovic. Digital System Clocking: High-Performance and Low-Power Aspects. Wiley-IEEE Press, 2003.
[26] K. Yip. Clock tree distribution. IEEE Potentials, 16(2):11–14, April/May 1997.
[27] K. B. Hardin, J. T. Fessler, and D. R. Bush. Spread Spectrum Clock Generation for the Reduction of Radiated Emissions. In Proc. IEEE Int. Symposium on Electromagnetic Compatibility, pages 227–231, Chicago, August 1994.
[28] S. Damphousse, K. Ouici, A. Rizki, and M. Mallison. All Digital Spread Spectrum Clock Generator for EMI Reduction. IEEE Journal of Solid-State Circuits, 42(1):145–150, January 2007.
[29] G. E. Tellez, A. Farrahi, and M. Sarrafzadeh. Activity-Driven Clock Design for Low Power Circuits. In Proc. IEEE/ACM Int. Conference on Computer-Aided Design, pages 62–65, June 1995.
[30] A. H. Farrahi, C. Chen, A. Srivastava, G. Tellez, and M. Sarrafzadeh. Activity-Driven Clock Design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 20(6):705–714, June 2001.
[31] A. Chatzigeorgiou, S. Nikolaidis, and I. Tsoukalas. A Modeling Technique for CMOS Gates. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 18:557–575, May 1999.
[32] J. M. Daga and D. Auvergne. A Comprehensive Delay Macro Modeling for Submicrometer CMOS Logics. IEEE Journal of Solid-State Circuits, 34:42–55, January 1999.
[33] P. Maurine, M. Rezzoug, and D. Auvergne. Output Transition Time Modeling of CMOS Structures. In Proc. IEEE Int. Symposium on Circuits and Systems, pages 363–366, Sydney, Australia, May 2001.
[34] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C. Cambridge University Press, second edition, 1992.
[35] J. Kruppa and D. Hesidenz. High Speed, High Bandwidth On-Chip Current and Voltage Sensor. In Proc. IEEE Sensors, pages 1337–1340, 2006.
[36] IEEE. IEEE Std 1364-2005 Standard for Verilog Hardware Description Language. IEEE, 2006.
[37] T. Steinecke, M. Goekcen, D. Hesidenz, and A. Gstöttner. High-Accuracy Emission Simulation Models for VLSI Chips including Package and Printed Circuit Board. In Proc. 8th Int. Workshop on Electromagnetic Compatibility of Integrated Circuits, pages 41–46, Torino, Italy, November 2007.
[38] G. Steinmair, T. Steinecke, and R. Weigel. MISEA – Modelling of Integrated Circuit Devices for Automotive EMC Simulation. In Proc. 5th Int. Workshop on Electromagnetic Compatibility of Integrated Circuits, pages 202–205, Munich, Germany, November 2005.
[39] Y. Massoud and Y. Ismail. Grasping the Impact of On-Chip Inductance. In Proc. IEEE Circuits & Devices, pages 14–21, 2001.
[40] D. Ch. von Grünigen. Digitale Signalverarbeitung. Fachbuchverlag Leipzig, second edition, 2002.
[41] S. W. Smith. Digital Signal Processing: A Practical Guide for Engineers and Scientists. Newnes, second edition, 2002.
Publications
[42] A. Gstöttner, T. Steinecke, and M. Huemer. High Level Modeling of Dynamic Switching Currents in VLSI IC Modules. In Proc. 5th Int. Workshop on Electromagnetic Compatibility of Integrated Circuits, pages 207–210, Munich, Germany, November 2005.
[43] A. Gstöttner, T. Steinecke, and M. Huemer. Activity Based High Level Modeling of Dynamic Switching Currents in Digital IC Modules. In Proc. 17th Int. Zurich Symposium on Electromagnetic Compatibility, pages 598–601, Singapore, February/March 2006.
[44] A. Gstöttner, T. Steinecke, and M. Huemer. Fast High Level Modeling Methods for Dynamic Switching Currents of Digital IC Modules. In Proc. Austrian Conference on the Design of Integrated Circuits and Systems, pages 97–101, Vienna, Austria, October 2006.
[45] A. Gstöttner and J. Kruppa. Modeling of Dynamic Switching Currents of Digital VLSI IC Modules and Verification by On-Chip Measurement. In Proc. 18th Int. Zurich Symposium on Electromagnetic Compatibility, pages 1–4, Munich, Germany, September 2007.
[46] A. Gstöttner and M. Huemer. Estimation of Current Profiles for Large Digital VLSI Modules in Early Design Phases. In Proc. 8th Int. Workshop on Electromagnetic Compatibility of Integrated Circuits, pages 87–90, Torino, Italy, November 2007.
[47] T. Steinecke, M. Goekcen, D. Hesidenz, and A. Gstöttner. High-Accuracy Emission Simulation Models for VLSI Chips including Package and Printed Circuit Board. In Proc. IEEE Int. Symposium on Electromagnetic Compatibility, pages 1–6, Honolulu, USA, July 2007.