
Gate-Level Current Modeling of Digital Integrated Circuits for Conducted Chip Emission Characterization

Modellierung der Stromaufnahme digitaler integrierter Schaltungen auf Gatterebene zur Charakterisierung der leitungsgebundenen Chip-Emission

Submitted to the Faculty of Engineering (Technische Fakultät) of the Universität Erlangen-Nürnberg for the degree of DOKTOR-INGENIEUR by Andreas GSTÖTTNER

Erlangen - 2010

Approved as a dissertation by the Faculty of Engineering of the Universität Erlangen-Nürnberg

Date of submission: 1 February 2010
Date of doctoral examination: 11 March 2010
Dean: Prof. Dr.-Ing. Reinhard German
Reviewers: Prof. Dr.techn. Mario Huemer, Prof. Dr.-Ing. Klaus Helmreich

Acknowledgement

This work would not have been accomplished in this way without the help and support of several people, to whom I want to express my sincere gratitude. First of all, I wish to thank Prof. Dr. Mario Huemer for being my research advisor and for his guidance and support. Special thanks go to Thomas Steinecke for consistently supporting me during all stages of this work, for his organizational efforts, and for the precious ideas and inspiring discussions. I am also very thankful to all colleagues at Infineon Technologies who contributed to this work, particularly Jack Kruppa and Mehmet Goekcen. Their full support and many interesting discussions turned out to be indispensable for the success of this thesis. I would like to thank Prof. Dr. Robert Weigel and all the colleagues at the Institute for Electronics Engineering for creating an extraordinary research environment and working atmosphere. Very special thanks go to Florian Frank, my colleague in the MISEA project, not only for giving me insight into Swabian culture but also for many almost religious discussions about Linux. Furthermore, I would like to thank Ralf Mosshammer, Thomas Ussmüller, Alexander Kölpin, Benjamin Waldmann, Wolfgang Tobginski and Adrian Voinea for their moral support and motivation during the writing of the thesis.
I want to express cordial thanks to my family for their continuous support, particularly to my parents, who have been my first and most important teachers. Finally, I would like to thank Sandra for her patience and understanding.

Abstract

This thesis investigates novel methods to characterize the current profiles of complex digital very-large-scale integrated circuits. The potentially huge number of simultaneously switching transistors in complex digital devices causes significant current peaks, and therefore considerable noise on the power supply lines. This may lead to interference with other components of the system, and may also cause electromagnetic compatibility issues. Measures to eliminate this noise and to stabilize the supply voltage therefore need to be implemented. This can be done on the printed circuit board or in the chip package, but probably the most efficient and economical approach is to place appropriately matched measures close to the respective noise sources on the chip. Since on-chip measures need to be integrated into the circuit design, simulation models are very helpful for identifying the potential noise sources in early phases of the development process. Early simulations also enable studies to predict the effects of different design options. Such models typically consist of passive components, representing the properties of the on-chip wiring, and of active noise sources, which model the transient current consumption of the respective components. This thesis focuses on high-level approaches to determine the dynamic behavior of digital integrated circuit designs and to generate noise models for early design studies. The introduced methods are based on gate-level circuit descriptions, which are typically available after circuit synthesis. This enables design analysis before the actual layout of the cell interconnect wires and the on-chip power distribution network is implemented.
For this purpose, a library is characterized that provides parameters describing the dynamic behavior of the individual cells in terms of switching current waveform characteristics and signal transition timing information. For an efficient determination of the switching activities of the individual cells, complex circuits are partitioned, and a combined approach of a pattern-based simulation and a random activity interpretation is introduced. As gate-level netlists do not provide any information concerning the on-chip wiring characteristics, approaches to approximate the parasitic effects of cell interconnect wires are discussed as well.

Kurzfassung

This thesis addresses novel methods for characterizing the time-domain current consumption of complex, highly integrated circuits. The often large number of simultaneously switching transistors in complex digital components can cause considerable current peaks and thus disturbances on the supply lines. This can impair neighboring system components, but can also cause electromagnetic compatibility problems. Measures to eliminate these disturbances and to stabilize the supply voltage are therefore required. They can be realized on the printed circuit board or in the chip package, but the most efficient and economical approach is usually to place suitable measures directly on the chip, close to the respective noise source. Since on-chip measures are integrated into the circuit, simulation models are extremely helpful for identifying potential noise sources in early phases of the development process. Early simulations additionally enable studies that predict the effects of different design variants.

Such models typically consist of passive elements, which represent the properties of the on-chip interconnects, and of active noise sources, which model the time-domain current consumption of the respective components. The focus of this thesis is on novel approaches for determining the dynamic behavior of digital circuits and on the development of noise models that can be used for early design studies. The presented methods are based on gate-level circuit descriptions, which are typically already available after circuit synthesis. This enables design analyses before the actual layout of the cell interconnects and of the on-chip power supply system is implemented. For this purpose, a library is characterized that provides parameters describing the dynamic behavior of the various cells in the form of switching current waveform characteristics and signal transition timing information. For an efficient determination of the switching activities of the respective cells, a combined approach consisting of a pattern-based simulation and an interpretation of random activities is presented. Since gate-level netlists do not provide any information about on-chip interconnects, methods for approximating the parasitic effects of interconnects are discussed as well.

Contents

1. Introduction
  1.1. Motivation
  1.2. State of the Art
  1.3. Goals of this Work
  1.4. Organization of the Thesis
2. Digital Integrated Circuit Basics
  2.1. MOS Transistor
    2.1.1. Structure and Operation
    2.1.2. MOS Transistor Capacitances
  2.2. CMOS Devices
    2.2.1. Static Behavior
    2.2.2. Transient Characteristics
    2.2.3. Power and Energy Consumption
  2.3. Cell-Based Design Methodology
    2.3.1. Standard Cells
    2.3.2. Macro Cells
  2.4. Deep Submicron Interconnects
    2.4.1. Interconnect Parameters
    2.4.2. Wire Models
  2.5. Power Distribution Networks
    2.5.1. Voltage Drops
    2.5.2. Decoupling Capacitances
3. Synchronous Sequential Digital Systems
  3.1. General Principles
  3.2. Digital System Clocking
    3.2.1. System Clock Generation
    3.2.2. Clock Distribution
    3.2.3. Clock Gating
  3.3. Combinational Logic Cells
  3.4. Clocked Storage Devices
    3.4.1. Latches
    3.4.2. Flip-Flops
    3.4.3. Memories
4. Standard Cell Characterization
  4.1. Outline of the Methodology
  4.2. Dynamic Cell Behavior Considerations
  4.3. Equivalent Inverter
  4.4. Single Cell Simulations
    4.4.1. Simulation Environment
    4.4.2. Parameter Variation
    4.4.3. Simulation Stimuli
  4.5. Characteristic Parameter Extraction
    4.5.1. Timing Characteristics
    4.5.2. Current Profiles
  4.6. Macro Cell Characterization
5. Netlist Based Current Modeling
  5.1. Current Profile Calculation
    5.1.1. Cell Environment Identification
    5.1.2. Single Event Characteristics Determination
    5.1.3. Current Profile Composition
  5.2. Circuit Simulation Methods
    5.2.1. Pattern Based Simulation
    5.2.2. Random Activity Interpretation
  5.3. Modeling of Complex Modules
    5.3.1. Module Partitioning
    5.3.2. Clock Subsystem
    5.3.3. Clocked Storage Elements
    5.3.4. Combinational Logic
    5.3.5. Profile Composition
    5.3.6. Multiple Clock Domains
6. Parasitic Effected Current Modeling
  6.1. General Principles
  6.2. Cell Interconnects
  6.3. Power Distribution Networks
7. Implementation and Verification
  7.1. Library of Characterized Cells
    7.1.1. Static Properties
    7.1.2. Dynamic Characteristics
  7.2. Circuit Description Interpretation
    7.2.1. Verilog Gate-Level Netlist Import
    7.2.2. Path Categorization
  7.3. Pattern Based Simulation
    7.3.1. Software Implementation
    7.3.2. Simulation Results
  7.4. Complex Module Modeling
    7.4.1. Partitions Initialization
    7.4.2. Current Profile Determination
    7.4.3. Simulation Results
  7.5. Profile Post-Processing
  7.6. Chip-Level Current Consumption Models
8. Conclusion and Outlook
  8.1. Conclusion
  8.2. Outlook
A. Fourier Transform Characteristics
  A.1. Definitions
  A.2. Fourier Transform Properties
  A.3. Transform Pairs

1. Introduction

Today's electronic equipment has to meet demanding requirements in terms of complexity and reliability, but also in terms of electromagnetic compatibility, which are especially strict in the automotive and avionics sectors [1]. Electronic control units manage or support many critical functions such as steering and braking, but are also responsible for the passengers' comfort and entertainment. It is fundamentally important to ensure reliable functionality, and each component must also not affect the operation of other devices. Systems are therefore usually verified by measurements and subsequently certified. As the cost of fixing defects or compatibility issues increases significantly as the development and fabrication process progresses, simulations of the system behavior are fundamentally important tasks within the design flow of electronic devices. Hence, simulation models of the respective components are required to identify potential issues, and they are most effective in early design phases.

1.1. Motivation

Besides switching power transistors, complex digital integrated circuits have been identified as one of the main contributors to system-level interference. Since most digital devices are triggered by a system clock signal, the internal functions of such devices are active almost simultaneously, and may cause significant current peaks at equidistant points in time. As the power supply lines of a system always have a certain impedance, particularly high current peaks cause temporary drops of the supply voltage, and may consequently lead to system malfunctions. The dimensions of the on-chip power supply network, but also the design of the printed circuit board that interconnects, for instance, the components of a control unit, are therefore of great importance.
Given that voltage drops may affect the power integrity of other components on the board, but may also cause electromagnetic compatibility (EMC) issues, adequate measures to stabilize the power supply system have to be established. Measures to reduce the noise on the supply lines can be implemented at board level, but can also be integrated into the individual components [2]. To avoid redesign cycles, it is beneficial to know the characteristics of the noise as early as possible, and at the latest before manufacturing. Hence, accurate simulation models are needed to design economical and efficient measures that ensure reliable operation of the individual devices and consequently of the entire system.

Integrated circuit emission models are most commonly used to model the dynamic behavior of semiconductor devices [3]. They basically consist of the components shown in Figure 1.1.

Figure 1.1.: Basic structure of an integrated circuit model (passive elements and active elements).

While the passive model elements form a kind of impedance model of the on-chip wiring and the package characteristics, the active elements are current sources representing the current consumption caused by the internal activity of the respective circuit. Complex devices may require different supply voltages, each of which may in addition need to be applied at several pins. In such cases the emission models are also more complex, and typically consist of several active and passive elements to account for the behavior at the respective pins, as well as for potential coupling effects. Such models are usually provided by the semiconductor manufacturer, and enable component-level simulations to determine which noise-reducing measures may be required on the printed circuit board [4]. Emission models can also be used to verify conformance to the requirements at chip level, as a sign-off criterion before manufacturing.
The passive model element parameters are usually extracted from the layout data, while the current sources are determined by simulations of the respective on-chip functional blocks. Given that measures to eliminate the generated noise are most efficient when they are located close to the source, it is beneficial to analyze the behavior of the individual on-chip modules separately, such as the processor core and the peripheral controller units of a microcontroller. As the implementation of properly matched measures becomes more and more expensive with each phase of the design flow, it is important to have methods available that can predict the generated noise characteristics in early design phases.

1.2. State of the Art

The complexity of integrated circuits has increased significantly in recent years. As the dynamic behavior and the interference with other system components became an important issue, standards for electromagnetic compatibility (EMC) modeling have been introduced [5]. Particularly important in this context are the I/O Buffer Information Specification (IBIS) [6] and the Integrated Circuit Electrical Model (ICEM) [7] standards. IBIS models are generally accepted for signal integrity analysis, but ICEMs are more comprehensive in terms of the representation of the generated noise and its propagation over the power supply lines. ICEMs are primarily used to provide behavioral models of ICs for efficient simulations at board or system level, but are also intended to be suitable for sign-off verification of an IC before manufacturing. Depending on the intended use, a model can be more or less complex. While the models provided to a customer are typically simplified, the models for chip-level simulations usually include several design-specific details. As mentioned before, the implementation of noise-reducing measures is most efficient in early design phases.
Hence, there are primarily two approaches to model the chip behavior:

Layout-based models are generated by extracting the respective parameters for the passive model elements from the layout data. There are already commercial tools, such as XcitePI [8], which are capable of reasonably predicting the behavior of a chip. The parameters of such models are usually aligned to measurement results of the final product prior to the generation of adapted ICEMs for system-level simulations. Nevertheless, these models can be considered reasonably accurate for sign-off verification simulations before manufacturing.

Design studies are intended to predict the dynamic behavior of a device as early as possible in the design process. Due to the late availability of the chip layout, design studies are often based on empirical values. The passive ICEM elements in particular consequently represent the expected wiring properties of the chosen design option. Tools supporting such approaches are primarily the subject of research works [9, 10] or proprietary solutions of IC manufacturers, such as EXPO [11], developed by Infineon Technologies.

The passive model parameters can therefore be reasonably extracted or predicted from the available design information. For the active elements representing the generated noise, the above-mentioned tools for early design studies are based on different approaches. The software introduced in [9], for instance, requires a set of parameters describing the circuit properties in terms of the gate count, the number of supply voltage pins, the clock frequency, the gate activity percentage, and the chip size. The noise models are therefore determined by an estimation of the current waveform based on several assumptions, and are consequently not directly related to the actual circuit design data. Other approaches are in turn based on a replacement of the cell instances with parameterized behavioral models in conjunction with a transformation of the circuit structure [12]. The approach introduced in [10], on the other hand, relies on functional simulations of the circuit and extracts the individual switching activities. In this case, the noise sources are subsequently generated on the basis of such switching event lists and a library providing the respective switching event noise currents.

1.3. Goals of this Work

At the time this thesis was started, some approaches to model the on-chip noise sources already existed. Each of them has its benefits, but also drawbacks. Estimations of the current waveforms that are based on several parameters, but are not related to the actual design, allow early design studies, although with numerous uncertainties. Models based on functional simulations, on the other hand, promise accurate results, but require detailed knowledge of the circuit functionality and a probably high simulation effort. In this case, there are probably also limitations in terms of the flexibility to compare different design options and to predict the effects of potential noise-reducing measures.

Hence, the primary goal of this work is to find methods that are capable of efficiently modeling the switching currents of digital integrated circuits, that are based on the actual circuit design, and that are nevertheless able to predict the effects of possibly advantageous design variations. The model generation should also be highly efficient in terms of computational effort, so that several design options can be investigated within a short time. A high-level model generation based on gate-level circuit descriptions has therefore been pursued. As digital gate-level standard cell libraries typically do not provide any analog properties, i.e. the switching current waveform characteristics and timing models, a characterization of such a library has to be performed in addition.
At the time this thesis was started, integrated circuits for automotive applications were commonly fabricated in 130 nm technologies. The characterization methods therefore need to be feasible for this technology, but should also be applicable to 90 nm structures. Furthermore, since cell interconnect wires have a significant effect on the transient behavior of the individual on-chip devices, an approach to consider these parasitic effects before the chip layout is available is to be introduced as well.

1.4. Organization of the Thesis

Chapter 2 briefly recapitulates the basics of digital integrated circuits. The structures and characteristics of the most important on-chip devices, as well as the parasitic effects of interconnect wires and power distribution networks, are introduced. With a view to determining the switching activities of digital circuits, Chapter 3 discusses the most important characteristics of digital systems, such as the basic principles of synchronous sequential designs, the system clock generation methods, the structures of on-chip clock signal distribution networks, and clock gating considerations. As they are important for an efficient standard cell library characterization, the most important properties of clocked storage elements and combinational logic cells are introduced as well. The methodology and considerations for the characterization of the standard cell library are discussed in Chapter 4. Methods based on the pre-characterized library that allow the current consumption of digital modules to be modeled efficiently, and approaches for an interpretation of randomly assigned activities, are introduced in Chapter 5. Given that at least the effects of the cell interconnect wires need to be considered within the current source models, Chapter 6 discusses a method to post-process the profiles determined for ideal conditions (i.e.
a constant power supply voltage and lossless interconnect wires) to approximate the respective parasitic effects. Chapter 7 presents the considerations for the implemented tool, which is capable of generating current consumption models based on the methods introduced in the preceding chapters. Furthermore, a verification of the results for circuits with different complexities and characteristics is provided. Finally, a conclusion and an outlook are given in Chapter 8.

2. Digital Integrated Circuit Basics

Most of today's digital integrated circuits are realized in Metal Oxide Semiconductor (MOS) technologies. Logic functions are predominantly provided by standard cells, which are implemented as Complementary MOS (CMOS) devices. This chapter discusses the structure and behavior of the primary devices as well as the effects of interconnect wires and power distribution networks.

2.1. MOS Transistor

The basic device of integrated circuits is the MOS transistor. As it essentially acts as a switch, it is well suited to implementing digital functions. In CMOS technology, two types of transistors are present. While the n-type (or NMOS) transistor needs a positive control voltage to become conductive, the p-type (or PMOS) transistor is switched on when a low voltage is applied to the input. Figure 2.1 shows common schematic symbols for both types. The conductivity between drain (D) and source (S) is controlled by the voltage applied at the gate (G) terminal. As the bulk (B) terminal is usually tied to the source potential in digital VLSI designs, MOS transistors are often shown as three-terminal devices (Figure 2.1(b) and (d)).

Figure 2.1.: Common symbols for MOS transistors: (a) NMOS transistor as a 4-terminal device, (b) NMOS transistor as a 3-terminal device, (c) PMOS transistor as a 4-terminal device, (d) PMOS transistor as a 3-terminal device.

Figure 2.2.: MOS transistor structure.

2.1.1. Structure and Operation

The structure of NMOS transistors is shown schematically in Figure 2.2. It consists of two heavily doped $n^+$ regions, embedded in a lightly doped p-type substrate. The gate electrode is situated between them on the substrate, separated from it by a silicon dioxide layer. PMOS transistors have a similar structure. In this case, the drain and source regions are heavily doped $p^+$ regions, embedded in an n-type substrate. Given that transistors are typically symmetric, the source and drain terminals are interchangeable. The final distinction between source and drain is consequently given by the applied signals and voltages. As the charge carriers move from source to drain, NMOS and PMOS transistors have different polarities. While electrons are the carriers in NMOS transistors, holes carry the current in PMOS transistors. Hence, the PMOS source is the terminal connected towards the positive supply voltage, while the more positive node of the NMOS device is defined as the drain.

Important dimensions for the following discussion of the operating modes are the channel length $L$, the channel width $W$, and the oxide thickness $t_{ox}$. Due to the similar operating principles of n-type and p-type transistors, the following discussion concentrates on NMOS devices. All principles and formulas are also valid for PMOS devices when the signs of voltages and currents are inverted.

If all transistor terminals are tied to ground, the path from drain to source consists of two $pn^+$ junction diodes connected back to back, with the substrate as a common p-region. Under this condition, both junctions can be considered as "off", which results in an extremely high resistance between drain and source. The gate and substrate form the plates of a capacitor with the gate oxide as the dielectric.
Applying a positive voltage to the gate causes an accumulation of negative charges under the gate. If the gate voltage is sufficiently high, the accumulation of electrons under the gate leads to an inversion of the p-type substrate to an n-type channel in this region between drain and source. Provided that a positive voltage is applied at the drain terminal, this conducting channel allows a current flow from drain to source. The gate voltage required to form the conducting channel is termed the threshold voltage ($V_T$). The analytical derivation of $V_T$, as well as of the formulas describing the operating behavior, is treated extensively in the literature [13, 14, 15, 16].

For the discussion of the transistor operation, some parameters are defined. One of the most important parameters for the operating characteristics is the gate-oxide capacitance per unit area $C_{ox}$, defined as

$$C_{ox} = \frac{\varepsilon_{ox}}{t_{ox}}, \tag{2.1}$$

where $\varepsilon_{ox}$ is the permittivity of the oxide and $t_{ox}$ is the oxide thickness. The process transconductance $k'$ depends on the carrier mobility $\mu_n$ and is given by:

$$k' = \mu_n C_{ox} = \frac{\mu_n \varepsilon_{ox}}{t_{ox}}. \tag{2.2}$$

The operation of MOS transistors is typically subdivided into three regions, which are discussed in the following sections¹ and shown in Figure 2.3.

Figure 2.3.: MOS transistor operating regions: (a) subthreshold: $V_{GS} < V_T$, (b) resistive: $V_{DS} < V_{GS} - V_T$, (c) saturation: $V_{DS} \geq V_{GS} - V_T$.

Subthreshold Region

If $V_{GS} < V_T$, no conducting channel is formed (Figure 2.3(a)). The transistor can be considered as switched off in this region, and the drain-source current $I_{DS}$ is therefore very small, but not zero.
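As a rough numerical illustration of Equations 2.1 and 2.2, the following Python sketch evaluates $C_{ox}$ and $k'$. The oxide thickness, the relative permittivity of the oxide and the electron mobility are assumed example values chosen only for illustration; they are not parameters of the technologies considered in this thesis, and the function names are hypothetical.

```python
EPS_0 = 8.854e-12    # vacuum permittivity in F/m
EPS_R_SIO2 = 3.9     # assumed relative permittivity of silicon dioxide


def gate_oxide_capacitance(t_ox, eps_r=EPS_R_SIO2):
    """Eq. 2.1: C_ox = eps_ox / t_ox, returned in F/m^2 (t_ox in m)."""
    return eps_r * EPS_0 / t_ox


def process_transconductance(mu_n, c_ox):
    """Eq. 2.2: k' = mu_n * C_ox, returned in A/V^2 (mu_n in m^2/Vs)."""
    return mu_n * c_ox


# Assumed example values: t_ox = 2.5 nm, mu_n = 300 cm^2/Vs = 0.03 m^2/Vs
c_ox = gate_oxide_capacitance(2.5e-9)
k_prime = process_transconductance(0.03, c_ox)

print(f"C_ox = {c_ox * 1e3:.2f} fF/um^2")   # 1 F/m^2 corresponds to 1e3 fF/um^2
print(f"k'   = {k_prime * 1e6:.1f} uA/V^2")
```

Because $C_{ox}$ is inversely proportional to $t_{ox}$, halving the assumed oxide thickness doubles both $C_{ox}$ and $k'$ in this sketch.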
Compared to the current through a conducting channel, the subthreshold current is small enough to be neglected in many cases, but it becomes more and more important for low-power applications in deep submicron technologies.

¹ Even though the discussed devices and circuits are designed in a deep submicron technology, this chapter is intended to introduce the basic operating behavior and is therefore founded on the characteristics of long-channel devices.

Resistive Region

Applying a sufficiently high voltage between gate and source ($V_{GS} > V_T$) forms the previously mentioned conducting channel (Figure 2.3(b)). A small voltage difference between drain and source ($V_{DS}$) causes a current $I_{DS}$ to flow from drain to source. Due to the applied voltage $V_{DS}$, the gate-to-channel voltage, which at a point $x$ along the channel is $V_{GS} - V(x)$, decreases from source to drain. The transistor is in the resistive region when the condition $V_{GS} - V(x) > V_T$ holds all along the channel. The voltage-current relation in this case is given by:

$$I_{DS} = k' \frac{W}{L} \left[ (V_{GS} - V_T)\, V_{DS} - \frac{V_{DS}^2}{2} \right]. \tag{2.3a}$$

Substituting the device transconductance parameter (also termed gain factor) $k = k' (W/L)$ results in:

$$I_{DS} = k \left[ (V_{GS} - V_T)\, V_{DS} - \frac{V_{DS}^2}{2} \right]. \tag{2.3b}$$

Given that the quadratic term in Equation 2.3 can be ignored for small values of $V_{DS}$, the voltage-current dependence is then linear. This leads to the term resistive or linear region.

Saturation Region

Increasing the drain-source voltage is accompanied by a decrease of the channel charge near the drain. If the condition $V_{DS} \geq V_{GS} - V_T$ is met, the induced charge at the drain is zero and the conducting channel is pinched off (Figure 2.3(c)). In this case, the transistor is in the saturation region, and the voltage over the effective channel (from the source to the pinch-off point) remains fixed at $V_{GS} - V_T$.
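The region conditions stated so far and Equation 2.3b can be sketched in a few lines of Python. This is only an illustrative long-channel calculation; the gain factor, threshold voltage and bias values are assumed example numbers, and the function names are hypothetical.

```python
def operating_region(v_gs, v_ds, v_t):
    """Classify the operating region of an NMOS transistor from its bias voltages."""
    if v_gs < v_t:
        return "subthreshold"      # no conducting channel is formed
    if v_ds < v_gs - v_t:
        return "resistive"         # channel present along the full length
    return "saturation"            # channel pinched off at the drain


def i_ds_resistive(k, v_gs, v_ds, v_t):
    """Eq. 2.3b: I_DS = k * ((V_GS - V_T) * V_DS - V_DS^2 / 2), valid in the resistive region."""
    return k * ((v_gs - v_t) * v_ds - v_ds ** 2 / 2)


# Assumed example values: k = 1 mA/V^2, V_T = 0.4 V, V_GS = 1.2 V
k, v_t, v_gs = 1e-3, 0.4, 1.2

v_ds = 0.1
region = operating_region(v_gs, v_ds, v_t)
i_exact = i_ds_resistive(k, v_gs, v_ds, v_t)
i_linear = k * (v_gs - v_t) * v_ds   # quadratic term neglected

# Value of Eq. 2.3b at the pinch-off boundary V_DS = V_GS - V_T
i_boundary = i_ds_resistive(k, v_gs, v_gs - v_t, v_t)

print(region, i_exact, i_linear, i_boundary)
```

For the assumed values, neglecting the quadratic term at $V_{DS} = 0.1\,\mathrm{V}$ overestimates the current by a few percent, and at the boundary $V_{DS} = V_{GS} - V_T$ Equation 2.3b evaluates to $\frac{k}{2}(V_{GS} - V_T)^2$.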
Therefore, the current $I_{DS}$ is also saturated, but due to channel-length modulation (the effective channel length depends on the drain-source voltage), $I_{DS}$ nonetheless depends on $V_{DS}$. This effect is captured by the channel-length modulation parameter $\lambda$, and the drain-source current in this region is given by

$$I_{DS} = \frac{k}{2} (V_{GS} - V_T)^2 (1 + \lambda V_{DS}) \tag{2.4}$$

Figure 2.4.: MOS transistor capacitances. Labeled elements: $C_{GSO}$, $C_{GC}$, $C_{GDO}$, $C_j$.

2.1.2. MOS Transistor Capacitances

The performance of digital circuits is primarily determined by the time it takes to charge or discharge the intrinsic capacitances of the MOS transistors. Besides the gate capacitance, a transistor has several additional parasitic capacitances given by its structure. Most of them are voltage-dependent and nonlinear. Figure 2.4 gives an overview of the primary capacitances discussed in the following sections.

Overlap Capacitance

In the ideal case, the gate electrode and the gate oxide have exactly the dimensions of the channel. Due to lateral diffusion, however, the gate overlaps the drain and source regions by $x_d$, which forms an overlap capacitance. It is linear and given by

$$C_{GSO} = C_{GDO} = C_{ox} x_d W = C_o W \tag{2.5}$$

where $C_{ox}$ is the capacitance per unit area (Eq. 2.1), $W$ is the channel width, and $C_o = C_{ox} x_d$ is the overlap capacitance per unit width.

Channel Capacitance

The most significant parasitic element of a MOS transistor is the nonlinear and voltage-dependent gate-to-channel capacitance $C_{GC}$. It is divided into three components: $C_{GCS}$ (gate-to-source), $C_{GCD}$ (gate-to-drain), and $C_{GCB}$ (gate-to-bulk). As the distribution of the total capacitance depends on the operating region, each region is considered separately.

cut-off: If the gate voltage is lower than the threshold, no channel exists. The total capacitance $C_{GC}$ appears between gate and bulk.

resistive: In this region, where the channel is present over the full distance between source and drain, $C_{GC}$ is equally distributed between $C_{GCS}$ and $C_{GCD}$.
As the channel shields the bulk from the gate, $C_{GCB}$ is zero in this case.

saturation: The pinched-off channel in this region leads to a negligible $C_{GCB}$ and $C_{GCD}$, and most of the capacitance is therefore between gate and source. The value of $C_{GCS}$ in the saturation region is $\frac{2}{3} C_{ox} W L$.

An overview of the distribution of $C_{GC}$ over its components is shown in Table 2.1. Since the transition from one region to another is continuous, an exact expression for the capacitance distribution is more involved. The most important fact for the following chapters, however, is that all capacitances are proportional to $WL$.

Table 2.1.: Channel capacitance in the different operating regions.

region       C_GCB       C_GCS             C_GCD             C_GC
cut-off      C_ox W L    0                 0                 C_ox W L
resistive    0           (1/2) C_ox W L    (1/2) C_ox W L    C_ox W L
saturation   0           (2/3) C_ox W L    0                 (2/3) C_ox W L

Junction Capacitances

Each of the reverse-biased pn-junctions at the source and drain regions forms a junction capacitance $C_j$, which can be divided into the following two components:

bottom plate: The capacitance contributed by the area at the bottom of the junction is given by $C_{bottom} = C_{j0} W L_s$, where $C_{j0}$ is the junction capacitance per unit area and $L_s$ is the length of the junction sidewall.

sidewalls: The sidewall capacitance is given by $C_{sw} = C'_{jsw} x_j (2L_s + W)$, where the junction-to-channel capacitance is neglected, and the width $W$ is therefore considered only once. Given that the sidewall height $x_j$ is a technology parameter, it can be combined with $C'_{jsw}$ into the capacitance per unit perimeter $C_{jsw} = C'_{jsw} x_j$.

As a result, the expression covering both contributions is

$$C_j = C_{bottom} + C_{sw} = C_{j0} L_s W + C_{jsw} (2L_s + W) \tag{2.6}$$

2.2. CMOS Devices

Complementary MOS devices typically consist of both PMOS and NMOS transistors. The most basic CMOS device is the inverter shown in Figure 2.5, which consists of one transistor of each type. The input voltage $V_{in}$ is applied at both gates, and the output voltage $V_{out}$ is present at the interconnected drain terminals. Low input voltages force the PMOS transistor to become conductive, while the NMOS transistor is high-resistive in this case. The output voltage $V_{out}$ is therefore pulled up towards $V_{DD}$. On the other hand, high input voltages force the NMOS transistor to be conductive, while the PMOS transistor is high-resistive, and the output voltage is pulled towards $V_{SS}$. As abrupt switching from one state to the other is physically impossible, the continuous voltage-transfer characteristic of an exemplary device is plotted in Figure 2.5(b). Because the output voltage is pulled either towards $V_{DD}$ or towards $V_{SS}$, the voltage swing of $V_{in}$ and $V_{out}$ is virtually equal to the supply voltage.

Figure 2.5.: CMOS inverter schematic and voltage-transfer characteristic. (a) internal circuit; (b) voltage-transfer characteristic, $V_{out}$ [V] versus $V_{in}$ [V], both axes ranging from 0 to 1.5 V.

The operation of an inverter shows the general principle of logic gates: one transistor type is used to pull a node down towards $V_{SS}$, and the other one to pull it up towards $V_{DD}$. Terms often used in this context are pull-up network (PUN) and pull-down network (PDN). Figure 2.6 shows the transistor-level schematic of the most basic combinational CMOS devices with at least two inputs: the NAND and NOR gates. The schematics show that the internal circuit directly reflects the gate function. Because of the inverting characteristic, as a high input voltage primarily forces an NMOS transistor to pull a node down to a lower voltage level, CMOS devices are typically also complementary regarding their internal transistor-level circuits. To exclude states where both PUN and PDN are actively pulling a node towards opposite voltage levels, a general rule is that transistors connected in parallel in the PUN are connected in series in the PDN and vice versa².
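The series/parallel duality rule stated above can be checked with a small switch-level sketch. It is only an idealized illustration, not a model taken from this work: each transistor is treated as a perfect switch (NMOS conducting for a high input, PMOS conducting for a low input), and the names `series`, `parallel`, `gate_output`, and `truth_table` are hypothetical helpers introduced here.

```python
from itertools import product


def series(conducting):
    """A series chain conducts only if every transistor in it conducts."""
    return all(conducting)


def parallel(conducting):
    """A parallel bank conducts if at least one transistor in it conducts."""
    return any(conducting)


def gate_output(inputs, pdn_topology):
    """Evaluate an idealized static CMOS gate with one NMOS and one PMOS per input.

    pdn_topology is "series" (NAND) or "parallel" (NOR); the pull-up network
    always uses the dual arrangement, following the rule in the text.
    """
    nmos_on = [bool(v) for v in inputs]      # ideal NMOS conducts for a high input
    pmos_on = [not on for on in nmos_on]     # ideal PMOS conducts for a low input

    if pdn_topology == "series":             # NAND: series PDN, parallel PUN
        pdn, pun = series(nmos_on), parallel(pmos_on)
    elif pdn_topology == "parallel":         # NOR: parallel PDN, series PUN
        pdn, pun = parallel(nmos_on), series(pmos_on)
    else:
        raise ValueError("pdn_topology must be 'series' or 'parallel'")

    # Dual networks never conduct together and never leave the output floating.
    assert pdn != pun, "PUN and PDN must be complementary"
    return 1 if pun else 0


def truth_table(pdn_topology, n_inputs=2):
    """Map every input combination to the resulting output level."""
    return {
        bits: gate_output(bits, pdn_topology)
        for bits in product((0, 1), repeat=n_inputs)
    }


if __name__ == "__main__":
    for name, topology in (("NAND", "series"), ("NOR", "parallel")):
        print(name, truth_table(topology))
```

The internal assertion is the point of the sketch: for every input combination exactly one of the two networks conducts, so the output is always driven and no direct path between the supply rails exists in a steady state.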
² This rule is valid for the basic combinational functions, but there are gates implementing more complex functions. Due to optimizations, different structures are possible in this case.

Figure 2.6.: Schematic of CMOS NAND (a) and NOR (b) gates with two inputs.

The output Z of the shown NAND gate is, for instance, low when both inputs (A and B) are high, since in this case all NMOS transistors are conductive and all PMOS transistors are high-resistive. All other input states lead to a high output, since at least one low input voltage causes a PMOS transistor to become conductive and the output is pulled towards $V_{DD}$. At least one NMOS transistor is simultaneously switched off, leading to a high resistance between $V_Z$ and $V_{SS}$. A NOR gate is the direct opposite in this respect, consisting of one p-type transistor per input connected in series and as many n-type transistors connected in parallel.

2.2.1. Static Behavior

As mentioned before, all input and output signal voltages are virtually equal to $V_{DD}$ or $V_{SS}$ in the steady states. The potential difference due to the channel resistance of conductive transistors can be neglected in this case. Consequently, all transistors are either switched "on" or "off". The current consumption in this state is very small, but not zero. It can be neglected in many cases, but becomes more and more important for deep submicron technologies, as technology scaling leads to significantly increasing leakage currents [17]. This static current flow is a consequence of the subthreshold conductance and the gate leakage.

Figure 2.7 shows, for instance, three consecutive inverters, where the static behavior of the embedded one ($P_{cell}$, $N_{cell}$) is analyzed. First, due to the subthreshold conductance, a high-resistive transistor allows a current flow on the direct path from $V_{DD}$ to $V_{SS}$. Depending on the input state, the amount of current is given by the off-resistance of the corresponding PMOS or NMOS transistor. Second, the gate leakage causes a current flow from $V_{DD}$ through the PMOS transistor of one cell and the NMOS transistor of a cell connected to it. Provided that a low voltage is applied at the input of the driving inverter, $V_{in}$ is pulled to a high potential, and the transistors $P_{drv}$ and $N_{cell}$ are conductive. The oxide leakage of $N_{cell}$ permits a current flow in the direction of $I_{in}$ in this case. As the output voltage $V_{out}$ is low, $P_{out}$ is also switched on. This causes an oxide leakage current in the opposite direction of $I_{out}$. Since the on-resistance of a driving transistor is negligible compared to the gate-oxide resistance, the gate leakage can be considered independent of the driving cell strength. It is consequently given by the gate characteristics of the transistors connected to the cell input. As a result, the static current consumption of a circuit cell can be determined by analyzing the characteristics of that particular cell. The effects of the characteristics of the cells actually connected to it in a system are insignificant.

Figure 2.7.: Transistor-level schematic of three consecutive inverters.

2.2.2. Transient Characteristics

In addition to the static current flow, the overall current consumption of CMOS devices is mainly given by the dynamic switching currents. This dynamic part is caused by any circuit activity and primarily consists of the current required for charging and discharging the parasitic capacitances, as well as a certain cross current. During a transition period there is a short time interval in which both complementary transistors are simultaneously conductive, so that this cross current can flow through a device on the direct path from $V_{DD}$ to $V_{SS}$. The most important part, however, is the current caused by capacitive loads. Due to the limited carrier velocity, it takes some time to charge or discharge the parasitic elements, which limits the signal slew rates. This leads to signal propagation delays from the inputs to the outputs of cells, and results in a limited system performance, given by the technology characteristics.

Figure 2.8.: Timing parameter definitions. Waveforms of $V_{in}$ and $V_{out}$ over time $t$ between the levels $V_L$ and $V_H$, with the 10%, 50%, and 90% points and the intervals $t_r$, $t_f$, $t_{pLH}$, $t_{pHL}$, $t_{HL}$, and $t_{LH}$ marked.

Common definitions of the timing parameters are illustrated in Figure 2.8. It shows the input and output voltage waveforms at a gate, where a rising edge at the input causes a falling edge at the output. As mentioned before, the steady-state voltages are not exactly the supply voltages and are therefore labeled $V_L$ and $V_H$. The signal rise and fall times are typically defined between the 10% and the 90% points of the total voltage swing. As the values are different for $V_{in}$ and $V_{out}$, they are termed $t_r$ and $t_f$ for input transitions, and $t_{LH}$ (low-high) and $t_{HL}$ (high-low) at the output. Signal rise and fall times largely depend on the strength of the driving gate and the load presented to it. The output response times, or propagation delays, for low-high ($t_{pLH}$) and high-low ($t_{pHL}$) input events are defined between the 50% transition points of the input and output voltage swing.

Another important characteristic parameter of gates is the propagation delay $t_p$. Because the response times for rising and falling input events may differ, it is defined as their average:

$$t_p = \frac{t_{pLH} + t_{pHL}}{2} \tag{2.7}$$

2.2.3. Power and Energy Consumption

Important properties of a system, such as the power supply capacity, the battery lifetime, the dimensions of the chip-internal power distribution lines, and the packaging and cooling requirements, depend on the power and energy consumption.
The peak power $P_{peak}$ is, for instance, important for the supply line dimensions, while the average power dissipation $P_{av}$ is decisive for cooling requirements or the battery lifetime of mobile devices. The definitions are

$$P_{peak} = i_{peak} V_{peak} = \max[p(t)]$$

$$P_{av} = \frac{1}{T} \int_0^T p(t)\,dt = \frac{V_{supply}}{T} \int_0^T i_{supply}(t)\,dt \tag{2.8}$$

where $p(t)$ is the instantaneous power, $i_{supply}$ is the current drawn from the supply with the voltage $V_{supply}$ over the interval $t \in [0, T]$, and $i_{peak}$ is the maximum of $i_{supply}$. The components of this supply current are the previously mentioned static and dynamic currents. While the static one is permanently present, even when the system is inactive, the dynamic component is proportional to the switching activity and the system clock frequency.

2.3. Cell-Based Design Methodology

There are several different methods to design a circuit, but the cell-based design methodology is established for the development of the majority of today's digital integrated circuits [18]. Given that this method enables a high degree of automation and tool support, short design cycles and moderate design costs can be achieved. A cell-based design is basically implemented by instantiating building blocks (cells), each of which provides a particular functionality [19]. These cells are typically connected by wires in several layers, the number of which depends on the manufacturing process. This design methodology is also called semi-custom, as the number and type of available building blocks is given by a library, and only the composition of the system, by instantiating these given cells, is custom. Complex systems such as microcontrollers typically consist of standard cells and macro cells, which are discussed in the following sections.

Figure 2.9.: Internal structure of exemplary standard cells. (a) inverter, (b) strong inverter, (c) 2-input NAND.

2.3.1.
Standard Cells

As the basic devices of cell-based designs, standard cells represent implementations of the elementary functionalities needed for assembling a digital circuit. This can be a basic Boolean logic function (e.g. AND, OR, XOR, inverter), an arithmetic function (e.g. half- or full-adder), a storage element (e.g. latch, flip-flop), but possibly also a more complex encoder, decoder, comparator, or multiplexer. Standard cells are typically organized in a library, where each cell type (implemented functionality) is usually provided with different fan-in and fan-out characteristics. This is important for design optimization, since gates with larger transistor dimensions enhance the system performance by higher signal slew rates and therefore shorter propagation delays. On the other hand, as chip area is a significant cost factor, it is under certain conditions beneficial to instantiate cells with smaller driver strengths and accept longer propagation delays for noncritical signals. In addition to a possibly smaller chip area, this also reduces the power and energy consumption.

Such a limited library of cells enables the design of a circuit using a high-level hardware description language. A synthesis tool uses the descriptions in the cell library to transform this high-level circuit description into a technology-dependent netlist [20]. This netlist basically contains a list of the instantiated components and the information concerning the interconnections of the particular cell ports. Based on this netlist, the placement of the cells and the routing of the wires on the chip can also be generated automatically by an appropriate tool. Figure 2.9 shows the internal transistor-level structure of two inverter cells with different driver strengths, and a NAND gate with two inputs.
It is important that the cells in one library typically have a constant size in at least one dimension, which allows them to be lined up in rows on the chip. This reference size is usually the cell height, while the width varies according to the complexity of the cell function and the driver strength. Given that $V_{DD}$ and $V_{SS}$ are located at the top and bottom of the cell, this allows continuous power supply rails on the chip, while the transistors are located side by side and relatively short interconnections are possible. The cell layouts also show that the port connectors are typically located centrally, since each port is usually connected to both NMOS and PMOS transistors.

2.3.2. Macro Cells

The complexity of integrated circuits is permanently increasing. To reduce the development effort and time of new systems, it becomes more and more important to reuse reasonably large blocks of existing systems. Such reusable components are called macro cells, which usually provide significantly higher complexity and functionality than the cells in a typical standard cell library. Common examples of macro cells are multipliers, memories, data paths, and even complete microprocessor (µP) cores or digital signal processor (DSP) entities.

As macro cells have a well-defined functionality, the internal structure is typically similar at each cell instance. In this case, it is often beneficial to additionally specify the physical design in terms of transistor locations and wiring. The advantage of such a hard macro is the possibility of an economic optimization, as it has to be developed only once, while the benefits are available at each instance. Important properties in this context are the predictable performance and power consumption. Possible disadvantages are the predefined dimensions on the chip and the fact that there are no options for customization. One special case are cells with an almost regular internal structure, such as memory modules.
These types of macros can be generated with slightly different properties from parameterized models. Using a specialized module compiler enables the adaptation of the internal structure and layout of regularly structured cells to meet the particular requirements of different applications.

Macros without any physical design implementation are called soft macro cells, where only the functional description is defined. This can be a structural description in the form of a gate-level standard cell netlist, but also a high-level description in a hardware description language, such as VHDL or Verilog [21]. High-level descriptions of soft macros are typically highly parameterized to fulfill the requirements of various applications. This kind of soft macro is also called an intellectual property module and is often provided by third-party vendors.

2.4. Deep Submicron Interconnects

Technology shrinking significantly reduces the structures of a chip. This allows more and more complex circuits on a given chip area, but also reduces the cross-sections of interconnect wires. On the other hand, the wire lengths increase with the system complexity of a chip. As a result, the characteristics of cell interconnect wires have a significant effect on the performance of a system in deep submicron technologies. The most important parameters, as well as commonly used models for cell interconnects, are introduced in the following sections. A detailed discussion of interconnect wire effects can also be found in [15, 19].

2.4.1. Interconnect Parameters

Wire Resistance

Shrinking the dimensions of the on-chip structures reduces at least the widths of the interconnect wires. As the wire lengths tend to increase with the more complex circuits, the wire resistance has become an important parameter.
This resistance $R$ is given by, and is often rewritten as,

$$R = \frac{\rho L}{A} = \frac{\rho L}{T W} = R_{sq} \frac{L}{W} \tag{2.9}$$

where $\rho$ is the resistivity of the material in Ωcm, $L$ is the length, and $A$ is the area of the wire cross-section. As $\rho$ is given by the material and $T$ is almost constant for a given technology, the sheet resistance $R_{sq} = \rho / T$ is an important parameter. The wire resistance is therefore given by multiplying $R_{sq}$ by the ratio $L/W$, also referred to as the number of squares of a wire.

Coupling Capacitances

As mentioned in the previous sections, capacitances are among the primary parameters affecting the performance of a system. Compared to the parasitic MOS transistor capacitances, the effects caused by cell interconnect wires are considerably increased due to submicron effects.

Due to the potentially complex three-dimensional wire structure of state-of-the-art integrated circuits, an accurate modeling of the wire capacitances is a nontrivial task. As the spacing between wires has decreased, coupling effects between the wires have become significant. The most important components of the total wire capacitance are shown in Figure 2.10. As a consequence of the typically overlapping wires on different layers, the area capacitance $C_A$, located between different layers, is given by

$$C_A = \varepsilon_{ox} \frac{W L}{H} \tag{2.10}$$

Figure 2.10.: Components of the total wire capacitance. Labeled components: A (area), L (lateral), F (fringing).

The capacitance between lines in the same layer is called the lateral capacitance $C_L$ and is given by

$$C_L = \varepsilon_{ox} \frac{T L}{S} \tag{2.11}$$

Due to the reduced wire spacing $S$ in deep submicron technologies, this component has become the main coupling capacitance and is therefore largely responsible for delay and noise issues. Less significant, but possibly also responsible for coupling effects, are the fringing capacitances $C_F$.
Although they are actually present at all edges and surfaces, only the capacitances to the neighbor layers are shown in the figure, as $C_L$ typically dominates when the spacing between the wires is sufficiently small. $C_F$ is approximately given by

$$C_{Fa} = \varepsilon_{ox} L \ln\left(1 + \frac{T}{H}\right) \tag{2.12a}$$

or alternatively, for widely spaced wires located between neighbor wires at one layer:

$$C_{Fl} = \varepsilon_{ox} L \ln\left(1 + \frac{W}{S}\right) \tag{2.12b}$$

The total wire capacitance $C$ can finally be summed up to

$$C = 2C_A + 2C_L + 2C_{Fa} + 2C_{Fl} \tag{2.13}$$

where some of the components may be neglected depending on the actual wire spacing.

Interconnect Inductance

The interconnect inductance is negligible in many cases, but as a consequence of low-resistance interconnect materials and increased switching frequencies, it starts to play a role. As increased switching frequencies lead to larger variations of the current within short time periods, inductances may generate a considerable voltage drop

$$\Delta V = L \frac{di}{dt} \tag{2.14}$$

The wire inductance can be determined directly from its geometry and environment, but also from the relation between the capacitance $c$ and the inductance $l$ (per unit length) of a wire, given by the expression

$$c\,l = \varepsilon \mu \tag{2.15}$$

where $\varepsilon$ is the permittivity and $\mu$ is the permeability of the dielectric.

2.4.2. Wire Models

The characteristics of interconnect wires can be considered ideal in early design phases, where only the functional properties of a circuit are of interest. Very short wires at transistor level, or between cells located side by side, usually also cause no relevant effects on the system behavior. Interconnect wires over relevant distances, however, may significantly affect the switching performance of the driving transistors. Wire models are therefore necessary for a reasonable circuit analysis, but also during the design phase to determine appropriate measures for optimization and reliability issues. Accurate wire models that include all the minor effects would be too complex.
Therefore, different models with a reasonable complexity are used to approximate the real interconnect behavior.

Lumped Models

Even though the previously mentioned parasitics are actually distributed along the wire, it is often feasible to lump them into a few elements. Commonly used lumped models are shown in Figure 2.11. The most basic model consists of one capacitance only, shown in Figure 2.11(a). It represents the additional load on the driving cell and is applicable as long as the resistive component is insignificant and the switching frequencies are in the low and medium range. This usually holds for more than 90% of the wires in a chip, which can therefore be modeled by a single lumped capacitor.

For wires of a certain length, the resistive component becomes significant. In this case, the model requires at least one resistor to appropriately consider the introduced RC effects. The L-model shown in Figure 2.11(b) is simple, but pessimistic in terms of the time constant $\tau = RC$. As this value should rather be in the range of $\tau = RC/2$, this model is consequently inaccurate for long wires³. In this context, the T- and Π-models, where the resistance or the capacitance is divided into two elements, are more accurate. Both models are suitable for calculations, but as the T-model has an additional node, which may increase the number of calculations, the Π-model is the most popular lumped RC model for long interconnect wires.

Figure 2.11.: Commonly used lumped wire models. (a) C only model, (b) L-model, (c) T-model, (d) Π-model.

Distributed Models

Long interconnect wires show significant coupling effects, but lumped models do not take them properly into account. Distributed models lead to more accurate results in this case, but are also more complex.
Figure 2.12 shows a simplified model, where two nets are coupled by a capacitance, to demonstrate the importance of considering coupling effects. If the aggressor net is not switching, the coupling capacitance $C_c$ can be considered as connected to ground. The loading capacitance $C_L$ in this case is

$$C_L = C_{gnd} + C_c \tag{2.16}$$

On the other hand, if both nets are switching in the same direction, then

$$C_L = C_{gnd} \tag{2.17}$$

as the charge of the coupling capacitance does not change. But if both nets are switching in opposite directions, the voltage across $C_c$ is reversed and has to be considered twice:

$$C_L = C_{gnd} + 2C_c \tag{2.18}$$

³ A derivation and discussion of the time constant $\tau$ using the Elmore delay can be found in the respective literature, such as [15].

Figure 2.12.: Internal structure of exemplary standard cells. Labeled elements: $C_c$, $C_{gnd}$.

This shows that including the coupling capacitances in a lumped model as connected to ground is only a first-order approximation. As there are typically many more coupled nets in a circuit, there are also numerous possible combinations of switching activity, which can be covered only by distributed models and dedicated coupling capacitances.

Since interconnects are typically part of a complex structure, the mentioned parameters actually vary along a wire. If a cell output is connected to several cells, the wire has a kind of tree structure and may furthermore be routed via different interconnect layers. As a result, the best accuracy would be achieved by modeling each section separately with the appropriate elements. Since this would again lead to excessively complex models, a compromise between accuracy and model complexity has to be found. This is typically done by introducing a threshold value above which capacitances are considered, which usually results in sufficiently accurate models with a reasonable complexity.

2.5.
Power Distribution Networks

The power supply voltage is ideally constant all over a chip, but the distribution networks are themselves interconnect wire structures. As a consequence of the parasitic wire effects, the current consumption of the on-chip devices causes supply voltage fluctuations over time. Depending on the power grid structure and dimensions, as well as the circuit layout and the amount of current consumption, these fluctuations may furthermore differ between particular chip regions. The design of the power distribution network, but also the modeling of its behavior, is therefore a potentially complex task.

Figure 2.13.: Model of a power supply system considering the package inductance, the RLC characteristics of the on-chip power distribution network, as well as an instance of a decoupling capacitance. Labeled elements: $V_{DD}$, $V_{SS}$, $L_{pkg}$, Decap.

Power supply systems are designed to provide a voltage that is as stable as possible at any point on a chip. But as it is distributed from an external source to all the particular components on the chip, voltage drops are caused by the parasitic wire characteristics. Figure 2.13 shows a power system model considering the most important resistive, capacitive, and inductive components. A popular measure to reduce voltage drops is the instantiation of decoupling capacitances.

2.5.1. Voltage Drops

The primary causes of power supply voltage drops are referred to as IR drop and L di/dt. IR drops are caused by the current flow through the wire resistances and are most important for the on-chip power grid. Inductive effects, on the other hand, are primarily caused by the interconnects at the chip package. The total voltage drop $V$ is given by

$$V = IR + L \frac{di}{dt} \tag{2.19}$$

IR drop

With the increasing complexity of integrated circuits, the current consumption also increases due to additional switching components. This leads to significant voltage drops caused by the resistance of the power supply lines.
Such IR drops also exist at the ground grid, where they are referred to as ground bounce.

L di/dt

A large number of simultaneously switching cells, which in digital designs are typically the clock buffers and flip-flop cells, may demand high current peaks with extremely short rise times and therefore a high di/dt. At least the package inductance may then contribute a significant portion to the overall voltage drop. Even the small inductances in the power grid may cause a considerable voltage drop in high-speed designs. The most significant inductances arise from the bonding wires used to connect the chip I/O pads to the lead frames of a traditional package. The inductances of a ball-grid array (BGA) package are one order of magnitude lower than those of a dual inline package (DIP). But the current consumption of the typically much more complex systems in a BGA package nevertheless causes a significant L di/dt value for both $V_{DD}$ and $V_{SS}$.

2.5.2. Decoupling Capacitances

Large voltage drops in the power distribution system may lead to supply voltages that temporarily fall below the minimum value required for proper operation of a system. On-chip decoupling capacitances are commonly used to keep the supply voltage within the noise budget. These capacitances are typically located near the power pins of components demanding high peak currents. Decoupling capacitances hold a certain reservoir of charge. The current needed for the switching operations of nearby cells is first delivered by these capacitances. Recharging them for the next operation is done later by the current flow from the power supply. As a result, decoupling capacitances operate as a kind of filter that reduces excessive di/dt rates, which are known to be responsible for a significant part of the interfering voltage drops.
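As a rough numerical illustration of Equation 2.19 and of the charge-reservoir view of decoupling capacitances, the following minimal sketch may help. All component values are hypothetical and chosen only for illustration, the function names are introduced here, and the sizing rule C = ΔQ/ΔV is the ideal capacitor relation assumed for this sketch rather than a formula taken from the text.

```python
def supply_voltage_drop(current_a, resistance_ohm, inductance_h, di_dt_a_per_s):
    """Total supply voltage drop V = I*R + L*di/dt (Eq. 2.19)."""
    ir_drop = current_a * resistance_ohm
    l_didt_drop = inductance_h * di_dt_a_per_s
    return ir_drop + l_didt_drop


def decap_for_droop(charge_c, allowed_droop_v):
    """Capacitance that delivers charge_c while its voltage sags by allowed_droop_v.

    Uses the ideal capacitor relation C = dQ / dV; this sizing rule is an
    assumption of this sketch, not a formula from the text.
    """
    if allowed_droop_v <= 0:
        raise ValueError("allowed_droop_v must be positive")
    return charge_c / allowed_droop_v


if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    peak_current = 0.1          # A, peak switching current
    grid_resistance = 0.5       # ohm, power grid resistance
    package_inductance = 2e-9   # H, package inductance
    rise_time = 1e-9            # s, time in which the current ramps to its peak

    drop = supply_voltage_drop(peak_current, grid_resistance,
                               package_inductance, peak_current / rise_time)

    # Charge of a triangular current pulse with the given peak and a base of 2 * rise_time.
    pulse_charge = 0.5 * peak_current * 2 * rise_time
    decap = decap_for_droop(pulse_charge, allowed_droop_v=0.05)

    print(f"voltage drop without decoupling: {drop:.3f} V")
    print(f"decoupling capacitance for a 50 mV droop: {decap * 1e9:.2f} nF")
```

With values of this kind, the inductive term can exceed the resistive one even for a small package inductance, which is the situation in which a local charge reservoir is most effective.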
As discussed in this chapter, there are several parasitic capacitances introduced by the MOS transistors and the interconnect wires. All of them limit the performance of a system, but all capacitances charged to $V_{DD}$ are effective as decoupling capacitances. Dedicated decoupling capacitances are usually implemented as NMOS transistors with the gate connected to $V_{DD}$ and the source and drain connected to $V_{SS}$.

3. Synchronous Sequential Digital Systems

The basic principle of digital circuits is that the signals ripple through paths of consecutively switching devices. To ensure correct values at a given node and time, the activity of a circuit must be coordinated. Digital systems are commonly classified into synchronous and asynchronous designs [22, 23].

Synchronous systems are timed by a periodic clock signal. This time reference is globally distributed to all memory elements, which are intended to update their values simultaneously. The calculations of the internal values and the output values are done within the time interval between two synchronization events.

Asynchronous systems are not stimulated by a globally distributed time reference signal. A term often used in this context is self-timed systems. The operating sequences in such systems are coordinated by completion signals, which are generated by each function block when its calculations are done and the resulting values are stable. Given that all following operations that rely on the results of a previous function have to wait for this handshake signal, the correct logical order of the operating sequences is ensured in asynchronous designs.

As virtually all of today's digital systems are synchronous designs, this chapter focuses on this isochronous type of circuit. In the following sections, the general principles and the most important design considerations of synchronous sequential systems are discussed.

3.1.
General Principles Synchronous digital systems basically consist of clocked storage elements, which are triggered by the system clock, and the combinational logic, which is responsible for the determination of the internal values but also the resulting output signals. At any clock event, the storage elements are updated with these internal values, which consequently triggers the combinational logic again to determine the next results. This mode of operation is basically similar to the logical model of a finite-state 28 3. Synchronous Sequential Digital Systems inputs(X) combinational logic outputs(Y) Y=Y(X,Sn) clocked storage elements present state: Sn next state Sn+1 Sn+1=f(Sn,X) clock Figure 3.1.: General principle of a synchronous design as a finite-state machine. machine (FSM), as it is illustrated in Figure 3.1. The combinational logic takes the present state Sn and the input values X into account, and determines the output values Y and the next state Sn+1 . Therefore, both Sn+1 and Y depend on Sn and X: Sn+1 = f (Sn , X), Y = Y (Sn , X). The state transition from Sn to Sn+1 is initiated by the clock signal. This sequential procedure is schematically shown in Figure 3.2. On which event (rising or falling edge) the state transition is sensitive depends on the type of the instantiated storage elements. According to the most often implemented option, the circuit in this example is sensitive on the rising clock edge. It is shown that the combinational logic has a limited time to calculate the next state, or otherwise, it limits the maximum possible clock frequency. As the paths in the combinational logic Sn Xt comb. logic Yt Sn+1 Xt+1 comb. logic Yt+1 Sn+2 clock time Figure 3.2.: FSM state transitions related to the clock signal. 3.2. Digital System Clocking 29 block have different lengths, the resulting values are not simultaneously available. 
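The FSM relations $S_{n+1} = f(S_n, X)$ and $Y = Y(S_n, X)$ can be illustrated by a minimal Python sketch. The example machine (a 2-bit counter with a count-enable input) and the function names are hypothetical and serve only to show the separation between combinational logic and clocked state update.

```python
def next_state(state, x):
    """f(S_n, X): 2-bit counter that advances only while the input x is 1."""
    return (state + 1) % 4 if x else state


def output(state, x):
    """Y(S_n, X): flags the wrap-around, i.e. state 3 with counting enabled."""
    return 1 if (state == 3 and x) else 0


def run(inputs, state=0):
    """Apply one input value per rising clock edge.

    Returns the list of output values and the final state."""
    outputs = []
    for x in inputs:
        outputs.append(output(state, x))  # combinational logic settles within the cycle
        state = next_state(state, x)      # storage elements update at the clock edge
    return outputs, state


outs, final = run([1, 1, 0, 1, 1])
print(outs, final)
```

Both functions only read the present state and the inputs; the state variable changes exclusively inside the loop that stands in for the clock edge.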
The longest (slowest) path is termed the critical path, and its propagation delay represents a fundamental performance parameter of a system. A simple FSM model suffices to demonstrate the basic principles of synchronous sequential systems, but the architecture of modern designs is much more complex and is discussed in the following sections.

3.2. Digital System Clocking

The clock signal is the time reference for all operations and therefore an essential part of a synchronous digital system. On the other hand, the clock distribution network and the storage elements are responsible for a significant portion of the energy consumption. The clock subsystem of modern microprocessors may consume up to 40% of the entire chip power [24]. There are a number of methods to generate the clock signal and distribute it to the storage elements [25]. Since the clock subsystem directly affects the performance of a system, its structure is application-specific and carefully optimized. Therefore, when characterizing the behavior of a system, special attention has to be paid to the characteristics of the implemented clocking methods.

3.2.1. System Clock Generation

The internal system clock is typically derived from an external oscillator. While some systems are directly driven by the external clock, high-performance microprocessors in particular operate at higher frequencies than the external reference. Figure 3.3 shows the main blocks of a typical clock generation unit.

Figure 3.3.: Basic blocks of a clock generator unit. (Blocks: clock prescaler, PLL, clock divider, configuration register; signals: ext. clock, fOSC, fPLL, fCPU, fSYS.)

An internal phase-locked loop (PLL) is commonly used to synchronize an on-chip oscillator to the preconditioned external reference. By dividing the internal frequency fPLL by a given factor prior to the synchronization, a multiplication of the reference frequency fOSC is achieved.
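The frequency relations of the clock generation unit in Figure 3.3 can be sketched in a few lines of Python. The sketch assumes an integer prescaler, an integer feedback divider inside the PLL loop, and integer output dividers; the parameter names and the example values are hypothetical and do not describe a particular device.

```python
def pll_clocks(f_ext_hz, prescale, feedback_div, output_divs):
    """Return (fOSC, fPLL, output clocks) for a prescaler + PLL + divider chain.

    f_ext_hz     : external clock frequency
    prescale     : prescaler factor, fOSC = f_ext / prescale
    feedback_div : divider in the PLL feedback path, fPLL = feedback_div * fOSC
    output_divs  : mapping of clock name to output divider factor
    """
    f_osc = f_ext_hz / prescale
    f_pll = f_osc * feedback_div  # dividing fPLL in the loop multiplies the reference
    clocks = {name: f_pll / k for name, k in output_divs.items()}
    return f_osc, f_pll, clocks


# Assumed configuration (illustration only)
f_osc, f_pll, clocks = pll_clocks(20e6, 2, 16, {"fCPU": 1, "fSYS": 2})
print(f_osc, f_pll, clocks)
```

In a real device these factors would be read from the configuration register; here they are passed as plain arguments.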
Since a central processing unit (CPU) typically requires a higher clock frequency than the peripheral units, a configurable clock divider is used to generate multiple clock signals with different frequencies. The clock generation unit is typically configured, in terms of multiplier/divider factors and prescaler options, by setting the appropriate values in a configuration register. For the characterization of a system in terms of its transient current consumption, the configuration of possibly different clock signals is important. Dividing a clock signal by a given factor k is often done by disabling k periods of the faster signal between two passed periods. This leads to different ratios of the high and low phases of the resulting slower clock. This ratio is commonly termed the duty cycle and is of significant importance for the current consumption modeling in Chapter 5.

3.2.2. Clock Distribution

The intention of synchronous designs is that all storage elements are triggered simultaneously by the global system clock. This requires a specifically designed distribution network, which ensures the same clock signal propagation delay from the generation unit to all storage elements. Figure 3.4 schematically shows a basic approach for a balanced distribution network. The clock signal is routed to a central point on the chip and distributed over an H-tree structure to a number of functional sub-blocks. The local distribution to the individual storage elements is done by balanced paths as well. Such approaches are based on the assumption that the wire lengths from the central point to all leaf nodes are equal. As complex designs are typically irregular, and different metal layers with different wire dimensions are used, a strict H-tree is usually not directly applicable [26]. Modern electronic design automation (EDA) tools support the implementation of resistance-capacitance (RC) matched trees.
This method considers the parasitic effects of the interconnect wires and provides the opportunity to adjust clock buffer strengths. The crucial feature of this approach is the possibility to compensate propagation delays. This allows the implementation of irregularly routed, but nevertheless balanced, clock distribution networks. A popular approach for implementing a balanced tree for an irregularly structured design is the clustering method. There, the clocked storage elements are recursively grouped into clusters. Each cluster contains one buffer, which distributes the clock signal to the associated sub-segments. The tree is balanced by applying buffers with the appropriate strength and equal wire lengths within each cluster.

Figure 3.4.: H-tree clock distribution network. (a) physical view, (b) tree view.

The absolute signal propagation delay of a clock distribution network is usually irrelevant, as long as the arrival times at the storage elements are as similar as possible. Since path propagation delays in real systems are never exactly the same, the maximum difference between the arrival times is an important reliability parameter of a system. It is usually termed clock skew and describes the time interval within which the clocked storage elements are triggered. With respect to the system performance characteristics, the clock skew reduces the effective clock period. Therefore, the time available for the calculations of the combinational logic is shortened. On the other hand, this effect spreads the activity of the system and distributes the current consumption over a certain time interval. As this leads to a lower current peak value, it may ease the dimensioning of the power distribution network. Additional advantages of a distributed activity are lower voltage drops in the power supply system and a reduction of the electromagnetic emission.
As a result, an adequate clock skew (sometimes also termed jitter) is intentionally induced in some cases [27, 28].

3.2.3. Clock Gating

The energy consumption of a system is of particular interest, especially for mobile devices. Distributing the clock signal to all storage elements, even if their data values remain unchanged, causes unnecessary switching activity and therefore unnecessary power dissipation. A popular technique to avoid unintended circuit activity is clock gating [29, 30]. By inserting gates into the clock tree, the distribution of the clock signal to currently inactive functional sub-blocks can be prevented. As a result, the idle parts of a circuit can be selectively deactivated.

Complex systems, such as microcontrollers, provide a large number of functionalities and multiple peripheral units. In programmable systems, it ultimately depends on the implemented software which functional units are actually occupied and active. Since a microcontroller may be almost idle while its clock subsystem would otherwise remain fully active, clock gating is an efficient measure to optimize the power dissipation of a system. As the clock gates themselves, and particularly the additionally needed control logic, occupy chip area and consume energy as well, the number of inserted gates is a compromise. It depends on the application, but some of the functionalities of a microcontroller are used only optionally, rarely, or partially. In this context, a distinction is usually made between static and dynamic gating:

Static gating is usually applied in systems with different functional units that are only optionally used. The enable signals are typically generated by a system control unit and cause entire units to enter a sleep mode.
Since deactivating the clock for an entire module prevents the dynamic current flow, but not the static leakage current, some systems additionally support switching off the power supply of particular units.

Dynamic gating is typically done by individually setting the enable signals of the clock gates at each clock cycle. An internal gating logic determines whether a particular functionality is required and enables or disables the appropriate clock gate(s). Due to the additionally needed control logic, this concept is more expensive than static gating, but it allows a selective deactivation of local resources. At the lowest level, the clock signal distribution to particular registers, or even single flip-flops, can be restricted. Whereas statically gated functional units can be considered completely switched off, dynamic gating poses the greater challenge for system activity simulations.

Figure 3.5 shows, as an example, a fragment of a clock tree with inserted gating cells. The clock signal CLK is distributed through a tree of buffers, and the enable signals are routed to the inserted clock gates. These signals can be generated by a local control logic or, even in the case of dynamic gating, distributed globally to several submodules. The figure also shows that gating is done hierarchically, which provides the opportunity to disable entire modules, functional sub-blocks, or even single registers or flip-flops.

Figure 3.5.: Clock tree with inserted gating cells. (Signals: CLK, EN1, EN2, EN2A, EN2B.)

Strictly speaking, the enable signal of flip-flops (discussed later in Section 3.4) is also a kind of gating. Terms often used in this context are global and local gating, where local gating is integrated into the clocked storage elements, while global gating affects a certain number of elements.
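The effect of hierarchical gating on the set of clocked elements can be sketched with a small Python model. The tree layout, the leaf names, and the helper functions are hypothetical; they only loosely follow the enable names of Figure 3.5 and are meant to show how the enable signals determine which storage elements see a clock edge in a given cycle.

```python
# Hypothetical clock tree: each leaf lists the enable signals of all gates
# between the clock root and the leaf (outermost gate first).
LEAVES = {
    "reg_a": ["EN1", "EN2A"],
    "reg_b": ["EN1", "EN2B"],
    "reg_c": [],  # branch without any gating cell
}


def clock_reaches(path_enables, enable_values):
    """A leaf receives the clock edge only if every gate on its path is enabled."""
    return all(enable_values[name] for name in path_enables)


def clocked_leaves(enable_values):
    """Return the sorted names of all leaves that are clocked in this cycle."""
    return sorted(
        leaf for leaf, path in LEAVES.items()
        if clock_reaches(path, enable_values)
    )


# Dynamic gating: the enable values may change from cycle to cycle.
print(clocked_leaves({"EN1": 1, "EN2A": 0, "EN2B": 1}))
# Static gating: EN1 = 0 switches off the whole module behind it.
print(clocked_leaves({"EN1": 0, "EN2A": 1, "EN2B": 1}))
```

For an activity simulation, the list of clocked leaves per cycle is the relevant quantity, since only these elements and their clock buffers contribute to the dynamic current of the clock subsystem.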
Since peripheral units of a microcontroller may operate at a fraction of the system frequency, clock gating is often used to implement a local clock divider. The circuit in Figure 3.6 shows an implementation of a divider which may be used to divide the system clock by a factor of two. Setting the signal DIV to low causes the flip-flop to toggle its output value at every rising edge of CLK. As the latch is transparent in the low phases of the system clock, the value of CLKen changes only during the low phases of the clock signal and is in any case stable during the high phases. This is important, as CLKi should be as synchronous to the system clock as possible. A high level of DIV disables the toggle flip-flop, which leads to a constant value at the inverter and consequently a permanently "open" AND gate. Without the inverter, the AND gate would be permanently "locked" in this case.

Figure 3.6.: Schematic of a basic local clock divider. (Flip-flop with ports D, CP, QN; latch with ports D, ENN, Q; signals DIV, CLK, CLKen, CLKi.)

3.3. Combinational Logic Cells

As mentioned in Section 3.1, the combinational part of a circuit is responsible for the determination of internal results and output values. The output state of a combinational circuit is at any point in time related to its current input signals by a Boolean expression. Consequently, there are no instances of storage elements and typically no combinational feedback loops.

The building blocks of combinational logic circuits are the combinational logic cells, also termed gates. These gates provide basic logic or arithmetic functions, which can be cascaded to implement the intended circuit function. Figure 3.7 shows a symbol of a general logic cell, which may have an arbitrary number of inputs and outputs, depending on the implemented function. This can be a simple inverter or buffer with one input and one output, but also an arithmetic function with several inputs and possibly more than one output.
The output values of combinational cells are in any case calculated instantly and provided at the respective output(s).

Figure 3.7.: Symbol and truth-table of a general combinational logic cell with inputs $I_1 \ldots I_M$ and outputs $O_1 \ldots O_N$, where $O_n = f(I_1, \ldots, I_M)$.

3.4. Clocked Storage Devices

Storage elements are generally used to store the state of a system, as discussed in Section 3.1. This can be intermediate results in flip-flops or registers, but also a set of data vectors in a memory. Both types of information are intended to be processed subsequently and are therefore considered as the system state. According to the definition of synchronous sequential circuits, clocked storage elements represent the leaf nodes of the clock distribution network.

A common characteristic of all clocked elements is that the point in time at which the input is captured is given by the clock signal. The types of elements are distinguished by their specific sensitivity to the clock signal and by the amount of stored data. While level-sensitive devices (which capture the input continuously during the enabled phase) are commonly termed latches, edge-sensitive devices (which are triggered by a clock signal transition) are called flip-flops. A group of a given number of flip-flops in parallel is usually intended to handle the values of data buses and is typically called a register. The basic characteristic of a memory is the ability to store more than one data word (a number of parallel bits), where the location in the memory is selected by applying the appropriate address at the respective input.

As the functionality of clocked storage devices is essential for the operating behavior of synchronous designs, the different device types are briefly discussed in the following sections. Because the simulations discussed in the following chapters characterize the behavior of already verified circuits, it can be assumed that the timing requirements are already fulfilled. Therefore, and since the switching behavior is important for the characterization in Chapter 4, the focus here is on the functionality and special properties of the respective devices.

3.4.1. Latches

There are different types of latches, but the D-latch shown in Figure 3.8 is the most relevant one for synchronous designs. Latches are generally level-sensitive. As shown in the truth-table, the output Q directly follows the input D as long as the latch is enabled. A term often used in this context is that an enabled latch is transparent. Figure 3.8 shows the symbol and the truth-table of a high-active latch, which is enabled while EN is '1'. For low-active latches, such as the one instantiated for the clock divider in Figure 3.6, this port is usually labeled ENN or $\overline{EN}$.

The transparent behavior of latches may be useful for several applications, such as the mentioned clock divider, but is problematic in synchronous designs. Because the input data signals are continuously propagated to the output, latches cannot be regarded as terminating combinational paths. As a result, D-latches are instantiated only rarely, and for special purposes, in synchronous designs.

Figure 3.8.: D-latch symbol and truth-table.

D    EN   Q
-    1    D
-    0    $Q_{t-1}$

3.4.2. Flip-Flops

The predominantly instantiated clocked storage elements are flip-flops. These edge-sensitive devices capture the input data signal at discrete points in time and subsequently propagate it to the output. Flip-flops are most commonly sensitive to the rising edge of the clock signal (posedge-triggered), but modified types may also be sensitive to falling edges (negedge-triggered). Basic versions typically have the ports CP (clock), D (data input), and Q (data output).
As shown in Figure 3.9, the output state is set to the input value at every rising edge of the clock signal. To provide additional features, flip-flops are often extended by one or more of the following functionalities:

Figure 3.9.: Basic flip-flop symbol and truth-table.

D    CP   Q
-    ↑    D

enable: An often used feature is provided by an additional EN (enable) port. It is used to prevent the update of the stored data, even if the triggering clock event occurs. When disabled, the flip-flop feeds its output value back to the input, so that it captures its current output value instead of the actual input data.

set/reset: In some cases it is necessary to bring a system into a well-defined state, such as the power-on state of a given system. These two signals force the flip-flop output to either a high (set) or a low (reset) state. Since set/reset inputs in ASIC designs are most commonly low-active (they become effective when set to '0'), the ports are typically named SN and RN, or $\overline{S}$ and $\overline{R}$. In contrast, S or R indicates a high-active set or reset port in this context.

scan test: After the fabrication of a chip, several tests are performed. One of the functional tests is the so-called scan test, where the circuit is reconfigured to form scan chains. In the course of this test, flip-flops operate in a different mode and therefore provide two additional ports: TE (test enable) and TI (test input). These ports are typically used for the test procedure only and are therefore less important during the normal operation of a circuit.

Figure 3.10 shows the symbol and the truth-table of a flip-flop that provides an enable and a reset port. The truth-table shows the relation of the output to the different input states. As the shown device is posedge-triggered, it captures the input value at every rising clock edge, except when it is disabled.
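The behavior of such an extended flip-flop can be summarized in a small behavioral model. The following Python class is only a sketch under the assumptions stated in the text (posedge-triggered, high-active enable EN, low-active asynchronous reset RN); the class and method names are hypothetical, and timing aspects such as setup and hold times are not modeled.

```python
class DFlipFlop:
    """Behavioral model of a posedge-triggered D flip-flop with enable (EN)
    and low-active asynchronous reset (RN). Timing is not modeled."""

    def __init__(self):
        self.q = 0          # stored output value
        self._prev_cp = 0   # previous clock level, used for edge detection

    def update(self, cp, d, en=1, rn=1):
        """Apply the current port values and return the output Q."""
        if rn == 0:
            self.q = 0      # reset acts immediately, independent of the clock
        elif self._prev_cp == 0 and cp == 1 and en == 1:
            self.q = d      # rising edge while enabled: capture D
        # in all other cases the flip-flop keeps its previous value
        self._prev_cp = cp
        return self.q


ff = DFlipFlop()
trace = [
    ff.update(cp=0, d=1),         # no edge yet
    ff.update(cp=1, d=1),         # rising edge, enabled: Q follows D
    ff.update(cp=0, d=0),         # falling edge: no change
    ff.update(cp=1, d=0, en=0),   # rising edge, disabled: value is kept
    ff.update(cp=1, d=0, rn=0),   # reset without a clock edge
]
print(trace)
```

The order of the two conditions in `update` reflects the priority of the reset over the clocked data capture.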
The truth-table also shows that the reset signal is typically asynchronous, i.e., it becomes effective immediately and is consequently independent of the clock signal.

Figure 3.10.: Extended flip-flop symbol and truth-table.

D    EN   CP   RN   Q
-    0    ↑    1    $Q_{t-1}$
-    1    ↑    1    D
-    -    -    0    0

3.4.3. Memories

Many digital designs, and programmable devices such as microcontrollers in particular, contain internal memories. In addition to a read-only memory (ROM) with the basic software routines, several static random-access memories (SRAMs) may be instantiated. Since ROMs and SRAMs are typically sensitive to the rising clock edge, static memories are a popular kind of posedge-triggered clocked storage device.

Figure 3.11 shows a symbol and the respective pin description, and Figure 3.12 the block diagram of the internal structure of an SRAM as customarily instantiated in VLSI designs. An internal memory array is used to store a given number of data words. The term random access in this context is derived from the characteristic that data words can be read and written in a random order in successive clock cycles. An address decoder converts the binary coded address vector with a length of m bits to the internally used signals for the data word selection. The I/O buffer latches the input and output data vectors, each of which consists of n bits.

Figure 3.11.: Symbol and pin descriptions of a general memory instance.

Pin        Description
A[m-1:0]   Address
D[n-1:0]   Data Input
Q[n-1:0]   Data Output
CLK        Clock
CEN        Chip Enable
WEN        Write Enable

The internal activity of the shown memory is coordinated by a control unit. It generates the internal control signals for the address decoder and the data buffer. As a kind of local clock gating, the chip enable signal is used to control the internal clock signal propagation.
Since CEN is typically low-active, this type of SRAM performs a read or write operation at each clock cycle as long as the enable signal is low. A high CEN signal, on the other hand, causes the memory to ignore all incoming requests. The write enable signal is used to distinguish between the read and the write mode of the device. In the write mode, the memory stores the applied input data vector D at the given address A in the memory array and additionally propagates the data to the output Q. In the read mode, the addressed data word is read from the memory array and provided at the data output after a certain delay. Alternative SRAM implementations feature extended write enable signals in the form of a write mask. This provides the opportunity to selectively write a segment, or possibly even single bits, of the data input vector into the memory.

Figure 3.12.: Basic block diagram of a static memory. (Blocks: address decoder, control unit, memory array, data I/O buffer; ports: A[m-1:0], CLK, CEN, WEN, D[n-1:0], Q[n-1:0].)

The timing diagram in Figure 3.13 shows the common dependencies of the SRAM activity during read and write operations. Three clock cycles are shown, where the following operations are triggered by the rising edges of the clock signal:

1. A low chip enable and a high write enable signal at the rising edge of the clock force the memory to read the stored data at the address A1. The values are propagated to the output and are available after a certain delay. The applied input data vector D1 is ignored.

2. Since a high chip enable signal disables the device, none of the applied signals is recognized at the second rising clock edge. The output data remain unchanged until the next read or write operation is executed.

3. As both CEN and WEN are low at the third clock cycle, a write operation is initiated at the clock edge.
Therefore, the data input values D2 are captured and stored in the memory array at the given address A2. After a certain delay, the data D2 are additionally propagated to the output.

Figure 3.13.: SRAM timing diagram for one read and one write operation. (Signals: CLK, CEN, WEN, A with A1 and A2, D with D1 and D2, Q with Q1 = D(A1) and Q2 = D2.)

As mentioned in Chapter 2, SRAMs are typically instantiated as a macro cell, which is usually generated by a so-called memory compiler. The basic structure of essentially all memories in a design is therefore quite similar, while the data vector width (n) and the address range (m) are adapted to the respective application.

In addition to SRAMs, digital systems in some cases also contain ROM instances. Even though the data in the memory array of such ROM devices cannot be changed, the basic structure and operating method are similar to those of SRAMs. Because ROMs are read-only devices, no data input and write enable signals are provided. However, the timing characteristics shown for the read operation of an SRAM in Figure 3.13 are also valid for ROMs.

4. Standard Cell Characterization

Modern digital integrated circuits are often quite complex and may consist of several millions of transistors. Simulations that characterize the behavior of large designs at transistor level, if possible at all, demand excessive computational power and memory. This chapter introduces a novel method to significantly reduce the simulation effort by pre-characterizing the cells of a standard cell library. Based on an analysis of the static and dynamic behavior of CMOS devices, the extraction of the important parameters that enable fast current profile calculations is discussed. The resulting library of pre-characterized cells is the basis of an efficient gate-level circuit analysis, such as the current profile determination discussed in Chapter 5.

4.1. Outline of the Methodology

The characterization procedure is based on simulations of single cells under different operating conditions, on the extraction of a set of characteristic parameters, and on storing the results in a specific library (Figure 4.1). The starting point is the set of transistor level cell descriptions. Analog circuit simulations using BSIM (Berkeley Short-channel IGFET Model) transistor models are performed to determine the behavior of each single cell under different operating conditions. The simulation results are analyzed, and a set of parameters describing the timing characteristics as well as the current consumption is extracted. Storing these results in a specific library provides a description of the dynamic behavior for subsequent simulations. To enable gate-level simulations, additional parameters such as port definitions and the logic cell functions are provided in the form of static cell properties.

The characterization has to be done only once per technology library. As the standard cells can be considered as behavioral black boxes during the simulation of entire modules, this strategy provides an enormous speed-up compared to traditional transistor level simulations. A further advantage of this method is that most of the processes can be easily automated. This is beneficial, since standard cell libraries typically consist of several hundred different cells. The procedure has been applied to a 130 nm technology and has also been validated for a 90 nm library.

Figure 4.1.: Flow of the standard cell characterization procedure. (Blocks: transistor level cell descriptions, analog single cell circuit simulations, characteristic parameter extraction, timing characteristics, current profile characteristics, reference current profiles, static cell properties, dynamic cell behavior.)

4.2. Dynamic Cell Behavior Considerations

The behavior of standard cells has to be analyzed carefully in order to minimize both the characterization effort and the amount of data required to reconstruct the timing characteristics and current profiles efficiently. At first glance, the current consumption depends particularly on the cell activity. There is a significant difference between an input event that affects only internal nodes and one that additionally causes an output state transition. Figure 4.2 shows the transistor level schematic view and a symbol with the electrical characteristics at the terminals of a NAND gate.

Figure 4.2.: Symbol and transistor level schematic of a NAND gate with 2 inputs. (Terminals: inputs A and B with VA, IA and VB, IB; output Z with VZ, IZ; supplies VDD with IDD and VSS with ISS.)

As the cells are characterized as single devices, but interact with the surrounding cells in the final circuit, the behavior of each terminal is important. When the current consumption of an entire system is analyzed, IDD and ISS are expected to be equal. When cells are characterized as single devices, however, there are significant differences. The plots in Figure 4.3 show the current profiles of the NAND gate shown above for all activities that may be triggered by an event at port A. The profiles for activities at port B are similar. It can be seen that the current consumption depends strongly on the actual port activity. Input events that affect an internal node, but leave the output stable, show only minor current consumption, as plotted in Figure 4.3(a) and (c).
On the other hand, the current peaks caused by events with an output state change are significant and are shown in (b) and (d).

Figure 4.3.: Current profiles for different cell activities (IDD and ISS, current [A] over time [ps]). (a) port A: rising edge, port B: low; (b) port A: rising edge, port B: high; (c) port A: falling edge, port B: low; (d) port A: falling edge, port B: high.

The plots also show that there may be significant differences between IDD and ISS. While the current IZ at a low-high transition of the output (Plot (d)) is primarily part of IDD, a high-low transition (Plot (b)) directly affects ISS. Even though the current profiles differ significantly at one single cell, IDD and ISS of an entire system have to be equal. The reason is that IZ is also part of IDD and ISS of the cells connected to the output, and the input currents IA and IB likewise appear at the driving cell.

As the internal structure of a cell actually consists of a network of parasitic capacitances, further considerations are necessary. If the NAND gate shown in Figure 4.2 is triggered by a rising edge at port B while port A is still high, the output potential is pulled towards VSS. As a consequence, the drain potential of both transistors connected to port A changes by the full output voltage swing, and the corresponding gate capacitances have to be charged/discharged by the same value as well. The associated current flow IA is in this case not caused by an active event at port A and is therefore not yet considered as part of the switching current of any actively driving cell.
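The asymmetry between IDD and ISS at a single cell, and their balance over complete switching cycles, can be illustrated by an idealized charge bookkeeping for one output node. The following Python sketch is a strong simplification and not the characterization method of this work: it assumes a purely capacitive load and ignores short-circuit currents, leakage, and the internal parasitic capacitances; the function name and the example values are hypothetical.

```python
def supply_charges(output_transitions, c_load, vdd):
    """Idealized charge bookkeeping for a CMOS output node.

    A rising output charges c_load from VDD (contributes to IDD);
    a falling output discharges c_load into VSS (contributes to ISS).
    Returns the accumulated charges (q_dd, q_ss) in coulombs."""
    q_dd = 0.0
    q_ss = 0.0
    for transition in output_transitions:
        if transition == "rise":
            q_dd += c_load * vdd
        elif transition == "fall":
            q_ss += c_load * vdd
        else:
            raise ValueError(f"unknown transition: {transition}")
    return q_dd, q_ss


# Assumed example values: 10 fF load, 1.5 V supply
q_dd, q_ss = supply_charges(["rise", "fall", "rise", "fall"], 10e-15, 1.5)
print(q_dd, q_ss)
```

A single transition loads only one of the two supply terminals, whereas the charges drawn from VDD and delivered to VSS agree once every rising output edge has been followed by a falling one.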
A challenge in characterizing the dynamic behavior of a cell is that the operating performance depends strongly on the circuit environment. The dynamic behavior of a CMOS circuit results from the interaction of interconnected devices. Signal slew rates are the result of interactions between driver strengths and driven loads (fan-in of the cells connected to the output). For characterizing the switching behavior of standard cells, the important parameters are thus the input signal slope and the driven output load. Figure 4.4 shows the data signal voltages and the corresponding current profiles of a buffer (two subsequent inverters), which is triggered by a rising edge while driving different loads. The loads are represented by inverters with different transistor sizes. It can be seen that higher loads lead to an increased current flow and decreased signal slew rates.

Figure 4.4.: Data signals and current profiles of a buffer for different loads. (a) data signals (VIN and VOUT, data signals [V] over time [ps]); (b) current profiles (IDD [mA] over time [ps]). Higher loads cause a higher current flow and reduced output signal slew rates.

As different loads lead to different signal slopes, and the output signals of one cell are the input signals of the next one, different input signal slew rates have to be considered as well. This effect is most significant for single-stage cells, where transistor terminals are directly connected to both input and output. The plots in Figure 4.5 show these effects for an inverter that is triggered by falling edges with different signal slopes. The output load in this example is assumed to be constant. It can be observed that reduced signal slew rates also lead to reduced current profile slopes and, furthermore, to considerably lower peak values.
Even though the effect is minor, the output signals are also affected by the coupling effects caused by the internal parasitic capacitances.

Figure 4.5.: Data signals and current profiles for different input signal slopes. Reduced input signal slew rates lead to reduced peak current values. Panels: (a) data signals, VIN and VOUT [V] over time [ps]; (b) current profiles, IDD [mA] over time [ps].

4.3. Equivalent Inverter

Tools for placing and routing a design have the ability to connect almost any cells of a library to each other. This results in a virtually unmanageable number of combinations. As a consequence, it is not reasonable to simulate each cell with all possible configurations. A reduction to a moderate set of environmental conditions is necessary. This requires a substitution of the possible neighbor cells (driver, parallel cells, output load) by an equivalent circuit that is able to cover all important conditions. One solution in this context is the introduction of parameterized equivalent inverters.

There are several publications dealing with approaches to model the timing behavior of CMOS gates [31, 32, 33]. It is demonstrated that the behavior can be modeled by an appropriately configured equivalent inverter. The gate structures, including the parasitic effects of the transistors, are analyzed and substituted by inverters with similar characteristics. These approaches promise that the transient behavior of a cell can be modeled with a deviation of a few percent, but do not account for the current consumption as a time-domain profile.

As discussed in Chapter 2, the gate capacitance of a transistor is actually nonlinear and depends on the operating region, but is almost proportional to the channel dimensions.
Assuming that the channel lengths of all transistors of one technology library are constant, the capacitances are directly proportional to the channel width. In terms of modeling the capacitive load, it is consequently legitimate to replace two or more parallel transistors by one transistor with the appropriate channel width. It is therefore possible to combine several parallel inverters into one equivalent inverter. Granted that all transistors have a similar gate oxide, the ratios of the widths of the PMOS (WP) and the NMOS (WN) transistors are insignificant as well. As a result, the channel widths of both equivalent transistors (WPe, WNe) can be simplified by setting them to their average value:

$$ W_{Pe} = W_{Ne} = \sum \frac{W_P + W_N}{2} \qquad (4.1) $$

While the gate widths of parallel inverter structures can be safely added and averaged without interfering with the transient behavior, the substitution of combinational gates like NAND or NOR (Figure 4.6) requires a more detailed analysis.

Figure 4.6.: Transistor level schematic view of NAND and NOR gates. Panels: (a) NAND, (b) NOR; each with inputs A and B, output Z, and supplies VDD and VSS.

During a low-high or high-low transition of an inverter, the input and output voltages are pulled in opposite directions. This means that the transistor gate and drain potentials change their values, while the source terminals are always tied to VDD or VSS. Hence, the gate-source capacitances are charged/discharged by the actual input voltage swing (∆VGS = ∆Vin). As the output moves in the opposite direction of the input, the voltage change over the gate-drain capacitances is twice the input/output voltage swing (∆VGD = ∆Vin + ∆Vout). In the cases where transistors of the same type are in series, like PMOS at NOR and NMOS at NAND gates, the source terminals of the outer transistors are tied to VDD or VSS, but the node between them (Vx) may be at an intermediate potential.
On the other hand, if a transistor is switched on, the drain-source voltage (VDS) of all parallel transistors is virtually zero. As a consequence, even if its gate voltage is changed, the drain and source potentials remain almost unchanged. This results in a significantly different current consumption and also affects the output signal slope compared to the behavior of an inverter.

Assuming that the load of a given cell is a NAND gate, the plots in Figure 4.7 demonstrate that the output current depends on its activity. The solid lines represent the cases where an output state transition of the NAND gate occurs. The behavior in this case is similar to the previously mentioned inverter, and the variation between the curves for the triggered ports A and B is consequently minor. This can be accepted, as the total current is similar in both cases and the effects on the output signal slew rate are within a tolerable range of a few picoseconds.

Figure 4.7.: Output current profiles for a NAND-type load. Panels: (a) rising edge, with curves A↑ (B high), A↑ (B low), B↑ (A high), B↑ (A low); (b) falling edge, with curves A↓ (B high), A↓ (B low), B↓ (A high), B↓ (A low). Axes: IZ [mA] over time [ps].

More detailed considerations are advisable if one of the inputs remains low (dashed lines). The output state, and therefore all drain and source terminals of the parallel PMOS transistors, remain virtually at VDD. The drain-source voltage swing consequently is ∆VDSp = 0. This leads to a behavior of the transistors that is similar to a MOS capacitor. The voltage swing of the node between the NMOS transistors is limited to ∆VDSn = VDD/2. Extrapolating these issues to a NAND structure with N ports leads to the result that ∆VDSp = 0 is valid in all cases where one port remains low.
The voltage swing of inner nodes is therefore generally ∆VDSn = VDD/NL, where NL is the number of ports in the low state. A NOR gate is complementary in this respect. The relations for ∆VDS for an N-port NOR are simply reversed: ∆VDSp = VDD/NH, where NH is the number of high ports, and ∆VDSn = 0 if at least one port is high. If all inactive ports of a NOR gate are low, the behavior is analogous to a NAND gate where all inactive ports are high: it is like an inverter with the corresponding transistor sizes.

Taking all the previously mentioned considerations into account leads to the result that an equivalent inverter in combination with two MOS capacitors provides all necessary contributions to sufficiently cover the behavior of any combinational CMOS gate. In this context the capacitors are implemented similar to an inverter where the drain terminals are not connected together, but to the appropriate source potential (Figure 4.8). These two equivalent devices provide the required possibility to model the different circuit environments for the characterization procedure.

Figure 4.8.: Schematic view of equivalent devices. Panels: (a) inverter, (b) capacitors.

4.4. Single Cell Simulations

As introduced at the beginning of this chapter, single cell simulations are performed to determine the dynamic behavior of all cells in a library. Each single cell is simulated under different environmental conditions using a SPICE based analog circuit simulator. This includes the relevant activities of each cell and the effects of applicable circuit environments. The characterization of a complete library can require extensive effort, as it possibly consists of several hundred different cells. It is therefore advantageous to establish a method which allows a high degree of automation and makes it possible to treat cells as black boxes, independent of their function.

4.4.1. Simulation Environment

Due to parasitic coupling effects the behavior of a cell theoretically depends on the configuration of the entire circuit. Signal slopes are a product of interrelations of driver strengths and output loads. As mentioned in Section 4.3, load cells can be substituted by appropriately configured equivalent devices. The focus in modeling the input signal slope is on covering a reasonable range of slew rates. Figure 4.9 for instance shows the finally established characterization simulation circuit for a combinational gate with two inputs and one output. The cell under characterization is embedded in a circuit of parameterizable equivalent devices, which allows the emulation of different environmental conditions.

Figure 4.9.: Characterization simulation circuit for a two-input NAND gate. Labeled elements: voltage sources VstmA and VstmB; equivalent inverters (equIV) driverA, driverB, parA, parB, load1, load2; equivalent capacitor (equCAP) load; cell NAND2 with currents IDD, ISS, IINA, IINB, IOUT; supplies VDD and VSS.

The circuit is stimulated by two voltage sources (VstmA, VstmB), according to the number of inputs. These sources are used to trigger all relevant activities during the simulation. Because it is inevitable that pulsed sources change their values abruptly, several subsequent buffer cells are used to form realistic signal slopes. Otherwise, due to the capacitive loads, sharp-edged voltage changes would result in unrealistic current peaks. The equivalent inverters driver and par are used to set up the intended input signal characteristics. While larger driving transistor sizes increase the signal slew rates, an appropriate parallel load in turn provides the ability to reduce the slew rates. Verifications of the approach to model the output load with equivalent devices have shown that the most efficient way is to perform two parallel simulations: one of them with equivalent capacitances and the other one with two successive inverters (load1, load2).
This is used to consider the capacitive coupling effects of the gate to the drain (CGD) and source (CGS) terminals of load1, since the gate capacitance of load2 consequently affects the output behavior of the characterized cell as well.

4.4.2. Parameter Variation

As it is infeasible to simulate all possible circuit configurations, a representative set of parameters needs to be selected. The simulations must cover a wide range of conditions and enable an accurate interpolation of the results. As mentioned before, the total current consumption, as well as the timing characteristics, are almost proportional to the transistor sizes over the period of a transition. This leads to the initial approach that it is sufficient to consider each equivalent device (shown in Figure 4.9) with two different configurations. All other conditions can, to a good approximation, be determined by a linear interpolation of these simulation results. The number of simulations is then already manageable and the accuracy is acceptable in most cases. Considering that the current peak values are not directly proportional to the load size, a third run per equivalent device is necessary. This enables an appropriate peak-value interpolation and leads to more accurate results. On the other hand, the number of simulations

$$ N_{sim} = (N_{driver} \cdot N_{parallel})^{N_{ports}} \cdot N_{load1} \cdot N_{load2} \qquad (4.2) $$

is still very high, as it increases by the factor of Ndriver · Nparallel = 9 for each additional port. Therefore, it is reasonable to analyze the effect of the single parameters in more detail and consider them appropriately:

Output loads show the most diversified effects on the cell behavior. The directly connected equivalent device (inverter load1 or capacitor load) has to be modeled as mentioned in the previous sections and varied three times. The second inverter (load2) has to be considered, but a linear interpolation of two scenarios is feasible due to its marginal effect.
Driver strengths are used to adjust signal slew rates. Large transistors driving the inputs of a characterized cell lead to simulations with fast rising and falling edges. One approach to reduce the number of varied parameters is to set the drivers of all inputs to the same value. Given that the simulations of all active events are already done with all relevant circuit configurations, this decision is applicable. A current flow as a consequence of charging/discharging internal nodes, while the corresponding port state remains stable, is virtually independent of the driver strength. These considerations are similar to the leakage current discussed in Chapter 2.

Parallel loads are used in combination with the fan-in of the characterized cell to reduce the input signal slew rate. High parallel loads lead to slow rising/falling edges, even if the driving cell strength is very high.

Because the effects of the driver strength and the parallel load are complementary, the variation of one of them is sufficient under normal circumstances. It can be shown that the application of a relatively strong driver, compared to the fan-in of the characterized cell, together with the variation of the parallel load, covers the conditions in real circuits well. The applied parallel cell is therefore varied to analyze three scenarios: it is not present, it is similar to the default load of the given driver, or it is double the size of the default load. Taking all these considerations into account, the number of characterization simulations per cell can be reduced to:

$$ N_{sim} = N_{driver} \cdot N_{parallel} \cdot N_{load1} \cdot N_{load2} \qquad (4.3) $$

In case only one driver strength is applied, an effort of Nsim = 18 can finally be achieved for any kind of cell in the library.

4.4.3. Simulation Stimuli

For a comprehensive characterization of a cell it is necessary to trigger all relevant activities in the course of the simulations.
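As a brief aside on the simulation counts of Equations (4.2) and (4.3), the following minimal sketch evaluates both expressions. The function names are hypothetical; the variation counts (three per driver, parallel load and first load stage, two for the second load stage) are the ones named in Section 4.4.2.

```python
def n_sim_full(n_driver, n_parallel, n_ports, n_load1, n_load2):
    """Equation (4.2): driver and parallel load are varied per input port."""
    return (n_driver * n_parallel) ** n_ports * n_load1 * n_load2


def n_sim_reduced(n_driver, n_parallel, n_load1, n_load2):
    """Equation (4.3): all inputs share one driver/parallel-load setting."""
    return n_driver * n_parallel * n_load1 * n_load2


# Two-input gate, three variations for driver, parallel load and load1,
# two variations for load2.
full_two_input = n_sim_full(3, 3, 2, 3, 2)

# Reduced scheme with a single driver strength.
reduced = n_sim_reduced(1, 3, 3, 2)
```

With these numbers the reduced scheme reproduces the 18 simulations stated above, and its count no longer depends on the number of ports.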
While the behavior of combinational logic gates is sufficiently determined by executing all possible input signal transitions, the behavior of clocked storage elements, such as flip-flops or latches, additionally depends on the output state. This becomes evident by comparing the activity of a clock-triggered flip-flop where the input data value leads to an output transition with the case where the corresponding value is already stored. The activity, and therefore also the current consumption, is significantly different for these two situations. For optimization reasons, the different cell types are considered separately.

Combinational Gates

The number of possible activities of combinational gates Ncomb solely depends on the number of inputs. A gate with m inputs features 2^m different states. Considering that each state can be the result of a transition at any of the inputs, the number of possible activities is consequently given by

$$ N_{comb} = m \cdot 2^{m} \qquad (4.4) $$

This is manageable for gates with low pin counts. Most cells in a library have two to four ports, but especially complex gates have up to six or more ports. Without further considerations, the simulation effort and the resulting amount of data would be exceedingly high. On the other hand, symmetric structures allow a reduction to an acceptable number of analyzed transitions, because several ports behave similarly.

Latches

This cell type combines the features of combinational gates and storage cells. It passes the input signal directly to the output in the transparent mode, or stores the output state otherwise. This means that the output activity additionally depends on the previous state. The number of possible different transitions Nlatch for m ports is given by

$$ N_{latch} = m \cdot 2^{m+1} \qquad (4.5) $$

Latches typically have two inputs (data in and enable), but there may exist versions with an additional reset port.
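The counting argument behind Equations (4.4) and (4.5) can be checked with a short sketch. The explicit enumeration below is only an illustration (the function names are hypothetical): it pairs every resulting input state with every port that may have switched last.

```python
from itertools import product


def n_comb(m):
    """Equation (4.4): activities of a combinational gate with m inputs."""
    return m * 2 ** m


def n_latch(m):
    """Equation (4.5): a latch additionally depends on the stored state."""
    return m * 2 ** (m + 1)


def enumerate_comb_activities(m):
    """List (switching_port, resulting_input_state) pairs explicitly."""
    return [(port, state)
            for state in product((0, 1), repeat=m)
            for port in range(m)]
```

For example, `n_comb(2)` gives 8 activities for a two-input gate, while `n_comb(6)` already gives 384, which illustrates why symmetric port behavior is exploited for complex gates.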
Granted that the circuit simulations are limited to the normal circuit operation, omitting the system reset condition, this port is permanently set to its inactive level. As a result, the number of simulated transitions is constant (Nlatch = 16) for all kinds of latches.

Flip-Flops

The output state of a flip-flop only changes at a specified clock edge. Conversely, this means that a data value transition affects only internal nodes in any case. But, due to the internal feedback of the output state, the dynamic behavior during a data value transition is considerably different. This leads to the finding that the number of transitions for flip-flops Nff with m inputs is equal to that of latches:

$$ N_{ff} = m \cdot 2^{m+1} \qquad (4.6) $$

Standard flip-flops typically have the ports data in, clock, and enable. Special versions additionally support a scan-test mode and have two additional ports (scan enable and scan input). Other commonly provided ports are the inputs for the set and reset features. As a result, a full-featured flip-flop may have up to seven inputs. This would result in Nff = 7 · 2^8 = 1792. By concentrating on the normal circuit operations, all the special ports can be set to an inactive state. As a result, the number of triggered events can be significantly reduced to Nff = 16 for basic versions without enable ports.

4.5. Characteristic Parameter Extraction

The results of the single cell simulations provide the information concerning the behavior in terms of timing characteristics and current consumption. To determine the behavior of a cell in different circuit configurations, an interpolation of the simulated conditions is required. Characteristic parameters are therefore extracted from these data, which can then be interpolated over a wide range of environmental conditions. In addition, this procedure significantly reduces the amount of data finally stored in the library of characterized cells.

4.5.1. Timing Characteristics

One basic prerequisite for the determination of a circuit behavior is the identification of the signal timing. Essential parameters in this context are the input and output transition characteristics, as well as the propagation delay. PMOS and NMOS transistors of standard cells are typically dimensioned to have almost similar timing properties for rising and falling edges. But cells intended to distribute the system clock signal in particular are often optimized for the fastest possible distribution of the rising clock edge. Hence, it is important to distinguish between the timing of rising (tr, tLH) and falling (tf, tHL) edges, as well as the propagation delays tpLH and tpHL. Due to the various dependencies of the cell behavior, as discussed in Section 4.2, all timing parameters have to be determined for all simulated conditions.

The plots in Figure 4.10 for instance show a comparison of several possible input and output signal characteristics of an inverter. All signals are aligned at the 50% point of the input voltage Vin. As the propagation delay is commonly defined as the time between the 50% points of the input and output voltage swing, it is directly observable along the line at ∆V/2. Storing these parameters in the library of characterized cells makes it possible to determine the actual timing behavior of a cell for a given circuit configuration by a simple interpolation of these values.

In addition to the timing characteristics, the plots in Figure 4.10 also give a good overview of the coupling effects on a signal transition. Each signal shows a slight overshoot prior to any transition. Given that capacitances cannot be charged/discharged infinitely fast, the output initially follows the input voltage. It can be seen that the overshoot amplitude increases with the signal slew rate. On the other hand, the last part of a transition depends on the output load characteristics.
Due to the subsequently triggered activity of a cell connected to the output, the final transition progress is affected. These effects are less important for the timing, but considerable for the current profiles discussed in the next section.

Figure 4.10.: Signal characteristics of an inverter for different conditions in terms of different input slopes and output loads. Panels: (a) rising edge, (b) falling edge; each shows Vin and Vout as data signals [V] over time [ps], with a marker line at ∆V/2.

4.5.2. Current Profiles

The current consumption of CMOS devices consists of a static and a dynamic component. The single cell simulations are done to analyze the dynamic behavior, but implicitly also include the static leakage current. A closer look at the current profile for any transition shows different values of the current flow in the steady states before and after an event. Storing these values in the library subsequently enables the determination of the static current consumption of a given design.

The extraction of the dynamic part in terms of the current profiles is less trivial and has to be analyzed in more detail. As discussed in the previous sections, the current consumption significantly depends on the activity, but shows similar characteristics for the respective events. The plots in Figure 4.11 show, as an example, some current profiles of a flip-flop which is triggered by rising edges with different slew rates at the clock input and connected to different output loads. As indicated by the high peak values in conjunction with the varying curves at the end of the profiles, the output also performs a rising edge. The structure of flip-flops is much more complex than that of combinational gates, and therefore consists of several internal stages.
This leads to current profiles with more diversified shapes than the single peaks of basic gates, which were discussed in the first part of this chapter. On the other hand, this allows a better distinction between the effects caused by different input slopes and output loads.

Figure 4.11.: Current profiles with marked characteristic values for a rising edge at the clock input of a flip-flop for different input slopes and output loads. Axes: IDD [mA] over time [ps].

Analogous to the signal overshoots mentioned in the section discussing the timing characteristics, the current profiles also show this effect in the very first part. This simplifies the detection of the starting point, as a profile consequently features a zero point prior to the actual transition. The slope of the subsequent part in case of such a complex cell is directly related to the input signal characteristics. Similar to the overshoot amplitude, the following peak value depends on the slew rate. A particular property of complex cells is the central part, which is almost independent of the environmental conditions. This demonstrates that the capacitive coupling effects are limited to single stages. The final part of the profile predominantly depends on the size and structure of the output load. The diversification of the falling part at the end is caused by different sizes of the second output load stage.

The most significant characteristic values of current profiles are primarily the peak values and zero points. Due to the diversified last part of the profiles, the position where half of the last peak value is reached is additionally recorded. Given that the final profile part is similar to an exponential function, the profile end is detected by an intersection point with an appropriate threshold value.
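The characteristic values named above (peak, zero points, half-peak position, threshold-based profile end) can be illustrated with a minimal sketch. It is not the extraction procedure of this work: it assumes a sampled profile with a single dominant peak, uses linear interpolation between samples, and the function names and the 5% end threshold are hypothetical choices.

```python
def crossing_times(t, y, level):
    """Times where y crosses `level`, found by linear interpolation."""
    times = []
    for k in range(len(y) - 1):
        a, b = y[k] - level, y[k + 1] - level
        if a == 0.0:
            times.append(t[k])
        elif a * b < 0.0:
            times.append(t[k] + (t[k + 1] - t[k]) * a / (a - b))
    return times


def characteristic_values(t, i, end_fraction=0.05):
    """Extract peak, zero points, half-peak time and profile end.

    Assumption: the largest sample is treated as the (last) peak.
    """
    k_peak = max(range(len(i)), key=lambda k: i[k])
    peak_t, peak_i = t[k_peak], i[k_peak]
    tail_t, tail_i = t[k_peak:], i[k_peak:]  # falling part after the peak
    half = crossing_times(tail_t, tail_i, 0.5 * peak_i)
    end = crossing_times(tail_t, tail_i, end_fraction * peak_i)
    return {
        "peak": (peak_t, peak_i),
        "zeros": crossing_times(t, i, 0.0),
        "half": half[0] if half else None,
        "end": end[0] if end else None,
    }


# Synthetic profile: time in ps, current in mA, with a small negative
# excursion before the transition.
T = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
I = [-0.02, 0.02, 0.10, 0.20, 0.06, 0.02, 0.0]
values = characteristic_values(T, I)
```

For a multi-stage cell such as a flip-flop, the same helpers would have to be applied per peak; that extension is omitted here.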
Depending on the profile characteristics, all zero points and significant peak values are extracted and stored in the library. An interpolation of these values results in the definition of the actual profile characteristics for a given environmental condition. As a consequence of the similarity of the profile shapes, it is possible to align a reference profile to these points. Given that this method is not entirely accurate, the average current flow between the zero points is calculated and additionally stored in the library. Scaling the aligned profile to the correct amount of current efficiently compensates for the last-mentioned imprecision.

4.6. Macro Cell Characterization

As discussed in the previous chapters, complex designs are likely to contain macro cells. These components are usually memory instances, but possibly also function blocks in terms of hard macros, or even other types of reused sub-circuits. Macro cells may have numerous ports. A characterization approach similar to the procedure for standard cells is consequently unreasonable, as the number of simulations and the amount of resulting data would be impractical. In case of general sub-circuits, such as custom function blocks, a characterization can probably be done by applying a modeling method, such as the approaches introduced in Chapter 5. Including the timing characteristics and current profiles for reasonable conditions in the library enables such macro cells to be considered in a similar way as standard cells.

On the other hand, memory instances show some important properties in terms of a very regular structure. This allows a characterization by simulating a certain number of significant transitions similar to the introduced procedure for standard cells. Figure 4.12 shows the current profile for a write operation of a memory in the upper plot, where a varying percentage of data outputs are caused to change their state.
It can be observed that the contribution of the internal activity is significant, and the profile is only marginally affected by different output activities. The second plot shows that the profile differences are, due to the regular structure, additionally proportional to the number of output transitions. As a result, it is feasible to take the profile for the invariant part and to add an appropriately scaled output current waveform to determine the current consumption profile for any given number of output transitions. Further analyses have shown that this is also valid for read operations.

Since all data inputs and outputs of a memory are typically buffered, only clock events may cause output transitions. Therefore, changes of the data input vector, as well as the behavior in consequence of an applied address, can generally be considered as independent of the output load. As a result, the characterization of these ports is sufficiently covered by simulations with different input signal characteristics, analogous to the discussed procedure for flip-flop inputs. Moreover, the current profiles for data input events are again scalable, as the input buffers are typically a register of identical flip-flop instances.

Additionally important are the features provided by the memory control unit. It typically provides a chip enable port, which can be used to disable most of the internal blocks and avoid unnecessary activities. As this is a kind of clock gating, all input and output buffers, as well as the address decoder, may be forced to ignore all incoming events. The current consumption caused by the stimulation of the circuit parts that are directly connected to the inputs is in this case reduced to a minimum. Even if the current consumption of a disabled memory is minor, and therefore negligible in most cases, this behavior can be determined by a few simulation runs with appropriate input signal characteristics.

As a result, macro cells can also be added to the library of characterized cells, provided that there is an applicable method to determine the respective timing characteristics and current profiles. In case of regularly structured cells, such as memory instances, it has been shown that a procedure similar to the principle for standard cells is applicable.

Figure 4.12.: Current profile of a memory performing a write operation for a varying percentage of changing output data values. Panels: (a) total current for 100%, 75%, 50%, 25%, and 0% output activity; (b) additional current compared to 0% activity for 100%, 75%, 50%, and 25%. Axes: current [mA] over time [ps].

5. Netlist Based Current Modeling

The current consumption estimation of digital modules is most beneficial in very early design phases. Since placement and routing of the devices on a chip are among the last tasks in the design flow, first analyses must be based on gate-level netlists. This form of circuit description is basically a list of the instantiated cells, including a specification of the interconnected ports. Given that complex systems in particular are typically analyzed in various configurations and operating modes, the simulation effort to determine the most critical features is an essential aspect.

This chapter discusses approaches to efficiently model the current consumption profiles for digital systems. Since a traditional evaluation of stimuli vectors is not feasible for complex systems because of the excessive computational effort, an alternative method is introduced. Circuit parts with significantly different characteristics are analyzed in isolation, and an appropriate combination of a pattern-based approach and a random activity estimation is applied.
The current profile calculation is based on the library of pre-characterized cells introduced in Chapter 4. Gate-level netlists provide no information concerning the power distribution network or the placement and routing of the particular on-chip devices. The resulting current profiles consequently represent the ideal case in terms of a constant power supply voltage, as well as lossless cell interconnects. Approaches to estimate the parasitic effects of real power distribution networks and interconnect wires are discussed in Chapter 6.

5.1. Current Profile Calculation

Traditional SPICE-based circuit simulation tools are based on low-level nodal approaches. These methods feature highly accurate results, but have essential disadvantages when analyzing large circuits. If possible at all, the simulation of large modules demands excessive computation power and memory. The method discussed in the following sections is based on the library of pre-characterized cells and provides the ability to calculate the current profiles sequentially for single switching events. To this end, the environmental conditions of the particular cells are identified and applied during the current profile determination. The profiles for the activity of entire circuits are finally composed by superposing the single event results with the appropriate time offset.

5.1.1. Cell Environment Identification

As discussed in Chapter 4, the transient behavior of a cell primarily depends on its input signal characteristics and the output load. Since these parameters are given by the configuration and interrelation of the instantiated cells, the characteristics of the ambient circuit properties have to be identified. Due to the library-based determination of the dynamic cell behavior, the important parameters are extracted from the circuit configuration.
For the determination of the circuit environment, the static properties in terms of the driving transistor strengths and the equivalent inverter characteristics of the respective cell inputs are provided by the library of pre-characterized cells. Figure 5.1 shows, as an example, the relevant circuit part for a given cell. Related to the data sets which are provided by the library, the important parameters are the strength of the driving cell, the fan-in of possible parallel cells, and the characteristics of the cells connected to the output. Under the assumption that port A of the shown cell is triggered, the driver strength in form of an equivalent inverter eINVdrv equals the output properties of drvA:

$$ eINV_{drv} = eINV_{out}(drv_A) \qquad (5.1) $$

Given that the input characteristics of the parallel cells parA1 and parA2 are available in form of equivalent inverter specifications for the respective port (eINVin), they can be combined into an equivalent parallel inverter eINVpar by:

$$ eINV_{par} = \sum_{k} eINV_{in}(k) \qquad (5.2) $$

With eINVdrv and eINVpar, the input signal slew rate can be estimated. As the actual slope additionally depends on the specification of the preceding cells, a kind of parameter tuning is necessary. Depending on the parameter sets provided by the library, eINVdrv and eINVpar are adapted to fit the input signal characteristics to the actually required rise/fall time. The approximated value is in this case the output signal timing of the driving cell, which is calculated in the course of the dynamic switching behavior determination discussed in the next section.

Finally, the output load is determined in a similar way as the parallel load by combining the connected cell input characteristics eINVin into an equivalent load inverter eINVload by

$$ eINV_{load} = \sum_{k} eINV_{in}(k). \qquad (5.3) $$

As the second output load stage is also relevant, it is determined analogously to the primary load by combining the input characteristics of the respective cells into an additional inverter.
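The environment identification of Equations (5.1) to (5.3) can be sketched on a toy netlist. Everything in the example is an assumption made for illustration: the netlist and library dictionaries, the cell names, the convention that every cell has one output port "Z", and the representation of an equivalent inverter by a single width value. The second load stage is omitted for brevity.

```python
# Hypothetical library data: equivalent-inverter width (arbitrary units)
# seen at each input port, and provided by each cell output.
EINV_IN = {("NAND2", "A"): 1.0, ("NAND2", "B"): 1.0,
           ("INV", "A"): 0.8, ("BUF", "A"): 0.6}
EINV_OUT = {"NAND2": 1.2, "INV": 1.0, "BUF": 2.0}

# Hypothetical netlist: instance -> (cell type, {port: net}); output is "Z".
NETLIST = {
    "drvA": ("BUF", {"A": "n0", "Z": "nA"}),
    "parA1": ("INV", {"A": "nA", "Z": "n1"}),
    "parA2": ("NAND2", {"A": "nA", "B": "n2", "Z": "n3"}),
    "coi": ("NAND2", {"A": "nA", "B": "nB", "Z": "nZ"}),
    "ld1": ("INV", {"A": "nZ", "Z": "n4"}),
    "ld2": ("NAND2", {"A": "n5", "B": "nZ", "Z": "n6"}),
}


def sinks(net, exclude=None):
    """Input pins connected to `net`, optionally skipping one instance."""
    return [(inst, port)
            for inst, (_cell, pins) in NETLIST.items()
            for port, n in pins.items()
            if n == net and port != "Z" and inst != exclude]


def driver(net):
    """Instance whose output drives `net`, or None if there is none."""
    for inst, (_cell, pins) in NETLIST.items():
        if pins.get("Z") == net:
            return inst
    return None


def einv_sum(pin_list):
    """Sum of equivalent input inverters, as in Equations (5.2) and (5.3)."""
    return sum(EINV_IN[(NETLIST[inst][0], port)] for inst, port in pin_list)


def environment(coi, port):
    """Driver, parallel load and output load for one triggered port."""
    _cell, pins = NETLIST[coi]
    in_net, out_net = pins[port], pins["Z"]
    return {
        "drv": EINV_OUT[NETLIST[driver(in_net)][0]],  # Equation (5.1)
        "par": einv_sum(sinks(in_net, exclude=coi)),  # Equation (5.2)
        "load": einv_sum(sinks(out_net)),             # Equation (5.3)
    }


env = environment("coi", "A")
```

The tuning of eINVdrv and eINVpar towards the actually required rise/fall time, as described above, is not part of this sketch.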
Figure 5.1.: Relevant circuit part for the environment parameter extraction for a given cell of interest (coi).

5.1.2. Single Event Characteristics Determination

The library of pre-characterized cells provides a set of parameters that describe the timing characteristics and current profiles for a given number of circuit configurations. Since the actual configurations differ from the characterized conditions, the stored values have to be interpolated. As the dynamic behavior of a cell depends on multiple parameters, an appropriate multi-dimensional interpolation is necessary. Applicable algorithms are discussed in [34].

The operations performed to determine the current profile for a given cell activity are demonstrated in Figure 5.2. As an example, the plots show the behavior of an XOR gate with three inputs for an event that causes a low-high transition at the output. The dots in the figure represent the characteristic parameters that the library provides for different configurations. As the requested condition deviates from the characterized cases in this example, the available values have to be interpolated. Due to the similarity of the profile shapes for a given event, a reference profile (dashed line) is aligned to the calculated vertices of the resulting profile (solid line).

A three-input XOR gate is relatively complex and therefore contains internal nodes between the inputs and the output. As a result, this example makes it possible to distinguish between the effects of the input signal characteristics and those of the output load. The first peak of the resulting profile is lower, and the second one is higher, than in the reference profile. This indicates that the requested condition combines a relatively slow input signal transition with a high output load.
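As a simple illustration of such an interpolation, the following sketch performs a bilinear interpolation of one characteristic value (for example a peak amplitude) over two parameters. It is only one possible choice and not necessarily the algorithm selected from [34]; the function names, the two chosen parameters (input slew and output load), and the grid values are assumptions made for this example.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Indices (i, i + 1) of the grid interval used for x, clamped to the grid."""
    i = bisect_right(axis, x) - 1
    i = max(0, min(i, len(axis) - 2))
    return i, i + 1


def interpolate_2d(slews, loads, table, slew, load):
    """Bilinear interpolation of table[i][j], characterized at (slews[i], loads[j]).

    Outside the characterized range the outermost interval is extrapolated linearly.
    """
    i0, i1 = _bracket(slews, slew)
    j0, j1 = _bracket(loads, load)
    u = (slew - slews[i0]) / (slews[i1] - slews[i0])
    v = (load - loads[j0]) / (loads[j1] - loads[j0])
    return ((1 - u) * (1 - v) * table[i0][j0] + u * (1 - v) * table[i1][j0]
            + (1 - u) * v * table[i0][j1] + u * v * table[i1][j1])


# Invented characterization grid: peak current in uA.
slews = [20.0, 100.0]            # input transition time in ps
loads = [1.0, 5.0]               # output load in fF
peak = [[200.0, 260.0],          # fast input: low load, high load
        [150.0, 210.0]]          # slow input: low load, high load

value = interpolate_2d(slews, loads, peak, slew=60.0, load=3.0)
```

The same routine would be applied to every characteristic coordinate (time instances and amplitudes), and the library would generally require more than two dimensions.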
In this example, the amplitude and the time instance of the first current peak have therefore been determined by interpolating between the reference condition and the values for very slow input signals. The second peak, on the other hand, has been determined from the reference condition and the configuration for high loads. The last characteristic value marks the time instance at which the amplitude has dropped to half of the last peak value. In addition to the mentioned coordinates, the zero points of the profile are determined. All profiles show at least one zero point at the beginning of a transition, but there are cells and events with a more significant negative amplitude than the shown profile (see Figure 5.3, discussed later). The first relevant time instance of the shown profile is the start of the input transition, but there are cells and events where the amplitude becomes negative for a short period prior to this point.

Figure 5.2.: Current profile calculation for a single event (current [µA] over time [ps]; characteristic points, reference profile, and result). The characteristic parameters in the library are interpolated, and a reference profile is aligned to the appropriate coordinates (time and amplitude).

After all these points have been determined, a reference profile is aligned piece-wise (scaled in time and amplitude) to cover the intended coordinates for the respective condition. In addition to the current profiles, the timing parameters, that is the propagation delay and the output signal slew rate, are determined. This is a similar procedure in which the timing parameters available in the library are interpolated. These parameters are important for the current profile composition of entire modules, as well as for the determination of further characteristics, since the output signal of a cell is the input signal of the subsequent cells.

5.1.3. Current Profile Composition

Given that the current profiles are calculated sequentially for each event, they have to be composed to the profile for the entire circuit. Since the signal propagation delays are available for all events, the single event profiles $i_k(t)$ can be superposed to the overall profile $i(t)$ with the appropriate time offsets $\tau_k$:

$$i(t) = \sum_{k} i_k(t - \tau_k) \qquad (5.4)$$

Figure 5.3 demonstrates this procedure for a simple circuit with one buffer connected to the input, followed by two paths with two buffers in series each. All instantiated cells have driver strength one (B1), except buf4, which has twice the strength (B2). Since buffer cells simply propagate the input signal to the output, and given that the circuit input is triggered by a rising edge, all cells perform a low-high transition at all ports.

cell name   event time [ps]   propagation delay [ps]
buf1         0.0              38.8
buf2        38.8              35.0
buf3        73.8              30.9
buf4        38.8              32.0
buf5        70.8              29.7

(a) circuit schematic and activity timing
(b) current profiles of the single events and the entire circuit (current [mA] over time [ps])

Figure 5.3.: Composition of the current profile for a given activity for the entire circuit by superposing the single event profiles with the appropriate time offset.

The table in Figure 5.3(a) shows the time instances at which the input of each buffer is triggered, and the corresponding signal propagation delays from the input to the output. Given that buf1 is directly driven by the circuit input signal (event at 0 ps), it represents the first part of the profile in Figure 5.3(b). Since buf2 and buf4 are connected to the output of buf1, both are triggered at the same time (38.8 ps), and the associated current profiles consequently start at this time instance.
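The superposition in Equation (5.4) can be sketched as follows. The time offsets are the event times from the table in Figure 5.3(a); the triangular pulse shape, the peak values, and the pulse widths are invented placeholders for the library-based single event profiles, and the names `triangle`, `EVENTS`, and `total_current` are hypothetical.

```python
def triangle(t, peak, width):
    """Hypothetical single event profile i_k(t): triangular pulse starting at t = 0."""
    if t <= 0.0 or t >= width:
        return 0.0
    half = width / 2.0
    return peak * (t / half if t < half else (width - t) / half)


# tau_k in ps is taken from the table in Figure 5.3(a).
# Peak values (mA) and pulse widths (ps) are invented for illustration.
EVENTS = {
    "buf1": (0.0, 0.20, 60.0),
    "buf2": (38.8, 0.10, 50.0),
    "buf3": (73.8, 0.10, 50.0),
    "buf4": (38.8, 0.18, 50.0),
    "buf5": (70.8, 0.10, 50.0),
}


def total_current(t):
    """i(t) = sum over k of i_k(t - tau_k), Eq. (5.4)."""
    return sum(triangle(t - tau, peak, width) for tau, peak, width in EVENTS.values())


# Overall profile sampled every 5 ps between 0 ps and 200 ps.
profile = [(t, total_current(float(t))) for t in range(0, 201, 5)]
```

Before 38.8 ps only buf1 contributes in this sketch; afterwards the contributions of buf2 and buf4 overlap with the tail of buf1, which is the effect visible in Figure 5.3(b).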
Given that buf4 has a stronger driver, its current peak value is higher, and its propagation delay (32 ps) is shorter than the 35 ps of buf2. As a result, buf5 is triggered a few picoseconds earlier than buf3. This example also shows that the transient behavior, and consequently the current consumption, depends strongly on the circuit configuration. As buf1 has a higher output load, its propagation delay is roughly 130% of the delay of buf5 (38.8 ps compared to 29.7 ps), and the current profiles also differ considerably, even though the cell types are similar.

5.2. Circuit Simulation Methods

The transient behavior of a digital system is generally characterized by analyzing the switching activities of the instantiated cells. The efficiency of a simulation method is therefore determined by its strategy for determining the cell activity. The most common method is based on the evaluation of given stimuli vectors, which are applied at the module inputs. The internal activity of the circuit is then determined by processing the signal transitions that result from the stimuli pattern and the respective cell functionalities. Since the determination of feasible stimuli vectors is exceedingly difficult for complex designs, especially if they contain storage elements like memories, an alternative approach is introduced. Given that the activity of programmable systems ultimately depends on the implemented software, the number of possible operating sequences is almost unmanageable. Therefore, an approach based on a randomly distributed assignment of switching events is more feasible. The relative merits of the pattern based method and the random activity interpretation approach are discussed in the following sections.

5.2.1. Pattern Based Simulation

The application and evaluation of stimuli patterns at the module inputs determines the activity of a given circuit.
As digital modules are basically organized in paths of consecutively switching cells, the transient events are propagated from one cell to another. The activity of a particular cell is determined by processing its input events with the corresponding cell functionality. Resulting output transitions are then propagated to the cells connected to the output. In this way, the intended transitions are successively evaluated and propagated to the involved cells. Therefore, this method is also called an event-triggered simulation procedure. Since the signal timing characteristics are also determined for the actual transitions, all required parameters are available for the library based current profile calculation. As mentioned repeatedly, this method is the most accurate one, but it is impractical for complex designs.

5.2.2. Random Activity Interpretation

As the determination of the actual activity of complex modules and programmable systems is impractical in many cases, an approach for the interpretation of randomly assigned events is more feasible. Therefore, the parameter activity level is introduced. It defines the percentage of active cells per clock period. According to this parameter, an appropriate number of cells is randomly selected and supposed to become active at a given point in time. A fully random assignment of events may lead to unrealistic conditions, as possibly all selected cells are intended to become active at almost the same time. The activity distribution therefore has to be regulated. For this purpose, the logic depths of the cells within a circuit are determined. The depth of a cell basically represents the distance, in terms of the number of gates, between the input of the enclosing module and the respective cell. The cells with the same depth form a group. As the random activity assignment selects the intentionally active cells for each group according to the activity level, an appropriate distribution of the switching events over time is achieved.
Since the current profile calculation, discussed in Section 5.1, is done separately for each transition, the randomly specified events can be processed directly. Since this method determines the current consumption in no particular order, it can be performed independently of the paths of consecutively switching cells. Considering that the time instance of the events is important for the overall profile composition, an appropriate timing estimation is done in the course of the initial analysis of the circuit. As a result, the time instances at which the cell ports are supposed to become active are determined, and the single event profiles for randomly estimated events can be composed to a profile for the entire circuit with the appropriate time offset. More detailed considerations on the logic depth determination, the random activity assignment, and the timing estimation are discussed in the following sections.

Logic Depths Assignment

As mentioned before, the logic depth is an indicator of the number of transitions that occur before a given cell becomes active. Figure 5.4 shows a part of the schematic view of a module with three inputs and a fraction of the internal circuit. As the module ports are considered as logic depth zero (LD=0), and given that the depth is incremented by one at each cell from the input to the output, the output of the shown cell1 is associated with LD=1. Consequently, the net at the output of cell2 represents LD=2.

Figure 5.4.: Assignment of the logic depths to the cell ports in a module.

The connection properties of cell3 immediately show that the depth assignment may be ambiguous. While one input is associated with LD=1, the other one is directly connected to a module port (LD=0). Since the consideration of all possibilities is unmanageable for large circuits, a careful assignment policy is necessary for such ambiguous situations.
The easiest method to handle such conditions is to consistently assign the lower or the higher depth, but this leads to unrealistic results. While consistently assigning the lower depth leads to an exceptionally high number of cells per group, assigning the higher one splits the circuit into many depths with only a few cell references per group. Even though the logic depth is only a structural parameter and provides no direct relation to the switching event timing, an appropriate determination is expedient. Otherwise, the previously mentioned approaches, that is the consistent assignment of the higher or the lower depth, may lead to an unrealistic concentration of events within a short time period. The reason for that is the weak relation to the structure. A high number of cells per group, or many groups consisting of only a few cell references, leads to a high degree of freedom in the selection of active cells. As a result, an inappropriately high number of cells with almost simultaneous activities may be selected. An efficient approach to balance the structural segmentation and the number of cells per depth is a randomly distributed assignment. The logic depth of the output ($LD_{out}$) of a given cell with $n$ inputs is therefore determined by incrementing the depth of a randomly selected input ($LD_{in}$):

$$LD_{out} = \mathrm{rand}(LD_{in}\{1, \dots, n\}) + 1 \qquad (5.5)$$

Random Activity Determination

Since this method is unrelated to applied stimuli patterns at the module ports, the internal activity of the circuit is intentionally assigned. As previously discussed, the cells in a module are associated with groups of cells for each logic depth. Under the assumption that the probability of a triggering event is equal for all cells in one group, the selection of potentially active cells is randomly distributed. Given that the activity of a module nevertheless depends on the functionality of the circuit and on the events at the ports of the instantiated cells, the parameter activity level is used. It defines the percentage of active cells for a given circuit. Depending on the application as well as the type of a given module, the actual activity of most circuits is usually within the range of 15%–35%. Exceptional cases are typically processor cores with considerably higher, and peripheral units with possibly lower activity levels.

Figure 5.5.: Illustration of the random activity distribution method. Intentionally active cells are selected according to the given activity level of 25% for each depth. The outputs of these cells are supposed to perform a low-high or high-low transition, which is consequently distributed to the inputs of the connected cells.

Figure 5.5 shows the internal structure of a module and the groups of associated cells for each logic depth. It illustrates the random selection of the potentially active cells for a given activity level of 25%. As the switching events are preferably distributed over the entire module, the selection is done separately for each group. This leads to one active element per depth for the shown example. Generally, the selected cell references ($R$) consist of subsets ($r$) of all cells ($C$), where $N$ references are extracted for a given depth:

$$R = \{\, r \mid r \subseteq C,\ |r| = N \,\} \qquad (5.6)$$

As $N$ generally differs between depths, it is calculated by $N = A \cdot N_C(LD)$, where $A$ is the activity level and $N_C(LD)$ is the number of associated cells at the respective depth. As the active cells are randomly distributed, there are no paths of consecutively switching cells, and the appropriate events have to be randomly determined as well. Therefore, the outputs of the supposedly active cells are intended to perform a rising or falling edge. These events are consequently propagated to the inputs of the cells connected to them.
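The random depth assignment of Equation (5.5) and the per-depth selection with $N = A \cdot N_C(LD)$ can be sketched together as shown below. The netlist representation (a dictionary mapping each cell to its input nets, with each cell driving a net of the same name), the function names, and the rounding of $N$ to the nearest integer are assumptions made for this illustration.

```python
import random
from collections import defaultdict

# Hypothetical netlist fragment: cell name -> input nets.
netlist = {
    "cell1": ["in0", "in1"],
    "cell2": ["cell1"],
    "cell3": ["cell1", "in2"],   # ambiguous case: inputs at LD=1 and LD=0
    "cell4": ["cell2", "cell3"],
}
module_inputs = ["in0", "in1", "in2"]


def assign_depths(netlist, module_inputs, rng):
    """LD_out = LD of a randomly selected input + 1, Eq. (5.5)."""
    depth = {net: 0 for net in module_inputs}      # module ports are LD=0
    pending = dict(netlist)
    while pending:
        ready = [c for c, ins in pending.items() if all(i in depth for i in ins)]
        if not ready:
            raise ValueError("combinational loop or undriven net")
        for cell in ready:
            depth[cell] = depth[rng.choice(pending.pop(cell))] + 1
    return {cell: depth[cell] for cell in netlist}


def select_active(depths, activity_level, rng):
    """Select N = A * N_C(LD) cells per depth group (N rounded, an assumption)."""
    groups = defaultdict(list)
    for cell, ld in depths.items():
        groups[ld].append(cell)
    active = {}
    for ld, cells in sorted(groups.items()):
        n = round(activity_level * len(cells))
        active[ld] = rng.sample(cells, n)
    return active


rng = random.Random(0)
depths = assign_depths(netlist, module_inputs, rng)
active = select_active(depths, activity_level=0.25, rng=rng)
```

For groups as small as in this fragment, rounding yields no active cell at an activity level of 25%; the selection only becomes meaningful for groups with many cells, as in the module of Figure 5.5.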
If, for instance, the shown cell 2c is supposed to be active, the output event is assigned to the appropriate input of the cells 2b, 3b, and 3c. The current profile calculation is finally done for the cells that are triggered by the propagated events at the appropriate input. Since the initial state of these cells is also unknown, this parameter is assigned randomly as well. The state ($S$) is therefore determined for the given activity $A$ and the active port ($P$) by:

$$S = S(\mathrm{rand}(\{1, \dots, m\}), A, P) \qquad (5.7)$$

Timing Parameter Estimation

As a consequence of the randomly distributed activity, the event timings cannot be calculated for consecutive events. Therefore, the time instances at which a cell possibly becomes active have to be estimated. Given that the timing characteristics of a cell depend on the activity, but the actual events are unknown, the propagation delays ($t_{pd}$) and signal rise/fall times ($t_r$ and $t_f$) are determined for randomly estimated transitions. As the signal slew rates are typically different for rising and falling edges, both transitions have to be considered. Due to the dependence of the results on the triggered port $P$, as well as on the steady state $S$ prior to the transition, the timing parameters are determined by a two-dimensional random selection:

$$t_{r,f,pd} = t_{r,f,pd}\left[\mathrm{rand}(P\{1, \dots, n\},\ S\{1, \dots, m\})\right] \qquad (5.8)$$

This procedure is done similarly to the pattern based simulation, starting at the module inputs and proceeding along the paths of consecutive cells. Therefore, all influencing parameters, in terms of the circuit environment and the output signal characteristics of previous cells, are considered. The time instances at which a port of a given cell supposedly becomes active are calculated by accumulating the appropriate signal propagation delays. As a result, all important timing characteristics are available for the discussed current profile calculation for randomly assigned activities.

5.3. Modeling of Complex Modules

As mentioned repeatedly, the stimuli determination for a pattern based simulation is an expensive procedure for complex modules. On the other hand, a fully random activity estimation provides many degrees of freedom and may lead to unrealistic results under certain conditions. As discussed in Chapter 3, a synchronous sequential design basically consists of a clock subsystem and a combinational logic part. Given that the characteristics of these two parts are significantly different, it is expedient to handle them differently. Since the clock signal characteristics are usually known, but the combinational logic activity has to be estimated, an appropriate combination of the pattern based simulation method and the random activity approach is applied. This composite approach enables a detailed analysis of the clock subsystem and an efficient characterization of the combinational logic behavior, with a significantly lower simulation effort than the evaluation of multiple stimuli vectors.

As illustrated in Figure 5.6, the simulation flow consists of two branches in this case, one for the clock paths and another one for the combinational logic. In both branches the current profile calculation is based on the library of pre-characterized cells, but the switching activity is determined by a pattern evaluation for the clock paths, and by a random activity estimation for the combinational logic.

Figure 5.6.: Simulation flow for large modules. Partitioning the module enables the application of a pattern based simulation for the clock subsystem, and the random activity estimation approach for the combinational logic. The current profile calculation is based on the library of pre-characterized cells in both branches.

5.3.1. Module Partitioning

As the ports of a module provide different functionalities, the internal paths of consecutively switching cells, starting at the module inputs, are used to distribute the appropriate signals to the particular functional units. Partitioning a module means in this case a categorization of the path types, depending on the expected input signal characteristics, as well as on the mode of operation of the connected functional unit. This provides the opportunity to employ efficient simulation methods for the respective module parts. Therefore, the following path categories are detected:

system clock: At least one input of synchronous sequential circuits is intended to be connected to the system clock. This signal is typically distributed to the storage elements by a clock tree. Therefore, these paths generally terminate at the clock input of flip-flops and memories, or at the enable port of latches.

data input: Most input ports of a module are typically used to apply data values. These ports are usually connected directly to the corresponding functional unit.

set/reset: The set and reset signals are distributed from the respective module ports to the clocked storage elements to bring the system into a specified state. These signals are typically asynchronous to the system clock and only active in exceptional cases, such as the system power-on procedure.

scan test: Special types of flip-flops have additional inputs that allow scan test chains to be formed. Given that especially complex designs typically provide the opportunity to form several chains, there are also several scan enable and scan input ports.

Analogous to the standard cell characterization introduced in Chapter 4, set/reset and scan test paths are supposed to be inactive during the normal operating mode of a system.
Therefore, these ports are initially set to an appropriate constant value to disable the corresponding features. Since constant signals cannot trigger any switching activity, these paths are disregarded during the calculation of the dynamic current consumption. For this reason, set/reset and scan test paths are not considered further in the following sections. As a result, only the clock distribution network, the remaining possibly active combinational logic parts, and the storage elements have to be analyzed for characterizing the dynamic current consumption of synchronous sequential system modules.

Figure 5.7 gives an overview of the finally considered module partitions. The basic structure of the shown picture is similar to the state machines introduced in Chapter 3. The typically largest partition is the combinational logic, stimulated by the data inputs and the storage element outputs. Its results are basically the input values of the storage elements and the module outputs, but some of these signals are also used to configure the clock tree. Due to the special mode of operation of the clock tree, in terms of the usually well-known clock signal characteristics and the effective clock gating, it is important to consider this partition separately.

Figure 5.7.: Illustration of the module partitioning procedure. It is differentiated between the combinational logic part, the clock tree, and the clocked storage elements.

Special care has to be taken with the clocked storage elements. Even though they are considered as a separate partition, the behavior of these cells depends strongly on the clock tree outputs, but also on the characteristics of the combinational logic results. The clock inputs of the storage elements are consequently associated with the clock subsystem, and the analysis of the data input and output behavior is related to the combinational logic activity.

5.3.2. Clock Subsystem

As a result of the module partitioning, the clock subsystem is isolated from the rest of the circuit. The primary purpose of the clock tree is the distribution of the clock signal to the storage elements, and the system clock characteristics are usually known. A pattern based simulation method is therefore applicable to this part of a circuit. The current profile calculation is consequently done by evaluating the actual cell activities according to the clock signal specifications and the particular cell functions. Clock trees usually consist of buffer cells, but may also be implemented using inverters. For this reason, and given that a part of the clock tree is possibly inverted, an evaluation of the individual cell functions is expedient.

Since clock gating is a very popular and effective measure to reduce unnecessary switching activities, it is essential to consider it in the course of modeling the current consumption of a system. In extensively optimized systems, the clock signal distribution to virtually all storage elements is possibly controlled by gating cells. Hence, the percentage of actually active clock subsystem cells is theoretically in the range of 0% to 100%. The enable signals for the instantiated gating cells are predominantly controlled internally by the combinational part of a module. As the activity of the combinational part is randomly estimated, the actually "open" gates have to be randomly evaluated as well. Therefore, the parameter gating level is introduced. It defines the percentage of disabled storage elements. According to this parameter, the control inputs of randomly selected gating cells are set appropriately. This prevents the distribution of the clock signal to the intentionally affected part of the clock subsystem. As a result, the storage elements in the subtree of deactivated gating cells are not triggered by the clock signal and consequently cause no dynamic current flow.
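One way to turn a given gating level into a concrete set of deactivated gating cells is sketched below. The sketch assumes that the set of storage elements below each gating cell is known, so that gates nested below an already deactivated gate are not counted twice; the function name `select_disabled_gates`, the data layout, and the example values are hypothetical.

```python
import random


def select_disabled_gates(gate_subtrees, total_elements, gating_level, rng):
    """Randomly deactivate gating cells until the gating level is reached.

    gate_subtrees maps each gating cell to the set of storage elements
    in its subtree. The last selected gate may overshoot the target.
    """
    target = gating_level * total_elements
    candidates = list(gate_subtrees)
    rng.shuffle(candidates)
    disabled_gates, disabled_elements = [], set()
    for gate in candidates:
        if len(disabled_elements) >= target:
            break
        disabled_gates.append(gate)
        disabled_elements |= gate_subtrees[gate]   # union avoids double counting
    return disabled_gates, disabled_elements


# Invented example: 20 flip-flops, gate g1 is nested below gate g0.
subtrees = {
    "g0": set(range(0, 10)),
    "g1": set(range(0, 5)),
    "g2": set(range(10, 15)),
    "g3": set(range(15, 20)),
}
gates, off = select_disabled_gates(subtrees, total_elements=20,
                                   gating_level=0.5, rng=random.Random(1))
achieved_level = len(off) / 20
```

Because whole subtrees are switched off at once, the achieved level can exceed the requested one; a refinement could reject gates that overshoot by more than a tolerance.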
Given that clock gating is potentially implemented hierarchically, the number of active gating cells cannot be assumed to be proportional to the number of active storage elements. Therefore, gating cells have appropriate weights, which are given by the respective number of connected storage elements. The determination of the active gates is therefore an iterative procedure, in which gates are randomly selected until the given gating level is achieved. As the gating level is complementary to the percentage of triggered storage elements, the latter is termed clock activity level in the following sections.

The plots in Figure 5.8 show the current profiles for the clock subsystem of a microcontroller core module. The analyzed module consists of more than 150,000 cells, of which approximately 14,000 are flip-flops, 700 are clock buffers, and 100 are gating cells. The simulated clock activity levels are 33%, 66%, and 100%, and the simulation has been repeated several times for each of these levels. As the initialization of the gating cells, and therefore the determination of deactivated subtrees, has been done independently for each simulation run, the set of curves for each activity level shows minor variations. To demonstrate the effects of gating on the respective parts of the clock subsystem, the current profiles for the entire clock subsystem are plotted in Figure 5.8(a), and the partial results for the clock tree and the storage elements in (b) and (c). Lower activity levels significantly reduce the current consumption of the flip-flops, and the amplitudes of the clock tree profiles are reduced proportionally as well.

Clock trees are typically dimensioned to ensure that the clock signal arrives at all storage elements at nearly the same time. Therefore, the number of buffers in each subtree, as well as the respective driver strengths, are almost equivalent.
As a result, Figure 5.8(b) shows that the variance of the results for the same activity level is quite small, even though the gating cells have been re-initialized for each simulation run. Each profile consequently shows the result for randomly selected clock paths.

Figure 5.8.: Current profiles (current [A] over time [ps]) for the clock subsystem of a module with approximately 14,000 flip-flops, 700 clock buffers, and 100 gating cells: (a) entire clock subsystem, (b) clock tree, (c) storage elements. The plots show the results for different clock activity levels (100%, 66%, 33%), where the simulations have been repeated several times for each configuration.

Figure 5.9.: Current profiles (current [A] over time [ps]) for a given number of clocked storage elements with different percentages of elements performing an output transition (100%, 66%, 33%, 0%).

The profiles in Figure 5.8(c) additionally demonstrate the reason for the variations in the clock subsystem results. Due to the analysis of randomly selected clock paths, different types of storage elements are triggered in each simulation run. It can be seen that the module apparently contains flip-flops with different driver strengths, as the variation of the clock tree profiles for each parameter set is insignificant, while the storage element results show observable differences. The effect of such variations on the quality of the resulting current consumption model for an entire system is discussed later.

5.3.3. Clocked Storage Elements

The analysis of the clocked storage element behavior is related to the characteristics of the other module partitions.
As the clock inputs are directly connected to the clock tree outputs, this part of the storage elements is associated with the clock subsystem and has already been considered (see Section 5.3.2). The storage element data outputs trigger the combinational logic cells, and the number of switching outputs needs to be derived from the designated percentage of active logic gates. As it cannot be assumed that clock gating is implemented in such a way that all storage elements with inactive outputs are disabled, only a part of the triggered elements may change their output state. Therefore, an activity level is determined that defines the percentage of elements performing an output transition as a consequence of a triggering clock event. The value of this parameter is given by the relation between the gating level and the activity of the combinational logic cells.

Figure 5.9 shows the current profiles for a given number of storage elements with different output activity levels. The plotted profiles are the simulation results for all outputs active (100%), for 66% and 33%, and, as a reference, for the case where no element changes its output state (0%). Switching outputs cause a significantly higher current consumption than stable data values. The simulations have been repeated several times for each configuration, where the supposedly active cells have been randomly selected. The results indicate that this approach is feasible, as the variance of the results is insignificant. The additional current consumption caused by storage element output transitions, compared to the profiles for stable outputs, is plotted in Figure 5.10. The resulting average current consumption values for the analyzed circuit are listed in Table 5.1. It can be observed that clocked storage elements with output transitions cause a current consumption that is possibly more than three times the value of inactive ones.
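The relation between the columns of Table 5.1 can be checked with a few lines of arithmetic. The numbers are the averages listed in the table; the variable and function names are chosen for this illustration only.

```python
# Average currents in mA, as listed in Table 5.1.
CLOCK_TRIGGER_MA = 129.3
OUTPUT_TRANSITION_MA = {0: 0.0, 33: 89.1, 66: 178.3, 100: 269.0}


def overall_current(activity_percent):
    """Overall current = clock trigger contribution + output transition contribution."""
    return CLOCK_TRIGGER_MA + OUTPUT_TRANSITION_MA[activity_percent]


def ratio_to_idle(activity_percent):
    """Overall current relative to the case without any output transition (0%)."""
    return overall_current(activity_percent) / CLOCK_TRIGGER_MA


for activity in sorted(OUTPUT_TRANSITION_MA):
    print(f"{activity:3d}%: {overall_current(activity):6.1f} mA, "
          f"factor {ratio_to_idle(activity):.2f}")
```

For an output activity of 100% the factor is slightly above three, which corresponds to the statement that switching storage elements can draw more than three times the current of inactive ones.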
Therefore, it is important to accurately determine the percentage of active storage elements.

Figure 5.10.: Current consumption (current [A] over time [ps]) caused by output transitions with different activity levels (100%, 66%, 33%, 0%), compared to the profile for triggered clock inputs only.

Table 5.1.: Average current consumption distribution at clocked storage elements.

output activity   clock trigger   output transition   overall current
0%                129.3 mA          0.0 mA            129.3 mA
33%               129.3 mA         89.1 mA            218.4 mA
66%               129.3 mA        178.3 mA            307.6 mA
100%              129.3 mA        269.0 mA            398.3 mA

Figure 5.11.: Current profiles (current [A] over time [ps]) for the combinational logic part of a given module. Simulation results are based on the random activity interpretation method and shown for different activity levels (30%, 20%, 10%).

5.3.4. Combinational Logic

The largest partition of a module is typically the combinational logic block. Due to the size and complexity of this circuit part, the random activity interpretation, introduced in Section 5.2.2, is a feasible method to determine the respective current profiles. Given that the clock subsystem is modeled by evaluating the actual clock signal, the timing characteristics of all clocked storage element outputs are already calculated. As a result, the starting points in time for each root node of the internal logic paths are available. The determination of the time instance at which a particular combinational cell possibly becomes active is consequently based on reliable values. In addition to the clocked storage element outputs, the data input ports of a module also have to be considered as root nodes of combinational logic paths.
As these ports are possibly stimulated by an external circuit, but the activity timing cannot be determined on the basis of the analyzed module netlist only, it has to be estimated or provided as a parameter.

Figure 5.12.: Current profiles for the clock distribution network, the clocked storage elements, and the combinational logic part of a given module. (Plot: current [A] over time [ps]; curves for clock distribution, storage elements, and combinational logic.)

Figure 5.11 shows the resulting current profiles for the combinational part of the mentioned microcontroller core unit. Based on the random activity interpretation method, the simulation has been repeated several times for the activity levels 10%, 20%, and 30%. It is shown that, even though the supposed active cells as well as their particular activity are randomly determined, the variation of the results for the respective parameter set is insignificant. The characteristics of the shown current profiles are typical for this kind of circuit. As a consequence of the virtually simultaneous switching activity of the clocked storage elements, the first part of the profiles shows a relatively high peak with a very short rise time. The combinational logic cells, which are directly connected to flip-flops or latches, are in this case stimulated at a similar time, and adding the respective current profiles of this considerably high number of logic cells leads to this peak. The remaining profile parts show a continuously decreasing current consumption, since the results of the logic operations become available gradually. At approximately 1.2 ns, another noticeable increase of the current flow appears, as the analyzed circuit contains a memory block with a delay of approximately 1 ns. The availability of the memory output values triggers some additional logic activity in this case.
The very last profile part shows the current consumption caused by the paths with the longest cell delays and/or the largest number of consecutively switching cells, also referred to as the critical paths.

5.3.5. Profile Composition

As the current profiles for the different partitions of a module are determined separately, these partial results have to be combined into the result for the entire module. Figure 5.12 shows these partial profiles for the analyzed module, which has been simulated with the following parameters: the clock gating cells are configured such that 40% of the storage element outputs are enabled, where 50% of them actually change their output state. Due to the discussed dependencies, this leads to a combinational logic activity level of 20%. The plotted profiles demonstrate the contribution of the consecutively triggered circuit parts during one clock period, starting with the rising clock edge, followed by the falling edge at approximately 2 ns in this example. Given that all clocked storage elements are high-active in this example, i.e. sensitive to the rising edge, the combinational logic cells are only active during the high-phase, while the clock tree and the storage element inputs are triggered by both clock edges. The amplitudes of the partial profiles also show the relation of the number of active cells per partition. As one partition output typically drives several cell inputs, the partial profiles typically start with a peak that is considerably higher than the corresponding profile of the driving partition.

The final current consumption profiles for the analyzed module, given by the sum of the partial profiles, are plotted in Figure 5.13. This method is applicable to determine the typical or average characteristics of a circuit, but also to analyze the profile variations as a result of the random stimulation of possibly different active circuit parts.
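The composition step can be sketched as a sample-wise sum of the partial profiles on a common time grid. The following Python sketch is not the implementation used in this work: the triangular pulse shapes, their positions and amplitudes, and all names (`triangle`, `compose_profile`) are hypothetical placeholders. Only the activity relation (40% enabled outputs, 50% of them switching, resulting in a combinational activity level of 20%) and the 4 ns clock period with a falling edge at 2 ns are taken from the text.

```python
import numpy as np


def triangle(t, start, width, peak):
    """Triangular pulse beginning at `start` with base `width` and height `peak`."""
    x = (t - start) / width
    return peak * np.clip(1.0 - np.abs(2.0 * x - 1.0), 0.0, None)


def compose_profile(partials):
    """Sum the partial profiles sample by sample."""
    return np.sum(list(partials.values()), axis=0)


gating_level = 0.40      # share of storage element outputs that are enabled
output_activity = 0.50   # share of enabled outputs that actually switch
logic_activity = gating_level * output_activity  # combinational activity level

t = np.arange(0.0, 4000.0, 1.0)  # one clock period in ps

# Hypothetical pulse parameters, chosen only to resemble the described shapes.
partials = {
    "clock distribution": triangle(t, 0, 300, 0.5) + triangle(t, 2000, 300, 0.5),
    "storage elements": triangle(t, 100, 300, 1.0) + triangle(t, 2100, 300, 0.4),
    "combinational logic": triangle(t, 200, 1400, 10.0 * logic_activity),
}

total = compose_profile(partials)
print(f"combinational activity level: {logic_activity:.0%}")
print(f"peak of the composed profile: {total.max():.2f} A (illustrative only)")
```

Repeating the composition with different randomly selected active cells would then yield the family of profiles whose variation is discussed next.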
To demonstrate these variations, the results of several simulation runs with identical parameters are shown.

Figure 5.13.: Resulting current profiles for the analyzed module. (Plot: current [A] over time [ps]; several runs for the entire module.)

For several reasons the frequency spectrum is often more important than the time domain waveform. Therefore, the results of a Discrete Fourier Transform (DFT) are discussed and shown in the following paragraphs. For an introduction of the Fourier transform method, as well as the most important relations of time domain signal characteristics and the respective frequency spectrum waveforms, see Appendix A.

It has been shown that the profile determination is based on random approaches, but the resulting waveforms do not vary significantly for a given activity level. As the profiles are additionally supposed to be periodic, the discrete Fourier transform is actually used to approximate the Fourier series. The following figures therefore show the magnitudes of the complex Fourier coefficients and not the power-density spectra as usually done for random signals.

As the complex Fourier series is defined as

$$x_p(t) = \sum_{k=-\infty}^{\infty} c_k \, e^{j 2\pi k f_0 t}, \tag{5.9}$$

and given that exactly one period of the sampled time-domain signal is transformed using the DFT, the complex Fourier coefficients are approximated by

$$c_k \approx \frac{1}{N} X[k], \tag{5.10}$$

where $X[k]$ is the DFT-spectrum, and $N$ denotes the number of samples per period. Since the discussed profiles are real signals, a Fourier decomposition in form of

$$x(t) = A_0 + \sum_{k=1}^{\infty} A_k \cos(2\pi k f_0 t + \alpha_k) \tag{5.11}$$

is also possible, and the DC value $A_0$, as well as the amplitudes $A_k$ and the phases $\alpha_k$ of the harmonics can be approximated as follows:

$$A_0 \approx \frac{1}{N} X[0], \qquad A_k \approx \frac{2}{N} |X[k]|, \qquad \alpha_k \approx \angle X[k]. \tag{5.12}$$

Figure 5.14 consequently shows the magnitude spectrum in form of the discrete Fourier coefficients of the clocked partition (clock tree and storage elements) profiles, and in the second plot, the respective combinational logic partition results. The previously discussed profiles can be roughly approximated with triangle-shaped waveforms. As such pulses correspond to squared sinc functions in the frequency domain, the basic shape of the FFT results also shows these characteristics. Compared to the frequency domain results of the combinational logic partition, where the squared sinc function is relatively obvious, the characteristic of the transformed clocked partition profile is more complex. The respective time domain profiles actually consist of two pulses with different pulse widths and peak values. As mentioned before, the most significant difference between the current consumption caused by the different clock events is the additional storage element output activity at the high phase of the clock signal. Hence, the pulses can be substituted by identical profiles for both clock edges, and an additional pulse for the difference between the rising and falling edge profiles. This results in one signal consisting of two identical subsequent pulses, and another one with the original clock period. As the time domain signals are assumed to be periodic anyway, the waveform with the two pulses can also be considered as one pulse with half of the clock period. Especially the coefficients for the clocked partition profiles in Figure 5.14 visualize this characteristic at lower frequencies. It shows a transformed periodic signal with the base frequency of 250 MHz, where the magnitudes of the harmonics of 500 MHz
are significantly higher than at the odd multiples of the base frequency. Due to the actually different cell timing characteristics for rising and falling edges, the profile peaks are not exactly equidistant. For this reason, the values at higher frequencies show a different characteristic.

Figure 5.14.: Magnitudes of the complex Fourier coefficients of the current profiles for the clocked partitions and the combinational logic. (Two plots: current [mA] over frequency [GHz], 0 to 5 GHz; clocked partitions and combinational logic.)

Figure 5.15.: Fourier coefficient magnitudes of the current profiles for the entire module. (Plot: current [mA] over frequency [GHz], 0 to 5 GHz.)

The Fourier coefficients of the current profiles for the entire module are shown in Figure 5.15. They basically represent the sum of the previously shown partial results. Here only the magnitudes are plotted. As the results of several simulation runs are plotted, it can be shown that the variation of the magnitudes for each frequency is relatively marginal. This leads to the result that the current profile characteristics depend only secondarily on which particular cells are actually triggered. Primarily important are the structure of the module and the respective activity levels, as well as the clock signal characteristics.

5.3.6. Multiple Clock Domains

Complex designs usually consist of multiple functional units, which possibly operate at different clock frequencies. While the operating time base of core modules is typically the fastest one, peripheral functionalities are triggered at a fraction of the system clock rate in many cases.
As discussed in Chapter 3, one of the most important parameters for determining the current profile of a module with multiple clock domains is the duty cycle ratio. Given that different methods are possibly used to derive a slower time base from the fast system clock signal, the respective characteristics of the resulting signals are important. Figure 5.16 shows a common clock signal and three signals with half of this frequency, but different duty cycle ratios (25, 50, and 75%). These signals are potential results of a clock division by factor two, where the resulting ratios depend on the actual divider type. The easiest method, and therefore most commonly implemented, is a divider which simply disables every second clock period, resulting in a signal with a duty cycle ratio of 25% (plotted signal 2). Other methods may result in signals with equal lengths of the high and low phases (signal 3), or extended high phases to 75% of the resulting clock period (signal 4).

Figure 5.16.: System clock signal and potential results of a clock divider by factor two, and the effects on the current profiles. (First plot: signals over time [ns], 0 to 10 ns; signal 1: 200 MHz, 50% duty; signal 2: 100 MHz, 25% duty; signal 3: 100 MHz, 50% duty; signal 4: 100 MHz, 75% duty. Second plot: current [A] over time [ns]; curves for 25%, 50%, and 75%.)

The second plot in Figure 5.16 shows the effects of these different ratios on the current profiles. As the system activities are aligned to the rising clock edge, and all instantiated storage elements are sensitive to this event, the main current peak is at the same time in all cases. On the other hand, the timing of the activity triggered by the falling clock edge depends on the divider type and the system clock characteristics. Given that the system clock frequency is 200 MHz in this example, these current peaks appear at approximately 2.5, 5, and 7.5 ns, depending on the respective duty cycle ratio. These differently located current peaks in the time domain also have significant effects on the magnitudes of the respective Fourier coefficients (see Figure 5.17). While the basic shape of the results is very similar, the relation of the particular values considerably depends on the actual clock characteristics.

Figure 5.17.: Envelopes of the Fourier coefficient magnitudes of a current profile with different duty cycle ratios. (Plot: current [mA] over frequency [GHz], 0 to 3 GHz; curves for 25%, 50%, and 75%.)

For a system consisting of a core module and a peripheral unit, which is triggered by half of the given system clock frequency of 200 MHz, the profile composition and the respective Fourier coefficients are shown in Figure 5.18. The core module in this example is the exemplified circuit for the partial profile determination in the previous sections. The additional unit is about twice this size, consisting of approximately 300,000 cell instances, but is considerably less active than the core module. It is shown that the peripheral unit is only active at the first system clock period (25% duty cycle ratio), while the core module is triggered by all clock events. As the period of the profile for the entire system is given by the least common multiple of all clock periods, the frequency domain results in Figure 5.18(b) show discrete values at each multiple of 100 MHz.

The frequency domain result for the entire system shows significantly alternating values at the even and odd multiples of the base frequency. Due to the double clock frequency of the core module, it only contributes to the total spectrum at multiples of 200 MHz. Consequently, every second value of the system result is equal to the peripheral result, while otherwise both modules contribute a regular value to the coefficients of the current profiles for the entire system.
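This alternation can be reproduced numerically with the coefficient approximation of Equation (5.10). The sketch below is a simplified illustration, not the simulation behind Figure 5.18: the triangular pulses, their amplitudes, and the helper names (`pulse`, `fourier_coefficients`) are assumptions. Only the clock frequencies (200 MHz core, 100 MHz peripheral with 25% duty cycle) and the 10 ns system period are taken from the text.

```python
import numpy as np

N = 1000                    # samples per 10 ns system period (10 ps step)
t = np.arange(N) * 0.01     # time in ns


def pulse(t, start, width, peak):
    """Hypothetical triangular current pulse (times in ns, peak in A)."""
    x = (t - start) / width
    return peak * np.clip(1.0 - np.abs(2.0 * x - 1.0), 0.0, None)


def fourier_coefficients(x):
    """c_k ~ X[k] / N for exactly one period of a sampled periodic signal."""
    return np.fft.fft(x) / len(x)


# Core module: one 5 ns period (200 MHz, falling edge at 2.5 ns), repeated twice.
core_period = pulse(t[:N // 2], 0.0, 1.0, 2.0) + pulse(t[:N // 2], 2.5, 0.5, 0.8)
core = np.tile(core_period, 2)

# Peripheral unit: 100 MHz with 25% duty cycle, i.e. falling edge at 2.5 ns.
peripheral = pulse(t, 0.0, 1.5, 1.0) + pulse(t, 2.5, 0.5, 0.3)

c_core = np.abs(fourier_coefficients(core))
c_per = np.abs(fourier_coefficients(peripheral))
c_sys = np.abs(fourier_coefficients(core + peripheral))

# Index k corresponds to k * 100 MHz; odd k are the odd multiples of 100 MHz.
for k in range(1, 5):
    print(f"{k * 100} MHz: core {c_core[k]:.4f}, "
          f"peripheral {c_per[k]:.4f}, system {c_sys[k]:.4f}")
```

Because the core profile repeats exactly twice within the system period, its coefficients vanish at the odd multiples of 100 MHz, so the system coefficients there coincide with those of the peripheral unit.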
When analyzing extensively large and complex modules with multiple different functional units, a clock divider may be embedded inside a module. In this case, parts of this module are triggered by clock signals with different characteristics. Given that gate-level netlists are in most cases flattened, i.e. no information concerning subcircuit hierarchies or a differentiation of the functional units is available, an actual assignment of particular cells to a clock domain is not feasible. As mentioned in the sections discussing the profile determination for the different module partitions (Sections 5.3.2–5.3.4), the current profile amplitudes are almost proportional to the activity level parameter. Hence, and under the assumption that the percentage of cells associated with each clock domain is available, a proportional distribution of the corresponding current profile is applicable. Such a distributed profile for multiple clocks ($i_{mc}$) can accordingly be composed by

$$i_{mc} = \sum_k A_k \, i_k(t) \tag{5.13}$$

where $A_k$ with $A_k < 1$ denotes the relative contribution of the cells associated with the particular clock domain. As previously discussed and shown in the respective figures, the module profiles ($i_k$) are given by the superposition of the profiles for the rising and falling clock edges ($i_r$ and $i_f$) at the appropriate time instants ($\tau_r$ and $\tau_f$):

$$i(t) = i_r(t - \tau_r) + i_f(t - \tau_f) \tag{5.14}$$

Figure 5.18.: Current profiles (a) and Fourier coefficient magnitudes (b) of a system consisting of two modules stimulated by different clock frequency domains. (Plot (a): current [A] over time [ns], 0 to 10 ns; plot (b): current [mA] over frequency [GHz], 0 to 3 GHz; curves for core module, peripheral unit, and system result.)

6. Parasitic Effected Current Modeling

The current profile determination discussed in the previous sections has been done for ideal conditions in terms of lossless cell interconnections and a constant power supply voltage. As introduced in Chapter 2, real power distribution networks and cell interconnect wires have a significant effect on the system behavior. Given that wire loads in deep submicron technologies are almost in the range of the cell input loads, it is essential to consider these effects.

One effect of real interconnects is a considerably higher total amount of current consumption due to the additionally demanded current for charging and discharging the wire capacitances. These additional loads at the driving cell outputs also lead to reduced signal slew rates, and consequently to higher signal propagation delays.

Power distribution networks affect the system performance in a different manner, but the effects are similar to those of interconnect wires. Real power supply systems can be modeled as a network of resistances, capacitances, and optional inductances. A current flow through such a network causes voltage drops at the supply pins of all on-chip devices. As a consequence, the particular cells are temporarily operating at a lower supply voltage, which additionally affects the operating performance.

Since gate-level netlists provide no information concerning the cell interconnect wire characteristics and the power supply network, the mentioned effects have to be estimated. Approaches for modeling these effects by post-processing the ideal current profiles are discussed in the following sections.

6.1. General Principles

As mentioned before, interconnect wires and power supply systems cause significant effects on the system performance and consequently lead to modified current consumption profile characteristics. While interconnects introduce additional loads to driving cells, power distribution networks cause voltage drops.
But both cause a reduction of the switching performance of the respective components. The effects on the current consumption profile are therefore similar.

Figure 6.1 shows the simulation results for an example circuit with lossless interconnects, compared to the analysis including an appropriate wire load model. Both waveforms in each plot are SPICE simulation results for the switching activity that is triggered by a rising clock edge. The ideal waveform is without any wire model, and the other one includes the parasitic resistances and capacitances extracted from the corresponding layout data.

Figure 6.1.: Simulation results showing the effects of the parasitic elements of interconnect wires on data signals (a) and current profiles (b). (Plot (a), data signal transitions: signal voltage [V] over time [ns], 0 to 1 ns; plot (b), current profiles: current [mA] over time [ns]; curves for ideal and extracted.)

Figure 6.1(a) shows that the signal slew rates are significantly reduced due to the additional loads at the cell output drivers. Since this also causes higher signal propagation delays, the activity of the particular cells is delayed accordingly. As a result, this reduced performance of the switching on-chip devices leads to a considerably stretched current profile, plotted in Figure 6.1(b).

It can also be observed that, even though the mean current is increased, the maximum peak values are reduced due to the distribution of the partial profiles along the time axis. Given that the analyzed module is relatively small, an additional effect is demonstrated by the visibly rounded peaks. As the resistances and capacitances of the applied wire models form a kind of low-pass filter, actually rapid switching activities are consequently restricted to moderate signal transitions.
The resulting profiles therefore show considerably less pronounced current peaks.

The Fourier transform results of the mentioned profiles are shown in Figure 6.2. As the system clock period is 4 ns in this example, the fundamental frequency is 250 MHz. At first glance, the plots show major differences between the ideal and the extracted coefficient envelope. But even though the amplitudes and the positions of the local maxima and minima along the frequency axis show significant differences, the basic characteristics are nevertheless similar.

Figure 6.2.: Envelope of the Fourier coefficients of the current profiles for the ideal circuit, compared to the result including the parasitic elements of a wire load model. (Plot: current [mA] over frequency [MHz], 0 to 7000 MHz; curves for ideal and extracted.)

Note also that the DC value, which corresponds to the mean current consumption, is higher for the profiles containing the parasitic effects. This can be simply explained by the additional current consumption of the parasitic elements, such as for charging/discharging the wire capacitances.

It can be seen that both coefficient envelope waveforms have similar local maxima and minima, but at different frequencies. Note that stretching a signal in the time domain results in a compression of the frequency spectrum. Figure 6.3 therefore compares the simulation result for the circuit including the extracted parasitic elements with a stretched (in time domain) and scaled initially ideal profile. It can be seen that the two spectra coincide up to 2 GHz. Today’s integrated circuit emission models are usually demanded to be valid up to at least 1 GHz. Therefore this simple post-processing method of stretching the current profile in time and scaling the amplitude could be a good approximation to account for the effects caused by cell interconnect wires and power distribution networks.
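The relation between stretching in time and compression of the spectrum can be illustrated with a small numerical experiment. The following Python sketch is a minimal illustration under stated assumptions, not the post-processing implementation of this work: the triangular test profile, the stretch factor of 1.25, the scaling factor of 0.9, and the helper names (`post_process`, `coefficient_magnitudes`, `first_null`) are all hypothetical. Only the 4 ns clock period (250 MHz fundamental) is taken from the text.

```python
import numpy as np


def post_process(t, ideal, stretch, scale):
    """Stretch a sampled profile in time by `stretch` and scale its amplitude.

    The stretched profile at time t equals the ideal profile at t / stretch.
    """
    return scale * np.interp(t / stretch, t, ideal)


def coefficient_magnitudes(x):
    """|c_k| ~ |X[k]| / N for one period of a sampled periodic profile."""
    return np.abs(np.fft.fft(x)) / len(x)


def first_null(c, k_max=30):
    """Index of the smallest coefficient magnitude among k = 1 .. k_max."""
    return 1 + int(np.argmin(c[1:k_max + 1]))


N = 1000
t = np.arange(N) * 0.004    # one 4 ns clock period, time in ns

# Hypothetical ideal profile: triangular pulse with 0.4 ns base and unit peak.
ideal = np.clip(1.0 - np.abs(t / 0.2 - 1.0), 0.0, None)
processed = post_process(t, ideal, stretch=1.25, scale=0.9)

c_ideal = coefficient_magnitudes(ideal)
c_proc = coefficient_magnitudes(processed)

# Index k corresponds to k * 250 MHz.
print(f"first envelope minimum, ideal:     {first_null(c_ideal) * 250} MHz")
print(f"first envelope minimum, processed: {first_null(c_proc) * 250} MHz")
print(f"peak:  ideal {ideal.max():.3f}, processed {processed.max():.3f}")
print(f"mean:  ideal {c_ideal[0]:.4f}, processed {c_proc[0]:.4f}")
```

With these assumed factors the first minimum of the envelope moves to a lower frequency by the stretch factor, the peak value is reduced, and the DC value increases, which qualitatively matches the behavior described for the extracted circuit. Suitable stretch and scale values for a real design would have to be derived from the technology and layout data.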
Figure 6.3.: Comparison of the Fourier coefficient envelopes of the simulation result including the parasitic effects and an initially processed ideal profile. (Plot: current [mA] over frequency [MHz], 0 to 7000 MHz; curves for extracted and processed.)

It can also be seen that the post-processed values beyond 2 GHz are higher than the extracted simulation result. These differences could be caused by the attenuating behavior of the RC-structures, which is effective at higher frequencies, similar to a low-pass filter. The variations at even higher frequencies above 5 GHz could, amongst others, be caused by the probably irregular wire structures, while the applied post-processing supposes a constant effect on the switching behavior of all cells. Even though this method is insufficient to cover such effects at high frequencies, it still provides a good approximation of the expected behavior with a feasible accuracy within a reasonable frequency range.

6.2. Cell Interconnects

As discussed in Chapter 2, and mentioned in the previous section, real interconnect wires cause lower signal slew rates and consequently higher propagation delays, but also an additional amount of current flow. Figure 6.4 shows the current profiles for the rising and falling clock edge of a significantly larger circuit than in the previous section. The plots show a comparison of the simulation results for ideal interconnects and an equivalent circuit including models representing the real wire properties. The respective parasitic elements are extracted from the corresponding circuit layout data. It can be seen that the storage elements are sensitive to the rising edge, as there is also some combinational activity, while the falling edge triggers only the clocked elements, and therefore considerably less current is consumed.
Figure 6.4.: Current profiles for a rising (a) and falling (b) clock edge for a given circuit. The plots show the profiles for ideal interconnects, compared to the simulation results including wire models. (Two plots: current [mA] over time [ps], 0 to 400 ps; curves for ideal and extracted.)

The previously discussed approaches have been applied to approximate the parasitic effected profiles for this circuit example. The additional cell delays are therefore modeled by a profile stretching in time, and the demanded capacitance charge/discharge currents are considered by an appropriate amplitude scaling. A separate application of this procedure to the partial profiles for both clock edges results in the Fourier coefficient envelopes shown in Figure 6.5. As expected, both plots show significant differences between the ideal and the extracted circuit. But the post-processing of the ideal profiles results in a good approximation of the extracted circuit waveforms, even if there are some uncertainties in the results for the rising edge. This could primarily be due to the fact that the current profile characteristics in Figure 6.4(a) show some differences in the first profile part. Such differences in the waveforms are probably caused by nets with exceptionally high wire loads, leading to an exceptionally affected performance of the respective driving cells. Hence, the activity of such function blocks is more delayed, and consequently affects the resulting profile shape. On the other hand, the clock subsystem is typically well-balanced. As a result, the profiles for the ideal and the extracted circuit are very similar. The uncertainties of the post-processed profiles are in this case minor (see Figure 6.5(b)).
Figure 6.5.: Post-processing result compared to the coefficients of the ideal and extracted circuit for the rising (a) and the falling (b) clock edge. (Two plots: current [mA] over frequency [MHz], 0 to 2500 MHz; curves for ideal, extracted, and processed.)

The final result for the complete period with both clock edges is plotted in Figure 6.6. As the basic components of the profile are two current peaks in a first approximation, the plots show the characteristic magnitude alternation. It can be observed that the processed values with the higher amplitudes almost match the simulation results up to several GHz. The sequence of lower magnitudes, on the other hand, shows considerable differences. One reason for that could be the instantiation of clock buffers that are possibly optimized for the rising edges. In this case, the absolute value of the wire load is the same for both types of driving transistors, but the relative effect is considerably less significant for the typically stronger p-type transistors. The delays of the current peaks introduced by the wire loads would therefore cause different effects for the rising and the falling edge, and consequently lead to such uncertainties in the frequency domain waveform. But since these current consumption models are typically used to identify the most dominant frequencies, the envelope of the higher values is important. And it can be observed in Figure 6.6 that these results are reasonably accurate for a frequency range up to several GHz.

Figure 6.6.: Comparison of the simulation result including the parasitic effects and an initially processed ideal profile. (Plot: current [mA] over frequency [MHz], 0 to 5000 MHz; curves for ideal, extracted, and processed.)

6.3. Power Distribution Networks

There are several effects on the system performance, and therefore also on the current consumption profiles, that are caused by the power distribution systems. As introduced in Chapter 2, there are at least resistances and capacitances that cause voltage drops, but there are also significant inductive effects, which become more and more important due to technology shrinking.

Analyses of test-chips, such as the one introduced in [35], have shown that the introduced method to post-process ideal current profiles is possibly also applicable for an estimation of the parasitic power distribution network effects. The plots in Figure 6.7 show the simulation results of a test-chip module in an ideal environment, with the extracted interconnect wire parasitics, and the additionally modeled power lines. It is shown that the amplitudes of the parasitic effected current profiles are significantly reduced. Both voltage drops and decoupling capacitances play a significant role in this case. While voltage drops are responsible for a considerably reduced switching performance of the on-chip devices, the decoupling capacitances provide a portion of their charge at the initial phase of the switching activities. This leads to the comparatively low amplitude, but there is also a significantly long phase of current flow after the maximum peak value. During this time, the decoupling capacitances are re-charged, but also all other nodes in a high state are pulled up to the nominal power supply voltage.

Figure 6.7.: Current profiles for ideal conditions, including wire models for the cell interconnects, and an additionally modeled power supply system. (Plot: current [A] over time [ps], 0 to 2000 ps; curves for ideal, interconnects, and power supply.)

Figure 6.8 shows the Fourier transforms of all the mentioned current profiles, including the result of the post-processed ideal profile.
In this example, all the on-chip effects in terms of interconnect wires and power distribution network parasitics are considered to approximate the respective simulation result. Given that appropriate parameters are available for the post-processing, it can be observed that the results are acceptable for the relatively regular structures on the analyzed test-chips. But for complex systems with several millions of transistors, which possibly consist of numerous sub-systems and various function units, the power grid is typically much more irregular. Such systems probably introduce additional effects that cannot be reliably covered by this method. Tools such as EXPO [11] are developed specifically to model the effects of complex power distribution networks. Here, the power grid is segmented to consider the possibly different structures of the respective system partitions, and modeled by an appropriate network of resistances, capacitances, and even inductances.

Figure 6.8.: Envelope of the Fourier coefficients of the simulation results for ideal conditions, including wire models for the cell interconnects, and an additionally modeled power supply system, compared to the processed ideal profile to approximate all parasitic effects. (Plot: current [mA] over frequency [GHz], 0 to 16 GHz; curves for ideal, interconnects, power supply, and processed.)

7. Implementation and Verification

The introduced methods to model the current consumption of digital integrated circuits based on gate-level netlists have been implemented in software and basically verified by SPICE based transistor-level simulations. The most important implementation considerations and the finally achieved current consumption profile quality are discussed for some particular examples in this chapter.
Since gate-level circuit descriptions do not provide any information about the internal structure of the instantiated cells, a standard cell library has been characterized by applying the method introduced in Chapter 4. The considerations and implemented data structures of the generated library of pre-characterized cells, as well as the finally provided parameters, are introduced in Section 7.1. The capabilities of the methods to model the current consumption profiles, discussed in Chapter 5, are verified by evaluating the results for various circuits with different characteristics. The pattern-based simulation method has therefore been applied to relatively small circuits, and the feasibility of the modeling approach for complex designs is discussed for appropriate modules, such as the core unit of a high-end microcontroller. As chip-level current consumption models typically consist of current sources and several passive elements, the finally resulting profiles are converted to so-called equivalent current sources. These concluding tasks are discussed in the last sections of this chapter.

7.1. Library of Characterized Cells

A library of characterized cells consists of two parts: the static properties that are given by the internal cell structure, and the dynamic operating behavior in terms of timing characteristics and current consumption profiles.

7.1.1. Static Properties

As discussed in the previous chapters, the dynamic behavior of a cell significantly depends on the characteristics of the connected cells in the circuit environment. Therefore, the following parameters, representing the most important characteristics of the internal cell structure, are extracted from the transistor-level cell descriptions and provided as the static cell properties part of the library:

cell name: Given that every cell in a library has a unique name, all instances of a cell in a gate-level netlist can be identified by that name.

cell type: There are different types of cells, such as combinational gates, flip-flops, memories, and latches. The cell type is important for the circuit partitioning, as storage elements are, for instance, treated differently from combinational gates. The data structure representing the dynamic characteristics also depends on the cell type. The internal activity of flip-flops is, for instance, affected by the output state, whereas the behavior of combinational gates depends only on the input signals.

As mentioned before, the switching capabilities of all on-chip devices are primarily given by the respective driver strengths and output loads. But some additional characteristics are required to specify the properties of a cell. The following attributes are therefore provided for the particular cell ports:

port name: Cell interconnections are specified in netlists by the association of the respective port names (see Section 7.2.1).

direction: As gate-level netlists provide no information concerning the respective port directions (input, output, or bidirectional), this parameter is necessary to determine the actual circuit structure.

port properties: Among the most important parameters for the dynamic behavior of a cell are the transistor sizes of the driving cell and the respective output loads (see Section 4.4). The port properties are therefore converted to appropriate equivalent inverter parameters, representing the load characteristics of an input or, in case of an output, the properties of the driving transistors.

function: As a cell may have more than one output, such as the sum and the carry of a full-adder, a function in terms of a truth table is associated with every cell output.
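The static properties listed above can be pictured as one record per cell. The following Python sketch is only an illustration under stated assumptions, not the implementation used in this work: the class and field names and the cell name FULL_ADDER are hypothetical, and the equivalent inverter parameters of the ports are omitted. It stores only the output vectors of a full-adder and generates the input states from a fixed table structure in which the first input toggles fastest.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Port:
    name: str
    direction: str  # "input", "output", or "inout"


@dataclass
class CellPrototype:
    name: str
    cell_type: str  # e.g. "combinational", "flip-flop", "latch"
    ports: list
    # One output vector per output port; entry i is the output value
    # for the i-th generated input state.
    functions: dict = field(default_factory=dict)

    def input_names(self):
        return [p.name for p in self.ports if p.direction == "input"]

    def input_states(self):
        """Generate all input states; the first input toggles fastest."""
        names = self.input_names()
        for bits in product((0, 1), repeat=len(names)):
            # product() varies its last element fastest, hence the reversal.
            yield dict(zip(names, reversed(bits)))

    def evaluate(self, output, **inputs):
        """Look up one output value; the first input is the least significant bit."""
        index = sum(inputs[n] << i for i, n in enumerate(self.input_names()))
        return self.functions[output][index]


full_adder = CellPrototype(
    name="FULL_ADDER",  # hypothetical cell name
    cell_type="combinational",
    ports=[Port("A", "input"), Port("B", "input"), Port("Ci", "input"),
           Port("S", "output"), Port("Co", "output")],
    functions={"S": [0, 1, 1, 0, 1, 0, 0, 1],
               "Co": [0, 0, 0, 1, 0, 1, 1, 1]},
)
```

A call such as `full_adder.evaluate("Co", A=1, B=1, Ci=0)` then reads the carry-out from the stored vector without storing any input columns.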
For a full-adder, with the data inputs A and B, the carry-in Ci, and the outputs sum S and carry-out Co, the truth table is shown in the following table. By defining the structure of the table, the input states can be generated, and it is sufficient to store the respective output vectors.

    A    0 1 0 1 0 1 0 1
    B    0 0 1 1 0 0 1 1
    Ci   0 0 0 0 1 1 1 1
    S    0 1 1 0 1 0 0 1
    Co   0 0 0 1 0 1 1 1

The implemented data structure of the static cell properties is similar to an array of cell prototypes with the mentioned parameters. As any cell can be identified by its name, all the other properties are organized in an associated record that can be copied and instantiated in the simulation environment.

7.1.2. Dynamic Characteristics

This part of the library is intended to provide all parameters required to determine the timing characteristics and the current consumption profiles for a given event at any characterized cell. Therefore, all cells have been simulated with a reasonable set of transitions in different circuit environments. The parameters describing the respective timing characteristics and current profiles are subsequently extracted from these single cell simulation results and stored in an appropriate data structure.

Timing Characteristics

As discussed in Chapter 4, the behavior of a cell significantly depends on the input signal characteristics. Since the signal rise and fall times are typically different, both parameters are extracted. And given that the output signals are identical to the input signals of the connected cells, the output slew rates are important as well. For the introduced method, which determines single event current profiles that are subsequently superposed, the delay between the input event and the output response is also required. These signal propagation delays are therefore extracted for all outputs.
As a result, the library provides the following timing characteristics:

• input and output signal slew rates
• signal propagation delays

Current Profiles

It has been shown in Section 5.1 that the current profile shapes are quite similar for a given event. An alignment of a reference profile to a set of characteristic points is consequently a feasible approach to determine the actual waveform for various environmental conditions. Hence, it is sufficient to provide the following data:

• time instants at which the current profile amplitude is zero, and the times and amplitudes of the local profile minima and maxima
• value of the mean current consumption
• reference profiles

Library Implementation

All these previously mentioned parameters are provided for all relevant port events. As the characterization simulations can be done for a limited set of environmental conditions only, an efficient method to address and interpolate the respective data sets is required. The finally implemented data structure is illustrated in Figure 7.1.

Figure 7.1.: Data structure of the dynamic characteristics and profiles in the library of characterized cells. (Diagram elements: list of characterized cells, port transitions, reference current profiles, node values, parameters.)

Starting at the list of characterized cells, there is a reference to the available port transitions for the particular cells. The characteristic parameters and the reference profiles are organized in a tree structure and associated with the respective transitions. Figure 7.1, for instance, shows these references for the third cell in the list and for the second transition. The node values at the different levels in the parameter tree each represent one of the cell environment parameters, such as the input signal characteristics and the output load.
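A parameter tree of this kind lends itself to a recursive lookup. The following Python sketch is a hedged illustration rather than the implemented library: it assumes linear interpolation between the two neighboring nodes, clamps queries outside the characterized range, and uses two hypothetical tree levels (input slew in ps, output load in fF) with invented leaf values.

```python
from bisect import bisect_left


def lookup(node, query):
    """Return interpolated leaf parameters for one value per tree level.

    node  : a leaf (dict of parameters) or a list of (value, child)
            pairs sorted by value
    query : environment values, one per remaining tree level
    """
    if isinstance(node, dict):  # leaf reached
        return dict(node)
    q, rest = query[0], query[1:]
    if len(node) == 1:  # nothing to interpolate at this level
        return lookup(node[0][1], rest)
    values = [v for v, _ in node]
    # Index of the upper neighbor, kept inside the table bounds.
    hi = min(max(bisect_left(values, q), 1), len(node) - 1)
    lo = hi - 1
    # Interpolation weight, clamped so that no extrapolation happens
    # (an assumption of this sketch).
    w = (q - values[lo]) / (values[hi] - values[lo])
    w = min(max(w, 0.0), 1.0)
    a = lookup(node[lo][1], rest)
    b = lookup(node[hi][1], rest)
    return {k: a[k] + w * (b[k] - a[k]) for k in a}


# Hypothetical data: level 1 = input slew [ps], level 2 = output load [fF].
TREE = [
    (20.0, [(1.0, {"delay_ps": 30.0, "peak_mA": 0.50}),
            (4.0, {"delay_ps": 60.0, "peak_mA": 0.40})]),
    (80.0, [(1.0, {"delay_ps": 50.0, "peak_mA": 0.30}),
            (4.0, {"delay_ps": 80.0, "peak_mA": 0.20})]),
]

result = lookup(TREE, [50.0, 2.5])
```

Because every level is sorted, each recursion step only has to locate one pair of neighbors, so the cost grows with the tree depth rather than with the number of characterized conditions.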
The transition timing and current profile parameters for the respective conditions are finally located at the leaf nodes. As there are only a few reference profiles per transition, these values are stored in a separate branch of the tree. One reason for this library structure is the support of an efficient interpolation procedure for the particular parameters. Provided that the node parameters are sorted at every level in the tree, the requested values can always be determined by an interpolation between two neighboring values. The timing characteristics and profile parameters are therefore determined by a recursive interpolation of the data that are referenced by two neighboring nodes.

7.2. Circuit Description Interpretation

As the introduced modeling method is intended to be applied as early as possible in the design flow, it is based on the circuit description in the form of Verilog gate-level netlists. The structure and syntax of such a gate-level description are specified in the respective standard [36]. After the netlist import and circuit interpretation, the processed data must be represented in an appropriate structure to enable an efficient execution of the applied simulation methods.

7.2.1. Verilog Gate-Level Netlist Import

Listing 7.1 shows, as an example, the gate-level netlist of a basic 3-bit counter in Verilog syntax. In this hardware description language, any circuit is called a module and identified by a module name and the list of ports. The particular port directions are specified by the following statements, declaring them as an input, output, or even as a bidirectional inout port. The internal circuit of a module is subsequently described by a consecutive list of submodule or cell instances. Each of these instances is specified by the submodule name, which is synonymous to the instantiated cell type.
The following identifiers are the instance name and a list of the submodule or cell ports with the respective names of the connected nets.

Listing 7.1: Verilog netlist of a 3-bit counter.

    module counter_3bit (clk, en, out_a, out_b, out_c);
      input  clk, en;
      output out_a, out_b, out_c;

      R150_EXOR XOR0 (.A(out_a), .B(en),    .Z(in_a));
      R150_EXOR XOR1 (.A(out_b), .B(out_a), .Z(in_b));
      R150_EXOR XOR2 (.A(out_c), .B(and_a), .Z(in_c));
      R150_AND  AND0 (.A(out_a), .B(out_b), .Z(and_a));
      R150_DFF  DFF0 (.CP(clk), .D(in_a), .Q(out_a));
      R150_DFF  DFF1 (.CP(clk), .D(in_b), .Q(out_b));
      R150_DFF  DFF2 (.CP(clk), .D(in_c), .Q(out_c));
    endmodule

Given that the order of the cell instances, as well as the order of the names in the port lists, is arbitrary, the cell interconnections are solely described by the associated port and net names. These names are unique and define that all ports that are associated with a given net are interconnected. Hence, a gate-level netlist is a textual description of the schematic view of a circuit, as shown in Figure 7.2 for the exemplified counter.

Figure 7.2.: Schematic view of a 3-bit counter.

The netlist import is therefore done line by line and temporarily organized in a linked list of cell instances. The list nodes are represented by records with the respective cell attributes that are provided by the circuit description. As the particular cells are instances of black boxes, no details on the port directions and cell characteristics are provided by the imported netlist. But because the cell types are unique, additional parameters can be imported from the static properties part of the pre-characterized cell library. With this merged information, the particular cell attributes and port interconnections are sufficiently specified.
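The import and merge steps can be sketched in a few lines of Python. This is a simplified illustration, not the implemented importer: it assumes one statement per line and named port connections only, and the table of port directions stands in for the static properties part of the library (the directions given for the R150 cells are assumptions).

```python
import re

NETLIST = """\
module counter_3bit (clk, en, out_a, out_b, out_c);
  input  clk, en;
  output out_a, out_b, out_c;
  R150_EXOR XOR0 (.A(out_a), .B(en),    .Z(in_a));
  R150_EXOR XOR1 (.A(out_b), .B(out_a), .Z(in_b));
  R150_EXOR XOR2 (.A(out_c), .B(and_a), .Z(in_c));
  R150_AND  AND0 (.A(out_a), .B(out_b), .Z(and_a));
  R150_DFF  DFF0 (.CP(clk), .D(in_a), .Q(out_a));
  R150_DFF  DFF1 (.CP(clk), .D(in_b), .Q(out_b));
  R150_DFF  DFF2 (.CP(clk), .D(in_c), .Q(out_c));
endmodule
"""

# Assumed port directions, standing in for the static library properties.
PORT_DIRECTIONS = {
    "R150_EXOR": {"A": "input", "B": "input", "Z": "output"},
    "R150_AND": {"A": "input", "B": "input", "Z": "output"},
    "R150_DFF": {"CP": "input", "D": "input", "Q": "output"},
}

INSTANCE = re.compile(r"^\s*(\w+)\s+(\w+)\s*\((.*)\)\s*;\s*$")
CONNECTION = re.compile(r"\.\s*(\w+)\s*\(\s*(\w+)\s*\)")
KEYWORDS = {"module", "input", "output", "inout", "wire", "endmodule"}


def import_netlist(text):
    """Read the netlist line by line into a list of instance records."""
    instances = []
    for line in text.splitlines():
        match = INSTANCE.match(line)
        if not match or match.group(1) in KEYWORDS:
            continue  # declaration, module header, or empty line
        cell, name, port_list = match.groups()
        instances.append({"cell": cell, "name": name,
                          "ports": dict(CONNECTION.findall(port_list))})
    return instances


def connect(instances, directions):
    """Merge library port directions to find the driver and loads of each net."""
    drivers, loads = {}, {}
    for inst in instances:
        for port, net in inst["ports"].items():
            if directions[inst["cell"]][port] == "output":
                drivers[net] = (inst["name"], port)
            else:
                loads.setdefault(net, []).append((inst["name"], port))
    return drivers, loads


instances = import_netlist(NETLIST)
drivers, loads = connect(instances, PORT_DIRECTIONS)
```

The `loads` mapping corresponds to the per-output arrays of references to connected cell ports; nets without an entry in `drivers`, such as `clk` and `en`, are driven by module inputs.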
As a result, the imported data can be reorganized into a tree structure that is similar to the schematic view of the circuit. The interconnect lines are consequently implemented as references (pointers) to the respective cell data objects, as shown in Figure 7.3 for the discussed counter. It is additionally shown that every output of a cell instance holds an array with the references to the connected cells. The particular ports of the connected cells are also known by the driving cell (e.g. p(D) is a pointer to the port D of the referenced cell). As the module inputs are treated in the same way, it is finally possible to traverse the paths of consecutively switching cells by following the respective references.

Figure 7.3.: Data structure representation of a 3-bit counter. (Diagram nodes: inputs, XOR0:R150_XOR, DFF0:R150_DFF, XOR1:R150_XOR, DFF1:R150_DFF, AND0:R150_AND, XOR2:R150_XOR, DFF2:R150_DFF.)

7.2.2. Path Categorization

The modeling approach for complex modules has been discussed in Chapter 5. It has been shown that the introduced methods are most effective when a module is partitioned and appropriate simulation methods are applied to the respective circuit parts. The paths of successively connected cells are therefore categorized. This is done by evaluating the types and the particularly connected ports of all cells that are accessible by the respective path. The following path types are determined and added to the properties of the respective root node (module input):

set/reset: The asynchronous system set and reset signals are generally distributed to the respective storage element inputs in synchronous designs. There may be a certain tree of buffers to distribute the load of possibly numerous ports to a reasonable number of driving buffers. This kind of path can generally be considered as terminated at storage element inputs.
scan test: A scan test is usually done by reconfiguring the paths of a circuit to form so-called scan chains. As only special flip-flops support this feature, the scan enable and test input signals are typically distributed from the respective module inputs to such scan flip-flops.

clock: As mentioned in the previous chapters, the clock signal is typically distributed by a tree structure with possibly several gating cells to the clock inputs of all storage elements.

combinational: All paths that cannot be definitely assigned to one of the above categories are considered as general combinational logic paths.

Determining these path categories enables the application of reasonable simulation methods for the respective path types. As the set/reset and scan test paths are considered inactive during normal system operation, these module input ports are set to the corresponding constant state to disable these features. The configuration of the circuit is subsequently done by distributing these static values from the respective module inputs to the intentionally disabled cells by evaluating the functions of the cells that are associated with such a path. As a result, these circuit parts are considered inactive, and the possibly involved gating cells, multiplexers, and flip-flops are configured to operate in normal mode. In addition to the configuration, a path categorization is required for the partitioning of the circuit into the clock distribution network, the storage elements, and the combinational logic blocks (see Section 5.3.1).

7.3. Pattern Based Simulation

In this most common simulation method, stimuli vectors are applied at the module inputs and propagated along the paths of consecutively switching cells according to the internal circuit functions. The switching activities of the particular cells are consequently given by the input stimuli events and the particular cell functions.

7.3.1. Software Implementation

As mentioned before, the implemented data structures are similar to the schematic view of the imported circuit (see Figures 7.2 and 7.3). Since this current profile calculation method is applicable to single events, the circuit analysis is done sequentially, one cell after the other. Starting at the module inputs, the triggered events are propagated along the paths of consecutively switching cells. The current profiles and timing characteristics are subsequently determined for the respective events. Given that the environmental circuit conditions are required to address the appropriate parameters in the library, the attributes of connected neighbor cells are analyzed. The output load properties are determined by an evaluation of the connected (referenced) cell inputs. The respective port properties are subsequently combined into a set of equivalent inverter parameters, as introduced in Section 4.3. These parameters, describing the total output load, are provided to all referenced cells. With the additionally distributed characteristics of the driving cell output, all relevant parameters describing the circuit environment are consequently determined. As a result, all parameters required for the addressing and interpolation of the respective data, provided by the library of pre-characterized cells, are available. The determination of the current profiles and the timing characteristics can consequently be done according to the introduced method.

Figure 7.4.: Schematic view of a simple circuit consisting of several buffer cells. (Schematic labels: in, BUF00 to BUF90, BUF11 to BUF13, BUF31 to BUF33, BUF41 to BUF43, BUF61 to BUF63, out13, out33, out43, out63, out90.)
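The propagation of a single input event and the superposition of the resulting single event profiles can be sketched as follows. This Python fragment is a strongly simplified illustration under stated assumptions: every cell carries one fixed delay and one fixed, uniformly sampled profile, whereas the described method looks both up in the library depending on slew rate and load. The circuit, delays, and sample values are hypothetical.

```python
def superpose(total, profile, t_start_ps, dt_ps):
    """Add one single event profile to the total, shifted by its start time."""
    offset = round(t_start_ps / dt_ps)
    missing = offset + len(profile) - len(total)
    if missing > 0:
        total.extend([0.0] * missing)
    for i, sample in enumerate(profile):
        total[offset + i] += sample


def simulate(fanout, cells, root, dt_ps):
    """Propagate one event from the root cell along all referenced cells.

    fanout : dict, cell name -> list of driven cell names
    cells  : dict, cell name -> (delay_ps, profile samples in mA)
    """
    total = []
    pending = [(root, 0.0)]  # (cell, arrival time of its input event)
    while pending:
        name, t_in = pending.pop()
        delay_ps, profile = cells[name]
        superpose(total, profile, t_in, dt_ps)
        for driven in fanout.get(name, []):
            # The accumulated propagation delay gives the next event time.
            pending.append((driven, t_in + delay_ps))
    return total


# Hypothetical buffer tree: BUF00 -> BUF10 -> {BUF11, BUF20}
FANOUT = {"BUF00": ["BUF10"], "BUF10": ["BUF11", "BUF20"]}
PROFILE = [0.1, 0.5, 0.2]  # mA, sampled every 10 ps (invented values)
CELLS = {name: (20.0, PROFILE) for name in ("BUF00", "BUF10", "BUF11", "BUF20")}

total = simulate(FANOUT, CELLS, "BUF00", dt_ps=10.0)
```

With these values the two buffers driven by BUF10 switch at the same time, so their profiles add up to a single larger peak in `total`.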
Since the switching behavior of a cell highly depends on the signal slew rates, but the actual transition timing can only be approximately estimated with the properties of the directly connected cells, the calculated output signal characteristics are additionally distributed to the triggered cells. This enables an adaptation of the respective parameters to fit the particular input characteristics and consequently leads to more accurate results. The current profiles for the particular events are finally superposed. The time instants at which the particular profiles are superposed are determined by accumulating the respective signal propagation delays.

7.3.2. Simulation Results

Figure 7.4 shows the schematic view of a simple circuit consisting of several buffer cells. The profiles determined by the application of the pattern-based method are compared to the respective SPICE simulation results in Figure 7.5, where the results are plotted for both a rising and a falling input signal transition. It can be observed that, even though there are some minor local differences in the profile waveform, the determined event timings and the current consumption per transition are almost congruent with the SPICE simulation results. A comparison of the profile lengths shows that the propagation delay uncertainties are in a negligible range of less than one picosecond.

Figure 7.5.: Current consumption profiles for a structure consisting of several buffer cells that has been triggered by a rising and a falling input signal edge. The gate-level modeled profiles are compared to the respective SPICE based simulation results. (Panels: (a) rising edge, (b) falling edge; current [mA] over time [ps]; curves: SPICE simulated, gate-level modeled.)

As the analyzed circuit consists of buffer cell instances only, all cells are triggered to perform the same operation. In case of a rising input event, all cells are consequently forced to drive a low-high transition, where the PMOS transistors connected to the output become conductive and pull the respective net towards VDD. On the other hand, the NMOS transistors become active in case of high-low transitions. Since both events are characterized with the same loads (see Chapter 4), but the dimensions of the complementary transistors in a cell are usually different (PMOS is typically larger than NMOS), the actual circuit properties match the characterized conditions to a different degree. It can be observed in the plots that the PMOS dimensions of the instantiated buffers are similar to the characterized conditions in the library, as the rising edge profiles show smaller uncertainties than the falling edge profiles for this example. The library parameters are consequently interpolated over a wider range in case of falling input signal transitions. But since these uncertainties are limited to very short peaks, and the total current consumption is almost exactly determined, the results can in both cases be considered sufficiently accurate.

7.4. Complex Module Modeling

The current profile determination for large and complex modules is most efficient when different methods are applied to the particular circuit partitions (see Section 5.3). Therefore, the imported circuit description and the already interpreted data need to be initialized accordingly. The profile determination is subsequently done by applying an appropriate simulation method to the respective circuit parts.

7.4.1. Partitions Initialization

As different methods are applied for the analysis of complex designs, an appropriate initialization of the respective partitions is necessary.
While the clock distribution network is initialized by propagating the clock signal along the clock tree paths, the particular events at the inputs of the cells that are associated with a combinational logic block are randomly assigned. The storage elements are of special interest, as their outputs represent the boundary between a clocked partition and a combinational logic block. The switching activity and the time instants at which a particular cell becomes active are therefore initialized according to the considerations introduced in Section 5.2.2. All gating cells are configured as enabled, and the clock signal is propagated from the respective module inputs to all storage elements, which are also forced to change their output state. Therefore, it is necessary that the module ports are categorized and the clock inputs are properly identified (see Section 7.2.2). Given that numerous combinational paths typically also start at the module inputs, these ports are initialized to become active at a similar time as the ones connected to flip-flop outputs. Since the actual event timings at these ports are unknown, the particular values are intentionally set within a certain range. Based on the determined time instants at which the root nodes of the combinational logic paths possibly become active, all other cells along these paths are initialized for the subsequently applied random activity based method.

7.4.2. Current Profile Determination

Provided that the clock signal pattern is known, the clocked partitions are modeled by applying the pattern based simulation method discussed in Section 7.3. The gating cells that are supposed to be active are selected randomly, according to the given parameter for the designated clock activity level. For an efficient configuration of the clock tree, an array providing the references to the particular gates is generated and globally accessible.
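The random configuration of the gating cells might look like the following Python sketch. It is an illustration under assumptions that are not taken from this work: the clock activity level is interpreted as the fraction of gating cells that are enabled, and the names of the gating cells and clock sinks are hypothetical.

```python
import random


def configure_gates(gates, activity, rng):
    """Enable a random subset of gating cells for one simulation run.

    activity is interpreted here as the fraction of enabled gating cells,
    which is an assumption of this sketch.
    """
    n_active = round(activity * len(gates))
    enabled = set(rng.sample(gates, n_active))
    return {gate: gate in enabled for gate in gates}


def active_sinks(clock_paths, config):
    """Return the storage elements whose clock path has only enabled gates."""
    return [sink for sink, gates in clock_paths.items()
            if all(config[g] for g in gates)]


# Hypothetical clock tree: storage element -> gating cells on its clock path.
GATES = ["CG0", "CG1", "CG2", "CG3"]
CLOCK_PATHS = {
    "FF0": [],  # ungated, therefore always clocked
    "FF1": ["CG0"],
    "FF2": ["CG0", "CG1"],
    "LAT0": ["CG2"],
    "FF3": ["CG3"],
}

rng = random.Random(1)  # seeded only to make the run repeatable
config = configure_gates(GATES, 0.5, rng)
sinks = active_sinks(CLOCK_PATHS, config)
```

Calling `configure_gates` again before each simulation run yields a new selection of active clock paths at the same activity level.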
As mentioned before, the determination of the active clock paths is done prior to each simulation run, and the current profiles are subsequently determined for the events given by the clock pattern and the particular cell functions along the activated paths. On the other hand, the current profiles for the combinational logic partitions are modeled by applying the introduced random activity interpretation method. A two-dimensional array with references to the particular cells associated with the respective logic depths is therefore generated and also globally accessible. The current profiles for the cells selected as active are subsequently determined for the randomly assigned switching events. The selection and initialization of the active cells and events is also done prior to each simulation run. Due to the evaluation of the particular cell delays in the course of the partitions initialization, all partial profiles are associated with the respective event timings. As a result, the current profile for an entire module is finally composed by a superposition of all partial results at the given time instants.

7.4.3. Simulation Results

Figure 7.6 shows the current profiles for the clocked subsystem, representing the clock distribution paths with all the gating cells and the clock triggered storage elements. The analyzed circuit consists of approximately 2,500 clock buffers and 16,000 storage elements (14,500 flip-flops and 1,500 latches). To show the accuracy of the introduced method, the modeled profiles for the clock tree, the storage elements, and the sum of both, representing the entire clock subsystem of the analyzed model, are compared to the respective SPICE simulation results. The circuit has therefore been exported as a netlist in SPICE syntax including all required stimuli sources, such as the gating cell control signals, as well as the data inputs and enable signals of the relevant flip-flops and latches.
It can be seen that the modeled profiles almost match the transistor-level SPICE simulation results. The clock tree waveforms in particular are practically congruent. Given that the module consists of many times more cells than the circuit discussed in Section 7.3, the minor uncertainties of the single event profiles are almost compensated. The partial results for the storage elements, on the other hand, show some differences, which are primarily caused by the fact that the activities stimulated in the SPICE simulation and the randomly determined events of the model are probably different. The input stimuli of all flip-flops and latches are constrained to avoid any output activity in both cases, but the cell behavior also depends on the actual output state (see Chapter 4). A comparison of the results in Figure 7.6(a) and (b) shows that the current flow for the storage elements is modeled with a lower peak amplitude, but with some delayed activity for the rising edge, while the falling edge timings are almost exactly determined. Provided that these differences in the profiles are caused by the random flip-flop states mentioned above, such uncertainties can be accepted.

Figure 7.6.: Comparison of the SPICE based transistor-level simulation results and the modeled profiles for a clock subsystem of a microcontroller core unit. The profiles are shown for the clock signal distribution paths (including numerous gating cells), the clocked storage elements, and the sum of both components. (Panels: (a) rising edge, (b) falling edge; current [A] over time [ps]; curves: SPICE simulated, gate-level modeled.)

For an entire module also including the combinational logic parts, the time domain current profiles and the corresponding Fourier transformed results are shown in Figure 7.7.
As mentioned before, transistor-level simulations of large modules, consisting of several hundred thousand cells, demand excessively high computational effort and memory. The modeled profiles are therefore verified by SPICE simulation results for designs with a moderate complexity of several thousand cells. The cell counts of this module are: 237 clock buffers, 8 gating cells, 3167 flip-flops, 35 latches, and 13910 combinational cells. The simulated clock period is 10 ns (100 MHz) with equidistant rising and falling edges, but for a better view of the profile details, the respective partial results are plotted consecutively, and the time axis is truncated after the falling edge profiles. It can be seen that the current profile parts that are caused by the clock subsystem correlate closely with the SPICE simulation result. Since the activity of the combinational logic blocks is randomly determined, these parts of the waveform show some minor differences. Therefore, the results of several simulation runs are plotted to show the uncertainties caused by the random selection of the triggered cells and events, which are particularly observable in such a relatively small design. The plotted Fourier coefficients in Figure 7.7(b) show that there is also a good correlation between the modeled behavior and the SPICE simulation result in the frequency domain. This example consequently shows that all important properties of the analyzed circuit are considered, and that the introduced modeling approaches are suitable to approximate the current profiles with acceptable accuracy.

7.5. Profile Post-Processing

For the composition of the partial results into the final profiles of entire modules, but primarily for the post-processing of the ideal current profiles, a tool with the required capabilities has been implemented to consider the parasitic effects of the interconnect wires.
The graphical user interface of this tool, providing several options to test the effects of some important parameters, is shown in Figure 7.8. The list in the upper left region of the program window shows the available profiles found in the directory profile path. Pressing one of the buttons labeled with add starts the routines to import the profiles from the respective file and plots the time domain waveform as well as the Fourier transform results in the embedded figures.

Figure 7.7.: Time domain waveform and Fourier coefficients of the modeled current profiles, compared to the corresponding SPICE simulation result. (Panels: (a) time domain profiles, current [A] over time [ps]; (b) frequency spectrum, current [mA] over frequency [GHz]; curves: SPICE simulated, gate-level modeled.)

Figure 7.8.: Graphical user interface of the implemented tool that features the superposing of the current profiles for different circuit modules and the visualization of different post-processing parameter effects.

In addition to the control elements to scale the profile amplitude and to stretch it in time, there are some options to test different clock signal characteristics. The base frequency of the clock (f), as well as an optional divider factor (clkdiv) and a possible duty cycle ratio (duty), can be individually set for each profile via the respective elements. As the profiles are internally handled as split into the parts that are related to the rising and the falling clock edges, the reference system clock frequency can be modified at any time. The checkbox inv furthermore provides the opportunity to show the effects of an inverted clock (swapped rising and falling edge), which can be individually set for any profile.
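How a profile that is split into a rising edge part and a falling edge part can be recomposed for different clock settings is sketched below in Python. The semantics chosen here are assumptions for illustration and are not confirmed by the tool description: clkdiv divides the base frequency, duty places the falling edge at that fraction of the period, inv swaps the two parts, and samples that run past the period end wrap around.

```python
def compose_period(rising, falling, dt_ps, f_mhz, clkdiv=1, duty=0.5, inv=False):
    """Build one clock period of a profile from its two edge-related parts.

    rising, falling : sample lists for the respective clock edge
    dt_ps           : sample interval in ps
    f_mhz           : base clock frequency in MHz
    """
    period_ps = clkdiv * 1e6 / f_mhz  # divided clock -> longer period
    n = round(period_ps / dt_ps)
    # An inverted clock swaps the parts tied to the two edges.
    first, second = (falling, rising) if inv else (rising, falling)
    total = [0.0] * n
    for part, t0_ps in ((first, 0.0), (second, duty * period_ps)):
        offset = round(t0_ps / dt_ps)
        for i, sample in enumerate(part):
            total[(offset + i) % n] += sample  # wrap to keep it periodic
    return total


# Hypothetical edge profiles, sampled every 1000 ps.
RISING = [1.0, 2.0]
FALLING = [0.5]

normal = compose_period(RISING, FALLING, dt_ps=1000.0, f_mhz=100.0)
inverted = compose_period(RISING, FALLING, dt_ps=1000.0, f_mhz=100.0, inv=True)
divided = compose_period(RISING, FALLING, dt_ps=1000.0, f_mhz=100.0, clkdiv=2)
```

Keeping the two edge parts separate is what allows the period, divider, and duty cycle to be changed afterwards without recomputing the profiles themselves.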
The resulting sum profile of all components that are marked as active can finally be exported as an Equivalent Current Source (ECS). Depending on the target application, the format of the exported data is selectable, and the interval between the values (sample rate) is adjustable as well.

7.6. Chip-Level Current Consumption Models

Complex integrated circuits typically consist of several subsystems, such as the processor core and several peripheral controller units of a microcontroller. A method to consider the cell interconnect wires by post-processing the current profiles for ideal conditions has been introduced in Chapter 6. It has also been discussed that power distribution networks cause similar effects, which can in principle be modeled by the application of this method as well. The current profiles have therefore been determined for a module of a testchip with an integrated on-chip current and voltage sensor. The results considering both the interconnect wires and the power distribution network are verified by a comparison to the simulation results of a model that has been validated in [35]. Figure 7.9 shows the post-processed profile and the simulation result of the testchip model in the frequency domain. It can be seen that the characteristics of the spectrum can be well approximated by the introduced method up to around 700 MHz for this example. At higher frequencies, the simulation results show at least a similar trend, but the post-processed waveform spectrum is more pessimistic.

Figure 7.9.: Comparison of Fourier coefficient envelopes of the modeled current profile for a testchip and the simulation result of an on-chip sensor model considering interconnect wire and power distribution network effects. (Plot of current [dBµA] over frequency [MHz]; curves: SPICE simulated, post-processed.)
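Spectra such as the one in Figure 7.9 are given in dBµA, i.e. 20·log10 of the amplitude relative to 1 µA. The following Python sketch shows one way to obtain such values from one period of a sampled current profile. It assumes single-sided harmonic amplitudes of a periodic signal, which may differ from the exact definition of the Fourier coefficient envelope used in this work, and the test signal is hypothetical.

```python
import cmath
import math


def harmonic_amplitudes(samples, n_harmonics):
    """Single-sided amplitudes of harmonics 1..n_harmonics of one period.

    samples holds one full period on a uniform time grid; the result is
    meaningful for harmonics below half the number of samples.
    """
    n = len(samples)
    amplitudes = []
    for k in range(1, n_harmonics + 1):
        c_k = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples)) / n
        amplitudes.append(2.0 * abs(c_k))
    return amplitudes


def to_dbua(amplitude_a):
    """Convert a current amplitude in ampere to dBµA."""
    if amplitude_a <= 0.0:
        return float("-inf")
    return 20.0 * math.log10(amplitude_a / 1e-6)


# Hypothetical test signal: a 1 mA cosine at the fundamental frequency.
N = 64
profile = [1e-3 * math.cos(2 * math.pi * i / N) for i in range(N)]
amps = harmonic_amplitudes(profile, 3)
levels = [to_dbua(a) for a in amps]
```

For the 1 mA test signal the fundamental comes out at 60 dBµA, since 1 mA equals 1000 µA.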
As the power supply networks are typically optimized for the respectively supplied subsystems, the actual chip partitioning becomes significantly more important when the complexity of the power grid increases. Given that these effects are, in addition, probably irregular, tools specialized for this purpose, which introduce sophisticated power network models, promise results with improved accuracy. Such an application is, for instance, the already mentioned EXPO [11], which is able to generate models that are based on a pre-layout estimation of the power grid characteristics. It divides the chip area into a scalable number of tiles and applies appropriate models for the particular segments, depending on the respective circuit type. A circuit analysis based on extracted layout parameters can, for instance, be done using XcitePI [8].

All the mentioned tools that are applicable to model the behavior of complex chips basically provide a network of passive elements, introducing the parasitic effects of the on-chip wire structures. But they are typically not capable of generating the active chip model components in terms of the equivalent current sources representing the current consumption of the on-chip devices. The methods introduced in this thesis to model the current profiles for the internal activity of the particular circuit modules are consequently an essential component for generating comprehensive chip behavior models. The current profiles are modeled and exported as the required equivalent current sources (see Section 7.5), and can subsequently be included in chip-level emission models.

8. Conclusion and Outlook

8.1. Conclusion

This thesis presents methods to efficiently model the dynamic behavior of digital modules in terms of the transient current consumption waveforms.
As the introduced methods are based on circuit descriptions that are usually available in early design phases, the generated models are suitable for design studies that predict the effects of design variations early in the circuit design process. As the model generation is based on gate-level netlists, a method to characterize a standard cell library has been introduced. The characteristics of the different cell types are presented, and approaches to minimize both the effort required for the characterization process and the amount of data that has to be stored in the library are discussed. Since this procedure needs to be applied only once per technology and library, but to every cell, it has been largely automated by scripts for the single-cell simulations and the subsequent parameter extraction. The procedure has been applied to a 130 nm technology and has also been validated for a 90 nm library.

Methods to generate the current profiles for modules of different sizes and complexities have been introduced. The profile determination is based on gate-level netlists and the library of pre-characterized cells mentioned before. Partitioning the analyzed modules into clocked circuit parts and combinational logic blocks also enables the application of properly matched methods to characterize the dynamic circuit behavior in terms of the timing characteristics and the transient current consumption waveforms. It has been shown that a pattern-based simulation method is typically the best choice for the circuit partitions that are triggered by a specified system clock, and that a random activity interpretation is most efficient for large combinational logic blocks.

Given that at least the effects of cell interconnect wires need to be considered in the transient current models, approaches for post-processing the waveforms are introduced. The determined profiles for ideal conditions (i.e.
a constant power supply voltage and lossless wires) are therefore manipulated to approximate these effects. Considerations for the software implementation of a tool that is capable of determining the current profiles based on the introduced methods, as well as a verification of the generated models for circuits with different complexities and characteristics, are also presented. An additionally implemented tool furthermore provides several control elements that allow an investigation of alternative design parameters. Different clocking schemes can be tested, and a supplementary manipulation of the profiles is possible to show the effects of modified parameters, for instance the activity levels of the respective design partitions or submodules. Such a parameter variation is similar to traditional approaches based on an intuitive manipulation of a given profile and therefore comes with several uncertainties, but it allows an almost instant prediction of the effects on the system behavior that is reasonable as a first approximation. The export of the modeled profiles as equivalent current sources finally enables the subsequent generation of the mentioned noise models for a given circuit [37].

In line with the goals stated in the introductory section, the models can be generated before a layout is available, and they are based on the actual circuit so that the specific design characteristics are appropriately considered. To achieve good accuracy with low computational effort, the models are determined at a reasonably high level of abstraction. The introduced methods consequently allow a circuit characterization in less than one hour on a standard desktop PC (e.g. 2 GHz Pentium IV, 1 GB RAM), even for complex modules consisting of several million transistors.
It has also been discussed that the parasitic effects of on-chip wires lead, among other things, to a shift of the current profile spectrum towards lower frequencies. The introduced approaches, which use basic signal processing methods, are practical for generating models that are valid up to a few GHz for currently applied technologies. The benefits of accurate emission models for component-level or even system-level simulations were shown in the course of the MISEA project [38], and the introduced methods have been optimized and validated for state-of-the-art technologies and circuits for automotive applications.

8.2. Outlook

Future technologies will presumably introduce additional effects that may have to be considered as well. With technology scaling to considerably smaller on-chip structures, the leakage current, for instance, is expected to become significantly higher relative to the dynamic current consumption caused by switching activity. On the other hand, faster switching transistors lead to higher signal slew rates and shorter propagation delays. In this case, the inductance of the interconnect wires will probably become significant and can no longer be neglected [39]. Given that the impact of on-chip wiring is expected to grow due to reduced structure sizes and increased switching performance, the model generation algorithms should be improved and adapted to upcoming requirements. Another topic that is probably important is the search for approaches to determine appropriate profile post-processing parameters almost automatically.

A. Fourier Transform Characteristics

Frequently, the frequency spectrum of a time domain signal or profile needs to be analyzed. As the Fourier transform is the most commonly used method for spectral analysis, this chapter gives a basic overview of it. The definitions and the properties and transform pairs most important for this work are therefore summarized.
A derivation of the shown formulas, as well as a more comprehensive discussion of the characteristics, can be found in the respective literature, as for instance [40, 41].

A.1. Definitions

Fourier has shown that any continuous periodic signal $x_p(t)$ can be represented as a linear combination of properly chosen sine and cosine functions:

$$x_p(t) = \frac{a_0}{2} + \sum_{k=1}^{\infty} \left[ a_k \cos(2\pi k f_0 t) + b_k \sin(2\pi k f_0 t) \right] \tag{A.1}$$

where the coefficients $a_k$ and $b_k$ are the Fourier coefficients and the parameter $f_0$ is the fundamental frequency that is equal to the inverse period time $T_0$:

$$f_0 = \frac{1}{T_0} \tag{A.2}$$

By a combination of the sine and cosine waves with the same frequency, the Fourier series can be also written as:

$$x_p(t) = A_0 + \sum_{k=1}^{\infty} A_k \cos(2\pi k f_0 t + \alpha_k) \tag{A.3}$$

A substitution of the sine and cosine functions using Euler's formula leads to the complex notation of the Fourier series:

$$x_p(t) = \sum_{k=-\infty}^{\infty} c_k \, e^{j 2\pi k f_0 t}, \tag{A.4}$$

where the coefficients $c_k$ are the complex Fourier coefficients. This equation is also called the synthesis equation, whereas the analysis equation to determine the coefficients $c_k$ is given by

$$c_k = \frac{1}{T_0} \int_{-T_0/2}^{T_0/2} x_p(t) \, e^{-j 2\pi k f_0 t} \, dt. \tag{A.5}$$

The Fourier series is limited to periodic signals. For aperiodic signals the Fourier transform is typically used:

$$X(f) = \int_{-\infty}^{\infty} x(t) \, e^{-j 2\pi f t} \, dt \tag{A.6a}$$

The inverse transform is given by

$$x(t) = \int_{-\infty}^{\infty} X(f) \, e^{j 2\pi f t} \, df, \tag{A.6b}$$

where $X(f)$ is the Fourier transform of $x(t)$, and $x(t)$ is the inverse Fourier transform of $X(f)$. This relation is often shown as:

$$x(t) \;\circ\!\!-\!\!\bullet\; X(f) \tag{A.6c}$$

Given that signal processing is often done for sampled values, the discrete Fourier transform is important.
It is derived from the Fourier transform for time-continuous signals, allows the spectral analysis of time-discrete waveforms, and is given by:

$$X[k] = \sum_{n=0}^{N-1} x[n] \, e^{-j k n \frac{2\pi}{N}}, \quad k = 0, 1, 2, \ldots, N-1 \tag{A.7a}$$

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] \, e^{j k n \frac{2\pi}{N}}, \quad n = 0, 1, 2, \ldots, N-1 \tag{A.7b}$$

where $X[k]$ is the discrete transform of $x[n]$, and $x[n]$ the inverse discrete transform of $X[k]$, which can again be written as the relation:

$$x[n] \;\circ\!\!-\!\!\bullet\; X[k] \tag{A.7c}$$

The variables $n$ and $k$ are the indices of the time-domain and the frequency-domain vectors, and $N$ is the number of sampled values, which is equal in both domains.

A.2. Fourier Transform Properties

Several properties of the Fourier transform are important for some approaches in this work. A multiplication of the time-domain amplitudes by a given factor is, for instance, equivalent to a multiplication in the frequency domain by the same factor. Stretching a signal in time, on the other hand, compresses the spectrum on the frequency axis, and vice versa. The most important properties are summarized as follows:

Linearity

$$k_1 x_1(t) + k_2 x_2(t) \;\circ\!\!-\!\!\bullet\; k_1 X_1(f) + k_2 X_2(f) \tag{A.8}$$

Time and frequency scaling

$$x(kt) \;\circ\!\!-\!\!\bullet\; \frac{1}{|k|} X\!\left(\frac{f}{k}\right) \tag{A.9}$$

Shifting in time domain

$$x(t - t_0) \;\circ\!\!-\!\!\bullet\; X(f) \, e^{-j 2\pi f t_0} \tag{A.10}$$

Shifting in frequency domain

$$x(t) \, e^{j 2\pi f_0 t} \;\circ\!\!-\!\!\bullet\; X(f - f_0) \tag{A.11}$$

Convolution

$$h(t) * x(t) \;\circ\!\!-\!\!\bullet\; H(f) \cdot X(f) \tag{A.12a}$$

$$h(t) \cdot x(t) \;\circ\!\!-\!\!\bullet\; H(f) * X(f) \tag{A.12b}$$

A.3. Transform Pairs

While transformed sinusoidal waves show exactly one value at the given frequency in the spectrum¹, other waveforms are decomposed into the appropriate frequencies. Some of these waveforms show specific functions in the frequency domain. The delta function, for instance, is a single impulse in the time domain and results in a spectrum in which all frequencies are present with a constant magnitude:

$$x(t) = \delta(t) \;\circ\!\!-\!\!\bullet\; X(f) = 1 \tag{A.13}$$

On the other hand, a constant time-domain signal results in a single value at the frequency zero:
$$x(t) = 1 \;\circ\!\!-\!\!\bullet\; X(f) = \delta(f) \tag{A.14}$$

For this work, the characteristics of the rectangle and triangle waveforms are primarily important (see Figure A.1). A rectangle signal is transformed to a sinc function (also referred to as sin(x)/x). Since a triangle waveform is basically the result of a convolution of two rectangle signals, and given that a convolution in the time domain is equivalent to a multiplication in the frequency domain, the respective spectrum shows a squared sinc function.

¹ There is actually a second value with the same magnitude at the corresponding negative frequency, but only the positive frequencies are considered here.

Figure A.1.: Time domain waveform (a) and frequency spectrum (b) of a rectangle and a triangle signal. (Axes: (a) amplitude over time [s]; (b) magnitude over frequency [Hz].)

Bibliography

[1] D.E.C. Moehr. The legal situation on EMC in the European Union. In Proc. International Symposium on Electromagnetic Compatibility, pages 702–705, Tokyo, Japan, May 1999.
[2] T. Steinecke. Design-In for EMC on CMOS Large-Scale Integrated Circuits. In Proc. IEEE Int. Symposium on Electromagnetic Compatibility, pages 910–915, Montreal, Canada, August 2001.
[3] C. Lochot and J.-L. Levant. ICEM: A new standard for EMC of IC, Definition and Examples. In Proc. IEEE Int. Symposium on Electromagnetic Compatibility, volume 2, pages 892–897, August 2003.
[4] J.-L. Levant, M. Ramdani, and R. Perdriau. Solving Board-Level EMC Issues with the ICEM Model. In Proc. 5th Int. Workshop on Electromagnetic Compatibility of Integrated Circuits, Munich, Germany, November 2005.
[5] M. Coenen and R. de Jager. Standardization for EMC IC Modeling. In Proc. IEEE Int. Symposium on Electromagnetic Compatibility, pages 892–897, Istanbul, Turkey, August 2003.
[6] ANSI/EIA 656-B. I/O Buffer Information Specification (IBIS), Version 4.1. GEIA, 2004.
[7] IEC 62014-3. Integrated Circuits Electrical Model (ICEM). IEC, 2002.
[8] E. Miersch, T. Steinecke, and M. Goekcen. Power Integrity Analysis of a Microcontroller (µC) plus its Chip Package (BGA). In Proc. 5th Int. Workshop on Electromagnetic Compatibility of Integrated Circuits, pages 211–214, Munich, Germany, November 2005.
[9] E. Sicard and G. Peres. A Novel Software Environment for Predicting the Parasitic Emission of Integrated Circuits. In Proc. 5th Int. Workshop on Electromagnetic Compatibility of Integrated Circuits, Munich, Germany, November 2005.
[10] M. Badaroglu, G. Van der Plas, P. Wambacq, S. Donnay, G.G.E. Gielen, and H.J. De Man. SWAN: High-Level Simulation Methodology for Digital Substrate Noise Generation. In IEEE Transactions on Very Large Scale Integration (VLSI) Systems, volume 14, pages 23–33, January 2006.
[11] D. Hesidenz and T. Steinecke. Chip-Package EMI Modeling and Simulation Tool "EXPO". In Proc. 5th Int. Workshop on Electromagnetic Compatibility of Integrated Circuits, pages 231–234, Munich, Germany, November 2005.
[12] T. Steinecke, H. Koehne, and M. Schmidt. Behavioral EMI Models of Complex Digital VLSI Circuits. In Proc. IEEE Int. Symposium on Electromagnetic Compatibility, pages 848–851, May 2003.
[13] Y. Tsividis. Operation and Modeling of The MOS Transistor. Oxford University Press, second edition, 1999.
[14] J. M. Rabaey, A. Chandrakasan, and B. Nikolic. Digital Integrated Circuits: A Design Perspective. Prentice Hall, second edition, 2003.
[15] D. A. Hodges, H. G. Jackson, and R. A. Saleh. Analysis and Design of Digital Integrated Circuits. McGraw-Hill, third edition, 2003.
[16] H. J. M. Veendrick. MOS ICs: From Basics to ASICs. VCH, 1992.
[17] N. S. Kim, T. Austin, D. Blaauw, T. Mudge, K. Flautner, J. S. Hu, M. J. Irwin, M. Kandemir, and V. Narayanan. Leakage Current: Moore's Law Meets Static Power. IEEE Computer, pages 68–75, 2003.
[18] R. J. Baker, H. W. Li, and D. E. Boyce. CMOS: Circuit Design, Layout, and Simulation. Wiley-IEEE Press, 1998.
[19] N. H. E. Weste and K. Eshraghian. Principles of CMOS VLSI Design: A Systems Perspective. Addison-Wesley, second edition, 1993.
[20] P. J. Ashenden. The Designer's Guide to VHDL. Morgan Kaufmann, second edition, 2002.
[21] D. J. Smith. VHDL & Verilog Compared & Contrasted. In Proc. 33rd Design Automation Conference, pages 771–776, Las Vegas, USA, June 1996.
[22] C. Mead and L. Conway. Introduction to VLSI Systems. Addison-Wesley, 1980.
[23] D. G. Messerschmitt. Synchronization in Digital System Design. IEEE Journal on Selected Areas in Communications, 8(8):1404–1419, October 1990.
[24] P. E. Gronowski, R. P. Preston, W. J. Bowhill, M. K. Gowan, and R. L. Allmon. High-Performance Microprocessor Design. IEEE Journal of Solid-State Circuits, 33(5):676–686, May 1998.
[25] V. G. Oklobdzija, V. M. Stojanovic, D. M. Markovic, and N. M. Nedovic. Digital System Clocking: High-Performance and Low-Power Aspects. Wiley-IEEE Press, 2003.
[26] K. Yip. Clock tree distribution. IEEE Potentials, 16(2):11–14, April/May 1997.
[27] K. B. Hardin, J. T. Fessler, and D. R. Bush. Spread Spectrum Clock Generation for the Reduction of Radiated Emissions. In Proc. IEEE Int. Symposium on Electromagnetic Compatibility, pages 227–231, Chicago, August 1994.
[28] S. Damphousse, K. Ouici, A. Rizki, and M. Mallison. All Digital Spread Spectrum Clock Generator for EMI Reduction. IEEE Journal of Solid-State Circuits, 42(1):145–150, January 2007.
[29] G. E. Tellez, A. Farrahi, and M. Sarrafzadeh. Activity-Driven Clock Design for Low Power Circuits. In Proc. IEEE/ACM Int. Conference on Computer-Aided Design, pages 62–65, June 1995.
[30] A. H. Farrahi, C. Chen, A. Srivastava, G. Tellez, and M. Sarrafzadeh. Activity-Driven Clock Design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 20(6):705–714, June 2001.
[31] A. Chatzigeorgiou, S. Nikolaidis, and I. Tsoukalas. A Modeling Technique for CMOS Gates. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 18:557–575, May 1999.
[32] J.M. Daga and Daniel Auvergne. A Comprehensive Delay Macro Modeling for Submicrometer CMOS Logics. IEEE Journal of Solid-State Circuits, 34:42–55, January 1999.
[33] P. Maurine, M. Rezzoug, and D. Auvergne. Output Transition Time Modeling of CMOS Structures. In Proc. IEEE Int. Symposium on Circuits and Systems, pages 363–366, Sydney, Australia, May 2001.
[34] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C. Cambridge University Press, second edition, 1992.
[35] Jacek Kruppa and Dirk Hesidenz. High Speed, High Bandwidth On-Chip Current and Voltage Sensor. In Proc. IEEE Sensors, pages 1337–1340, 2006.
[36] IEEE. IEEE Std 1364-2005 Standard for Verilog Hardware Description Language. IEEE, 2006.
[37] T. Steinecke, M. Goekcen, D. Hesidenz, and A. Gstöttner. High-Accuracy Emission Simulation Models for VLSI Chips including Package and Printed Circuit Board. In Proc. 8th Int. Workshop on Electromagnetic Compatibility of Integrated Circuits, pages 41–46, Torino, Italy, November 2007.
[38] G. Steinmair, T. Steinecke, and R. Weigel. MISEA – Modelling of Integrated Circuit Devices for Automotive EMC Simulation. In Proc. 5th Int. Workshop on Electromagnetic Compatibility of Integrated Circuits, pages 202–205, Munich, Germany, November 2005.
[39] Yehia Massoud and Yehea Ismail. Grasping the Impact of On-Chip Inductance. In Proc. IEEE Circuits & Devices, pages 14–21, 2001.
[40] D. Ch. von Grünigen. Digitale Signalverarbeitung. Fachbuchverlag Leipzig, second edition, 2002.
[41] S. W. Smith. Digital Signal Processing: A Practical Guide for Engineers and Scientists. Newnes, second edition, 2002.

Publications

[42] A. Gstöttner, T. Steinecke, and M. Huemer. High Level Modeling of Dynamic Switching Currents in VLSI IC Modules. In Proc. 5th Int. Workshop on Electromagnetic Compatibility of Integrated Circuits, pages 207–210, Munich, Germany, November 2005.
[43] A. Gstöttner, T. Steinecke, and M. Huemer. Activity Based High Level Modeling of Dynamic Switching Currents in Digital IC Modules. In Proc. 17th Int. Zurich Symposium on Electromagnetic Compatibility, pages 598–601, Singapore, February/March 2006.
[44] A. Gstöttner, T. Steinecke, and M. Huemer. Fast High Level Modeling Methods for Dynamic Switching Currents of Digital IC Modules. In Proc. Austrian Conference on the Design of Integrated Circuits and Systems, pages 97–101, Vienna, Austria, October 2006.
[45] A. Gstöttner and J. Kruppa. Modeling of Dynamic Switching Currents of Digital VLSI IC Modules and Verification by On-Chip Measurement. In Proc. 18th Int. Zurich Symposium on Electromagnetic Compatibility, pages 1–4, Munich, Germany, September 2007.
[46] A. Gstöttner and M. Huemer. Estimation of Current Profiles for Large Digital VLSI Modules in Early Design Phases. In Proc. 8th Int. Workshop on Electromagnetic Compatibility of Integrated Circuits, pages 87–90, Torino, Italy, November 2007.
[47] T. Steinecke, M. Goekcen, D. Hesidenz, and A. Gstöttner. High-Accuracy Emission Simulation Models for VLSI Chips including Package and Printed Circuit Board. In Proc. IEEE Int. Symposium on Electromagnetic Compatibility, pages 1–6, Honolulu, USA, July 2007.
