TECHNISCHE UNIVERSITÄT MÜNCHEN
Lehrstuhl für Entwurfsautomatisierung (Institute for Electronic Design Automation)

Aging Analysis of Digital Integrated Circuits

Dominik Lorenz

Complete copy of the dissertation approved by the Faculty of Electrical Engineering and Information Technology of the Technische Universität München for the award of the academic degree of Doktor-Ingenieur.

Chair: Univ.-Prof. Dr. sc.techn. Andreas Herkersdorf
Examiners of the dissertation:
1. Univ.-Prof. Dr.-Ing. Ulf Schlichtmann
2. Prof. Diana Marculescu, Ph.D., Carnegie Mellon University, PA, USA

The dissertation was submitted to the Technische Universität München on 31.01.2012 and accepted by the Faculty of Electrical Engineering and Information Technology on 24.04.2012.

Acknowledgments

This thesis results from my work as a research assistant at the Institute for Electronic Design Automation at the Technische Universität München. First of all, I would like to thank Professor Ulf Schlichtmann for giving me the opportunity to do research at his institute and for encouraging me to work on this novel topic. His guidance and continued support, as well as the open and creative atmosphere at the institute, have been essential for the successful completion of this research project. I would also like to thank the second examiner, Professor Diana Marculescu, for her interest in my research.

Most of the work would not have been possible without the valuable cooperation of the Infineon Technologies employees working together with me on the HONEY research project. A special thanks goes to Georg Georgakos for his guidance and the fruitful discussions with him.

It is a pleasure for me to thank my colleagues at the EDA institute for their collaboration and their friendship. It was a great time at the institute, which I will never forget.

Finally, I want to express my heartfelt gratitude towards my wife, Nicole, and my little sunshine, Annika, for their continuous support or just for smiling when I come home.

Contents

1. Introduction
   1.1. Objective of this thesis
   1.2. Semi-custom design flow
   1.3. Structure of the thesis
2. Fundamentals
   2.1. (Static) timing analysis
      2.1.1. Gate models
      2.1.2. Timing graph
      2.1.3. Incremental timing analysis
      2.1.4. Sequential circuits
      2.1.5. Path enumeration
   2.2. State of the art of aging analysis
      2.2.1. Circuit level
      2.2.2. Gate level
3. Aging effects and their impact on standard cells
   3.1. Aging effects
      3.1.1. Negative Bias Temperature Instability
      3.1.2. Hot Carrier Injection
      3.1.3. Stress conditions in CMOS logic gates
   3.2. Impact on gate performance
      3.2.1. Impact on combinational gates
      3.2.2. Impact on flip-flops
      3.2.3. Impact on power dissipation
   3.3. Technology trend
   3.4. Summary
4. Aging-aware static timing analysis
   4.1. Aging-aware STA flow
   4.2. Workload determination
   4.3. AgeGate: Aging-aware gate model
      4.3.1. Canonical gate model
      4.3.2. Degradation equations
      4.3.3. Calculation of Stress Probabilities
   4.4. Characterizing the standard cells
      4.4.1. Obtaining the sensitivities
      4.4.2. Obtaining the internal gate structure
      4.4.3. Simplification of the gate model
   4.5. Results
      4.5.1. Waveform dependence of parameter drift
      4.5.2. Comparison of AgeGate, circuit-level simulation and measurements
      4.5.3. Aging analysis results
   4.6. Summary
5. Identifying possible critical paths in aged circuits
   5.1. Prerequisites
   5.2. Identification of PCPs
      5.2.1. Slack reduction step
      5.2.2. Path delay reduction step
      5.2.3. Arrival time reduction step
      5.2.4. Delay to sink reduction step
      5.2.5. Common edge reduction step
      5.2.6. Removing edges and nodes
   5.3. Realistic aged path delays
      5.3.1. Gate delay interval
      5.3.2. Realistic aged path delays for an inverter chain
      5.3.3. Maximal aged path delay of a general path
      5.3.4. Minimal aged path delay for a general path
      5.3.5. Minimal aged circuit delay
      5.3.6. Use of minimal aged circuit delay in reduction steps
      5.3.7. Wrap-up
   5.4. Considering process variations
      5.4.1. Block-based statistical static timing analysis
      5.4.2. Representation of timing quantities
   5.5. Applications
      5.5.1. Aging-aware timing model for modules
      5.5.2. Monitoring of aging circuits
   5.6. Results
      5.6.1. Minimal aged delay
      5.6.2. Node and edge reduction
      5.6.3. Possible critical paths
   5.7. Summary
6. Conclusion
A. Constraints for NAND and NOR gates
B. More detailed results for PCP identification
Bibliography
Acronyms
List of Symbols

1. Introduction

In biology, aging of an organism is defined as a progressive, irreversible process that inevitably ends with death. The maximal lifetime of an individual is significantly affected by aging [Wikipedia, 2011].
The same is true for integrated circuits (ICs). Aging effects cause the circuit performance to degrade, and they have a significant impact on the specified lifetime of a circuit. Circuit aging can be regarded as a time-dependent variation.

Aging is not the only variability the IC industry must cope with. In fact, variability has always been a fact of life in the IC industry. The reasons for variability can be classified into three categories:

• Variations of the operating conditions: Primarily changes in supply voltage and operating temperature.
• Process variations: These denote deviations in process parameters from their nominal values that are present in an IC after it has been manufactured. Examples are variations in the concentration of dopants or the oxide thickness. In contrast to aging, manufacturing variations do not change over time once the IC has been manufactured.
• Time-dependent variations: These denote changes in the physical (and consequently, in the electrical) properties of an IC over time caused by aging effects.

Variations of the operating conditions are handled during the design process by specifying a range (e.g., $V_{DD,min}$ and $V_{DD,max}$) within which the IC has to meet the specified properties (e.g., frequency or power consumption). Process variations have traditionally been considered by specifying so-called process corners, which describe, e.g., for delay, the best or worst realistic combinations of process parameters, thus establishing generous guardbands against parameter variations. This modeling is increasingly considered to be problematic, and statistical design methodologies have therefore been proposed as a remedy for dealing with manufacturing variations. A detailed overview of this field is given in Blaauw et al. [2008]. Time-dependent variation caused by aging effects, on the other hand, has received far less attention.
Aging effects lead to a change of device parameters over time, depending on the operating conditions over the lifetime and on the workload. The workload defines the portion of the lifetime a device spends in a particular operating point. Negative bias temperature instability (NBTI), for instance, is regarded as the most severe aging effect nowadays. NBTI results in an increased threshold voltage ($V_{th}$) of PMOS transistors whenever the transistor is in inversion. The threshold voltage drift ($\Delta V_{th}$) is accelerated by elevated temperature or supply voltage.

The impact of variations on the circuit performance increases due to the continued technology scaling [Nassif, 2000]. The same absolute variation of the gate length, for instance, results in a larger relative variation, since the nominal gate length is scaled by a factor of 0.7× every two years according to Moore's law [Moore, 1965]. The supply voltage is scaled as well. Therefore, a supply voltage variation or a threshold voltage variation has a larger impact on circuit performance. This holds if a constant absolute variation is assumed for the different variability mechanisms. However, the variation caused by aging effects is going to increase, since these effects strongly depend on the strength of the electrical fields. The electrical fields continue to increase with scaling, because the transistor sizes have been scaled more aggressively than the supply voltage for several technology generations (under the assumption that no breakthroughs are achieved to mitigate aging at the technology level).

Variability is the reason why performance and power consumption vary from chip to chip and over time. To still be able to manufacture working and reliable products despite increasing variability, the performance guardbands must be increased or other techniques must be applied to make a product robust against variations. Examples of such techniques are dynamic voltage frequency scaling (DVFS) [Semeraro et al., 2002; Talpes and Marculescu, 2005; Herbert and Marculescu, 2009] or the use of redundant circuitry [Lyons and Vanderkulk, 1962]. As a consequence, the operating frequency is lower than it could be, chip area is wasted, and the power consumption is higher than necessary. Hence, conservative safety margins and variation-aware design techniques make the design of competitive products more difficult and diminish or even eliminate the advantages of moving to the next technology node. One way out of this dilemma, according to Austin et al. [2008], is the use of innovative design techniques that reduce the reliability costs again.

1.1. Objective of this thesis

The contribution of this thesis to reducing the reliability costs consists of methods to accurately analyze the timing degradation of a circuit caused by drift-related aging effects. This allows the safety margins to be tightened again. Within this thesis the following objectives have been set and achieved:

• Investigate the impact of aging effects on transistors, how they can be modeled, and on which parameters they depend. Furthermore, quantify the degradation of the properties of standard cells caused by aging effects.
• Develop and implement an aging analysis to determine the timing degradation of ICs on gate level. The developed aging-aware gate model should consider the dominant aging effects.
• Develop an aging analysis on higher abstraction levels. This enables considering aging in earlier design stages and for complex systems.
• Furthermore, another approach is developed to reduce the safety margins even further by enabling a better-than-worst-case design style.
  To ensure that such an aggressively designed circuit still works correctly during the specified lifetime, the degradation of the circuit caused by aging is periodically monitored, and countermeasures are taken if the circuit ages too much.

In the course of this thesis, seven pre-publications [Lorenz et al., 2009a,b, 2010a,b,c,d, 2012] have been contributed to the scientific community. Furthermore, a patent for a time margin monitor for the assessment of aging and process variation was filed and granted [Henzler et al., 2009].

1.2. Semi-custom design flow

In the beginning of IC design in the early 1970s (the first microprocessor, Intel's 4004, was fabricated in 1971), circuit design was entirely manual work; even the layout was drawn by hand. However, without the development of sophisticated electronic design automation (EDA) tools, the design of state-of-the-art ICs would not be possible. Figure 1.1 depicts a simplified design flow from a hardware description language (HDL) to a layout, also referred to as the register transfer level (RTL) to GDSII flow. The purpose of this figure is to illustrate where timing analysis (TA) is required and where an aging-aware TA would reduce the uncertainty of the delay prediction. Design flows are getting more and more complex, and according to Scheffer et al. [2006, Chapter 1] this trend continues, amongst other things, due to variability and reliability challenges: “The RTL to GDSII flow has undergone significant changes in the last 25 years. The continued scaling of CMOS technologies significantly changed the objectives of the various design steps. The lack of good predictors for delay has led to significant changes in recent design flows. Challenges like leakage power, variability, and reliability will continue to require significant changes to the design-closure process in the future”.

Everything starts with a product specification, which includes constraints for performance, area, and power. Further constraints, especially in advanced technologies, are reliability and yield. The next step is to write a synthesizable description in an HDL (VHDL or Verilog). This representation at RTL is then transferred into a logic representation by logic synthesis [Sentovich et al., 1992]. A netlist of generic cells (e.g., NAND and NOT cells), which represent the logic function, is obtained and mapped to cells from a standard cell library. Next, the cells of the netlist are placed and the nets are routed. Before the chip can be processed, tested and packaged, the sign-off is performed by thoroughly verifying that the timing and other electrical performances meet the specification.

Figure 1.1.: IC design flow (specification → HDL → logic synthesis → logic functions → technology mapping → netlist → place & route → layout → sign-off → tape-out).

It is very expensive and time consuming to process a chip. Therefore, it is not feasible to iteratively design a chip by processing it, testing it and making design changes. In fact, the IC industry is quite unique in relying heavily on abstract models for designing a product. There are, for instance, transistor models to simulate the voltage and current waveforms on circuit level, or gate models, which provide, amongst other things, the delay of the standard cells. The goal of models is to provide all the information that is necessary on a particular abstraction level and to omit unimportant information. Only by abstraction is it possible to design state-of-the-art circuits with up to billions of transistors (Intel's six-core Core i7 CPU from 2010 has $1.17 \cdot 10^9$ transistors). The models must be as accurate as possible to provide a good prediction for the performance, power and area of a design. Otherwise, the final product might not meet the specification.

TA is a crucial step during the design of a digital circuit. For complexity reasons, TA is done on gate level or even higher abstraction levels. Basically, the gate and wire delays along the longest path, the so-called critical path, are added up, and it is verified whether the resulting circuit delay fulfills the timing specification. When a circuit ages, the gate delays increase and the circuit may violate the timing specification although the specification was met right after manufacturing (see Figure 1.2). In Chapter 3 it is shown that aging significantly degrades the gate delay. To consider this, a TA with an aging-aware gate model is required. Such an aging-aware TA is developed in this thesis.

Figure 1.2.: Aging-aware timing analysis of a circuit. Aging effects degrade transistor parameters, which results in increased gate delays over time. The critical path delay increases as well, and the timing specification might be violated during the specified lifetime.

TA is required in many design flow steps, not just for the final timing sign-off. This enables the consideration of timing at every synthesis step, and the synthesis tool can optimize the design until the timing constraints are met. With each synthesis step, the available information becomes more accurate, which in turn increases the accuracy of the TA. Only the multi-level logic functions are known at the logic synthesis stage, and the circuit delay can only be roughly estimated by the logic depths of those functions. Only at technology mapping does it become known which gates from the standard cell library are instantiated. From this step on, aging can be considered by an aging-aware gate model. The exact net length is available during the place and route synthesis stage, which increases the accuracy of the TA because the parasitic capacitance and resistance of the nets are known. Finally, the coupling capacitances are available for timing sign-off, which again increases the accuracy of the TA.
Hence, an aging-aware TA is beneficial at all synthesis steps from technology mapping on.

1.3. Structure of the thesis

Chapter 2 discusses the fundamentals of TA and the state of the art of aging-aware timing analysis.

Chapter 3 introduces the two dominant drift-related aging effects, NBTI and hot carrier injection (HCI). Their physical mechanisms are explained, and it is shown how the device parameter degradation can be modeled. Furthermore, their impact on the gate performance is investigated.

An aging-aware timing analysis flow is described in Chapter 4. Its basis is an aging-aware gate model called AgeGate. Accuracy benefits of the proposed approach are demonstrated on benchmark circuits.

The degradation of a circuit strongly depends on the operating conditions and the workload. Chapter 5 shows methods to identify the paths of a circuit that might become critical without knowing the exact operating conditions and workload. Two applications are presented which use this information: an aging-aware timing model for modules, and a methodology to design better-than-worst-case circuits by monitoring all possible critical paths and intervening if one of them degrades too much.

Finally, the thesis is summarized in Chapter 6.

2. Fundamentals

2.1. (Static) timing analysis

Timing analysis is required for many different steps during the design process of a digital circuit. The most obvious task for a TA is to determine the maximum clock frequency a circuit can operate at. Therefore, a TA that is as accurate as possible is needed for timing sign-off at the end of the digital design flow. A TA is also needed for circuit optimization. During synthesis and layout (placement as well as routing), timing analysis is performed in the inner optimization loop. This requires a timing analyzer that responds to several thousand timing queries as fast as possible (see incremental timing analysis in Section 2.1.3).
When local optimizations are performed on a design (e.g., buffer insertion [Alpert et al., 1999]), the TA checks that no timing constraint is violated due to a local modification.

The timing analysis of complex digital circuits with up to millions of gates is performed on gate level (or even higher abstraction levels), since a SPICE simulation of such large circuits on circuit level is too time consuming. The required input vectors for a SPICE or logic simulation are another problem. It is not practical to simulate a circuit for all possible input vectors. Nevertheless, a SPICE simulation is more accurate and can be used to verify the paths with the longest delays that are determined by a static timing analysis (STA).

An STA has two main advantages compared to a timing simulation on circuit level. It is significantly faster, since a simplified gate model (see Section 2.1.1) and a simplified interconnect model are used. Furthermore, no input vectors are needed, because the logic function of a gate is not considered for the signal propagation. Instead, the propagation of signal arrival times depends only on the circuit topology. Bellido et al. [2006, Chapter 2] compare state-of-the-art gate models to a SPICE simulation. The average speed-up is three orders of magnitude and the mean error is 6.75 %.

An STA tool can operate in an early and a late mode. In late mode, the latest arrival times of a signal are determined. In early mode, on the other hand, the earliest time at which a signal transition can take place at a node is obtained. The circuit delay calculation and the verification of the setup time constraints (see Section 2.1.4) are performed in late mode. The hold time constraints are checked in early mode.

2.1.1. Gate models

For STA a gate model is needed to compute the gate delays. The gate model provides a delay for a falling and a rising input transition for each of its timing arcs. A timing arc is defined from a gate input to a gate output (see Figure 2.1(a)). Typically, it is assumed that the output transition is caused by the switching of just one input signal (single input switching assumption). A simultaneous transition at two or more inputs can significantly increase the gate delay. Hence, gate models that take simultaneous input switching into account are more accurate [Chen et al., 2001].

Figure: (a) NAND gate with a transition at input A; timing arcs are depicted as lines. (b) Corresponding waveforms.

To obtain the gate delays, the gates of a standard cell library are pre-characterized by SPICE simulations. Those simulations are used to create a gate model. During the STA, only the gate model is evaluated. This is the reason why an STA is much faster than performing a SPICE simulation for the entire circuit.

There are several techniques to model the gate delay. One of the first was to use the following equation [Sapatnekar, 2004, chap. 4]:

$$d = k_1 \cdot C_L + k_2 \tag{2.1}$$

The gate delay is split into two parts. The dependence of the gate delay on the output load ($C_L$) is given by $k_1$, and the intrinsic gate delay is given by $k_2$. $C_L$ is given by the input capacitance of succeeding gates and the interconnect capacitance. This quite simple model neglects the impact of the input slope ($s_{IN}$) on the gate delay.

To consider the impact of the slope, signals are modeled as ramps for STA (see Figure 2.1(b)). A signal is defined by two values: the arrival time (AT) and the corresponding slope. The slope ($s$) is given by the transition time. This is the time a signal takes to change from logic “0” to logic “1”. Hence, bounds for the logic values have to be defined (e.g., 50 % of $V_{DD}$ for signal crossing and 20 % and 80 % of $V_{DD}$ for transition time).

A commonly used gate model is based on a look-up table (LUT). The industry quasi-standard, the liberty file format from Synopsys, is such a LUT-based gate model.
It stores the gate delays in 2-dimensional LUTs dependent on input slope and output load (see Figure 2.1):

$$d = f(s_{IN}, C_L) \tag{2.2}$$

Values in between the stored values of the LUTs are obtained by interpolation. The input slope is now required in addition to the output load in order to compute the gate delay. For this reason, the output slope ($s_{OUT}$) is stored in LUTs as well, dependent on $s_{IN}$ and $C_L$. Now, the input slope of a gate can be calculated based on the output slope of its predecessor gate. An advantage of LUT-based gate models is that their accuracy can easily be increased by characterizing the gate at additional supporting points.

Figure 2.1.: LUT-based gate model

Due to the ongoing miniaturization, the input capacitance of the gates decreases and the resistance of the interconnect network increases. This leads to an increased inaccuracy when purely capacitive loads are assumed. For this reason, an effective capacitance was introduced by Qian et al. [1994]. The effective capacitance represents the complex interconnect network by a single value. This enabled the continued usage of the existing models.

However, the signal waveform in advanced technologies differs significantly from a simple ramp (signals have a long “tail” now), which leads to inaccuracies as well. This is the reason why current source models (CSMs) are developed. The goal of CSMs is to model the signal waveform more accurately by modeling gates as voltage-controlled current sources which charge the complex interconnect network and the fan-out gates. Several approaches have been published. The composite current source model (CCSM) [Synopsys, 2006] stores time-current waveforms in the LUTs. The effective current source model (ECSM) [Cadence, 2007] differs only slightly from the CCSM by storing time-voltage waveforms, which are again converted to current waveforms and applied to the interconnect network.
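The LUT evaluation of Equation 2.2 can be illustrated with a minimal sketch. It assumes bilinear interpolation between the four surrounding supporting points and clamps queries to the characterized range; the text only states that intermediate values are obtained by interpolation, so both choices are assumptions. The function name `interpolate_lut`, the axis values and the delay table are hypothetical and not taken from any real library.

```python
from bisect import bisect_right


def interpolate_lut(slopes, loads, table, s_in, c_load):
    """Bilinear interpolation in a 2-D delay table (illustrative sketch).

    slopes, loads: ascending axis values with at least two entries each.
    table[i][j]:   delay characterized at slopes[i] and loads[j].
    Queries outside the characterized range are clamped to the border.
    """
    def locate(axis, x):
        # Clamp the query, then find the lower index of the enclosing
        # interval and the fractional position inside that interval.
        x = min(max(x, axis[0]), axis[-1])
        i = min(bisect_right(axis, x) - 1, len(axis) - 2)
        t = (x - axis[i]) / (axis[i + 1] - axis[i])
        return i, t

    i, ts = locate(slopes, s_in)
    j, tc = locate(loads, c_load)
    # Interpolate along the load axis first, then along the slope axis.
    low = table[i][j] * (1 - tc) + table[i][j + 1] * tc
    high = table[i + 1][j] * (1 - tc) + table[i + 1][j + 1] * tc
    return low * (1 - ts) + high * ts


# Hypothetical supporting points (slope in ns, load in fF, delay in ns).
SLOPES = [0.02, 0.10, 0.50]
LOADS = [1.0, 4.0, 16.0]
DELAY = [
    [0.030, 0.060, 0.180],
    [0.040, 0.070, 0.190],
    [0.080, 0.110, 0.230],
]

# Query halfway between the first two supporting points on both axes.
delay = interpolate_lut(SLOPES, LOADS, DELAY, s_in=0.06, c_load=2.5)
```

The same routine could be applied to a second table holding the output slope, so that the slope passed to the next gate is obtained in the same way as the delay.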
CCSM and ECSM have the advantage that they are compatible with the existing timing analysis tools, and they were adopted quite fast by the industry. Another CSM approach by Croix and Wong [2003] is to store the static output current depending on gate input voltage and gate output voltage in LUTs. By solving differential equations, the voltage waveform at the succeeding gate input can be computed.

The aging-aware gate model introduced in Chapter 4 is LUT-based. However, Knoth et al. [2011] show that the approach can be combined with a CSM [Knoth et al., 2010] to form an aging-aware CSM.

2.1.2. Timing graph

A timing graph (TG) is used in STA tools to represent a combinational circuit. A TG is a directed acyclic graph (DAG): $TG = (N, E)$. The nodes $N$ of a timing graph are the gate inputs and outputs. These are connected by two types of edges $E$. The weights of edges connecting gate inputs with gate outputs are the gate delays for the corresponding timing arc. Edges between gate outputs and inputs of succeeding gates represent the delays caused by the interconnect network.

The focus of this thesis is on aging effects causing a drift of transistor parameters. Hence, the passive interconnect network is not affected and not considered in the course of this thesis. This enables us to simplify the timing graph. The nets of the gate level netlist can be taken as nodes $N$, and the weighted edges $E$ correspond to gate delays.

Figure 2.2.: Circuit and corresponding timing graph. (a) Gate level netlist for ISCAS'85 circuit c17. (b) Simplified timing graph for c17 (for every net just one node is added and not two, as described in the text).

The gate model provides a delay for a rising and a falling input transition. Hence, every TG edge has two edge weights. To be able to use unmodified standard graph algorithms, this should be avoided.
A very clean and elegant way is described by Ju and Saleh [1991]: For every net two nodes are added to the timing graph, one for a rising transition, and another one for a falling transition. If two nets, u and v, are connected by an inverting gate, the node u for a rising (falling) transition is connected to the node v for a falling (rising) transition. If it is a non-inverting gate, the node u for a rising (falling) transition is connected to v for a rising (falling) transition. That way every edge in the timing graph has just one edge weight. Two additional nodes are added to the TG. A source node node (S) connected to all primary input (PI) nodes; and all primary output (PO) nodes are connected to a sink node (T ) (see Figure 2.2). To model unequal arrival times at the primary inputs, delays can be assigned to the edges from S to the PIs. 2.1.3. Incremental timing analysis When the TG is annotated with gate delays as edge weights, the circuit delay can be determined. The circuit delay is defined by the path (P ) with the longest path delay (D(P )). This path is called critical path (Pcrit ), its path delay is the critical path delay (D(Pcrit ) or just Dcrit ). The circuit delay can be determined by path-based or block-based methods. The path-based method enumerates all paths in the TG and computes their path delays by adding up the gate delays along the path. The critical path with the longest path delay determines the circuit delay. The path-based method has an exponential worst-case time-complexity because the number of paths in a circuit increases (in the worst case) exponentially with the number of nodes. The block-based method propagates the arrival times (ATs) through the circuit, starting at S until T is reached. For a given node n, AT(n) is the maximal point in time 18 2.1. (Static) timing analysis Figure 2.3.: Computation of the arrival time (AT). that the signal at n can change1 . 
The arrival time of a node n can be calculated when the arrival times of all predecessor nodes i and the gate delays d of all incoming edges are known (see Figure 2.3): AT(n) = max i∈predecessors(n) AT(i) + d((i, n)) (2.3) AT(T ) corresponds to the circuit delay. In contrast to the path-based method, each node is just visited once, hence, the time complexity is O(|N |). Hence, the difference between the block-based and the path-based method is that the former calculates maximal arrival times for each node whereas the latter computes all path delays first and then calculates the maximum out of them. Both methods add up the gate delays without considering the logic function of the gate. Hence, the critical path may not be sensitizable. A path is not sensitizable if there doesn’t exist an input assignment that enables a signal to propagate along the path (see Section 5.3.3). A path that is not sensitizable is called false path. If the critical path is a false path, then the circuit delay is overestimated. The path-based method can easily recognize a false path by checking every path whether it is sensitizable. For the block-based method this is more difficult, since one cannot easily determine the path with the next longest path delay if the critical path is a false path. An efficient method to enumerate the paths with respect to the path delay is discussed in Section 2.1.5. When the static timing analyzer is used in the inner optimization loop, the design is often modified only slightly before the timing must be reevaluated. It would be very inefficient to analyze the complete design again in this case. The incremental timing analysis instead just analyzes the part of the timing graph that is affected by the change. The foundation of an incremental timing analysis is that every timing quantity (e.g., arrival time or gate delay) has a valid flag (e.g., ATvalid or dvalid ). 
It is crucial that, whenever the circuit and therefore the timing graph changes, the valid flags of all affected timing quantities are reset. This is done by two recursive functions, reset_node and reset_edge. In reset_edge, the controlling node of the arrival time (AT_ctrl) is needed. The controlling node is the predecessor node that defines the arrival time (i.e., the node i in Equation 2.3 that is responsible for the maximal arrival time at n).

¹ Or the earliest point in time at which the signal can change, if hold time constraints are to be checked.

Function reset_node(node)
    /* Function to set the arrival time of a node to invalid */
    AT_valid(node) ← False;
    foreach successor suc of node do
        /* Delays of outgoing edges are invalid because the edge input slope is invalid */
        reset_edge(node, suc);
    end

Function reset_edge(u, v)
    /* Recursive function to set the delay of an edge (u, v) to invalid */
    d_valid((u, v)) ← False;
    if AT_ctrl(v) == u then
        /* Arrival time at node v is invalid because it was controlled by edge (u, v) */
        reset_node(v);
    end

Whenever a timing quantity is read, it must first be checked whether it is still valid. If not, it must be recalculated. This is done by two recursive functions, update_node and update_edge. Assume that the circuit delay is to be reevaluated after a design change. First, it is checked whether the arrival time at T is still valid. If so, the change did not affect the circuit delay. Otherwise, one proceeds backwards through the timing graph, starting at T, until valid arrival times and gate delays are reached, and recalculates AT(T) based on those values.

The algorithm to calculate the circuit delay with an incremental timing analyzer is given in Algorithm 1. As an initialization step, the arrival time of the source node, which is equal to 0, must be set to valid. Then, the arrival time at the sink node is queried.
The propagation of the arrival time from the source node to the sink node is done behind the scenes by update_node and update_edge.

Algorithm 1: Circuit delay computation
    /* Set arrival time at source node to valid */
    AT_valid(S) ← True;
    /* Update arrival time at the sink node */
    update_node(T);

Figure 2.4 shows an example of the incremental timing analysis. Due to a design change, the arrival time at node 6 is invalid, and as a result the other nodes marked red (or dark gray) are invalid as well. Now the circuit delay is reevaluated by calling update_node(T). This results in recursive calls of update_node for all invalid nodes down to node 6.

Function update_node(node)
    /* Recursive function to update the arrival time of a node */
    if AT_valid(node) == False then
        AT(node) ← max_{i ∈ predecessors(node)} (update_node(i) + update_edge(i, node));
        AT_ctrl(node) ← predecessor i that attains the maximum;
        AT_valid(node) ← True;
    end
    return AT(node)

Function update_edge(u, v)
    /* Recursive function to update the gate delay of an edge (u, v) */
    if d_valid((u, v)) == False then
        /* Update gate delay based on input slope and output load */
        slope ← get_slope_from_node(u);
        load ← get_load_from_node(v);
        d((u, v)) ← get_delay_from_LUT(slope, load);
        d_valid((u, v)) ← True;
    end
    return d((u, v))

The methods to identify possible critical paths in an aged circuit, discussed in Chapter 5, continuously modify the TG by removing nodes and edges. Hence, without an incremental TA, a complete STA would have to be performed whenever the TG is modified.

There are several other timing quantities of interest. AT gives the maximal time a signal takes from the source node to a given node. The delay to sink (D2S), on the other hand, is the maximal time a signal takes from a given node until it reaches the sink node. D2S is calculated as follows:

\[ \mathrm{D2S}(n) = \max_{i \in \mathrm{successors}(n)} \bigl( \mathrm{D2S}(i) + d((n, i)) \bigr) \tag{2.4} \]

To calculate D2S for all nodes, one starts at T and computes D2S for the predecessor nodes until S is reached.
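The lazy invalidate-and-update scheme of this section can be illustrated with a minimal Python sketch. It is an illustration under simplifying assumptions, not the implementation used in this thesis: the class `TimingGraph`, its method `set_delay` and the example graph are hypothetical, edge delays are plain numbers instead of LUT lookups (so there is no d_valid flag), and, unlike reset_edge above, the sketch conservatively invalidates the complete fan-out cone of a changed edge instead of pruning via AT_ctrl.

```python
class TimingGraph:
    """Toy timing graph with cached arrival times and valid flags (hypothetical)."""

    def __init__(self, source):
        self.pred = {}           # node -> list of predecessor nodes
        self.succ = {}           # node -> list of successor nodes
        self.delay = {}          # (u, v) -> edge delay
        self.at = {source: 0.0}  # cached arrival times; AT(S) = 0
        self.valid = {source}    # nodes whose cached arrival time is valid
        self.evaluations = 0     # number of arrival times actually recomputed

    def add_edge(self, u, v, d):
        self.pred.setdefault(v, []).append(u)
        self.succ.setdefault(u, []).append(v)
        self.delay[(u, v)] = d
        self.reset_node(v)

    def reset_node(self, node):
        # Invalidate AT(node) and, conservatively, its whole fan-out cone.
        if node in self.valid:
            self.valid.discard(node)
            for suc in self.succ.get(node, []):
                self.reset_node(suc)

    def set_delay(self, u, v, d):
        # A design change: the delay of edge (u, v) changes.
        self.delay[(u, v)] = d
        self.reset_node(v)

    def update_node(self, node):
        # Equation 2.3, evaluated only where the cached value is invalid.
        if node not in self.valid:
            self.evaluations += 1
            self.at[node] = max(self.update_node(i) + self.delay[(i, node)]
                                for i in self.pred[node])
            self.valid.add(node)
        return self.at[node]


tg = TimingGraph("S")
for u, v, d in [("S", "a", 1), ("S", "b", 2), ("a", "c", 3),
                ("b", "c", 1), ("c", "T", 2), ("b", "T", 1)]:
    tg.add_edge(u, v, d)

delay_before = tg.update_node("T")   # full analysis: a, b, c and T are evaluated
evals_full = tg.evaluations

tg.set_delay("b", "c", 4)            # small design change
delay_after = tg.update_node("T")    # only c and T are evaluated again
evals_incremental = tg.evaluations - evals_full
```

In this example the first query evaluates four arrival times, whereas the query after the change re-evaluates only the two nodes in the fan-out cone of the modified edge.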
The required time REQT(n) is the latest time at which a signal may arrive at a node n such that it still arrives at T in time. Therefore, REQT at T must be specified first. REQT at a node n is the difference between REQT(T) and the D2S at n:

\[ \mathrm{REQT}(n) = \mathrm{REQT}(T) - \mathrm{D2S}(n) \tag{2.5} \]

The difference between required time and arrival time is called slack (SLACK):

\[ \mathrm{SLACK}(n) = \mathrm{REQT}(n) - \mathrm{AT}(n) \tag{2.6} \]

Figure 2.4.: Example of the incremental timing algorithm. The arrival time at the red (dark gray) nodes is not valid. To update the arrival time at node T, all invalid arrival times are recursively updated (dashed arrows).

A negative slack implies that the signal arrives at a node later than it must in order to meet the required time at the sink node. The slack of a node is important information for circuit optimization.

2.1.4. Sequential circuits

In contrast to a combinational circuit, a sequential circuit has storage elements in addition to logic gates. Hence, the output of a sequential circuit depends not only on the input signals but on the internal state as well. In synchronous sequential circuits, the output signals of the combinational logic that are fed back into the combinational logic are synchronized by a clock signal (see Figure 2.5). Because it is simple to design and verify, the flip-flop (FF) is the common storage element in synchronous sequential circuits. FFs capture the data signal at the active clock edge (in Figure 2.5 the rising transition is the active clock edge).

Synchronous sequential circuits can be used to realize finite state machines. They can also be used to split complex combinational circuits into several parts. That way the performance of the circuit can be increased, since only the individual parts, rather than the complete circuit, must fulfill the timing constraints. This is called pipelining and is used, for instance, in microprocessors.
To store a data value correctly in a flip-flop, the following two timing constraints have to be fulfilled (see waveform in Figure 2.5):

• The setup time (t_SUP) is the time interval during which the data signal has to be stable before the active clock edge in order to sample the data value correctly. This can be verified during STA by the following inequality, where t_CLK is the clock period:

\[ d_{\mathrm{CLK\text{-}to\text{-}Q}} + D_{\max} + t_{\mathrm{SUP}} < t_{\mathrm{CLK}} \tag{2.7} \]

The clock-to-Q delay (d_CLK-to-Q) is the delay from an active clock edge until the output of the sending FF changes. D_max is the maximal delay of the combinational circuit to the receiving FF input.

• The hold time (t_HLD) is the time interval during which the data signal has to remain stable after the active clock edge in order to sample the data value correctly. This can be checked by the following inequality:

\[ d_{\mathrm{CLK\text{-}to\text{-}Q}} + D_{\min} > t_{\mathrm{HLD}} \tag{2.8} \]

D_min is the minimal circuit delay to the receiving FF input. D_min is obtained by the STA tool in early mode.

Figure 2.5.: Diagram of a sequential logic circuit. The timing constraints (setup and hold time) of a flip-flop are given as well.

The STA algorithm must be modified slightly to analyze sequential circuits. The flip-flops are removed from the netlist. Every signal connected to an FF input becomes a PO and every signal connected to an FF output becomes a PI. The remaining circuit is now purely combinational and the TG can be set up. The timing constraints of the flip-flops are considered by the weights of the edges to the sink node and from the source node. Edge weights from S to former FF outputs are set to d_CLK-to-Q. To check the setup time constraints, the edge weights from former FF inputs to T are set to t_SUP. If the maximal arrival time at the sink node is less than t_CLK, then all setup time constraints are met. To check the hold time constraints, the edge weights from former FF inputs to T are set to t_HLD.
Now, if the minimal arrival time at the sink node is greater than t_CLK, then all hold time constraints are met. The minimal arrival time at a node is calculated by simply replacing the max operation in Equation 2.3 with the min operation.

2.1.5. Path enumeration

When a block-based STA is performed, the circuit delay is given by the arrival time at the sink node. The corresponding critical path can be obtained efficiently, because the controlling nodes are stored for the delays to sink. The controlling node of a node n is the successor node that is responsible for the maximal D2S at n. By following the path from each node to its controlling node, starting at S, the critical path is determined.

Figure 2.6.: An example for calculating the branch slacks.

However, often not only the critical path itself is of interest, but also the paths with the next longest path delays. These paths are required, for instance, to simulate their delay again on circuit level. This problem is referred to as the k most critical paths problem. Determining the next longest paths is not as easy as determining P_crit in a block-based STA approach.

Ju and Saleh [1991] propose an efficient way to compute the k most critical paths. One advantage of their algorithm is that k does not have to be specified in advance; the path enumeration can be suspended and continued as required. The key idea of the algorithm is the introduction of branch slacks (BSs). In an initialization phase, the BSs are calculated for every edge in the TG. To this end, the successor nodes v_i of a node u are sorted in descending order of the following cost function f_cost:

\[ f_{\mathrm{cost}}(u, v_i) = d((u, v_i)) + \mathrm{D2S}(v_i) \tag{2.9} \]

This is the maximal delay from node u to T over the edge (u, v_i).
The branch slack is the difference between the cost functions of two nodes v_i and v_{i+1} that are adjacent in the sorted successor list of u:

\[ \mathrm{BS}(u, v_i) = f_{\mathrm{cost}}(u, v_i) - f_{\mathrm{cost}}(u, v_{i+1}) \tag{2.10} \]

The branch slack of an edge (u, v_i) tells us that the path with the next longest path delay that branches out from node u goes over edge (u, v_{i+1}), and that its path delay is shorter by BS(u, v_i). Figure 2.6 shows the calculation of the branch slacks.

In the path enumeration phase, the next longest paths are determined by means of the branch slacks. First, P_crit is determined as discussed before. The path with the next longest path delay branches out of P_crit at the edge (u, v_i) with the smallest branch slack. This path can be determined by branching off at u to v_{i+1} and following the controlling nodes of v_{i+1} recursively until the sink node is reached.

Additional paths can be computed as follows. Suppose the path P_{k+1} with the next longest path delay is to be determined. P_{k+1} can be generated by branching out at a branch point from one of the k already determined paths. To this end, a data structure list[i] is required, which keeps a list of branch points for every path P_i that has already been determined. This list is sorted according to the branch slacks. Hence, the branch point that results in the longest path branching out from P_i comes first in the list. The data structure next_delay is another sorted list, which contains the delay of the next longest path branching out from every already determined path P_i. The next longest path delay for P_i can be calculated as follows:

\[ \mathrm{next\_delay}(P_i) = D(P_i) - \mathrm{BS}(\text{first element in } \mathrm{list}[i]) \tag{2.11} \]

When the next longest path is to be determined, one takes the first path from next_delay and looks in list[i] for the first branch point of this path (see Algorithm 2). Figure 2.7 shows an annotated TG with branch slacks and delays to sink.
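The initialization phase (Equations 2.4, 2.9 and 2.10) can be sketched in Python as follows. This is a minimal illustration, not the implementation of Ju and Saleh [1991]: the function names `delays_to_sink` and `branch_slacks` as well as the small example graph are hypothetical, and the example graph is not the one of Figure 2.7.

```python
def delays_to_sink(succ, delay, sink):
    """D2S(n) = max over successors i of D2S(i) + d((n, i)) (Equation 2.4)."""
    d2s = {sink: 0.0}

    def visit(n):
        if n not in d2s:
            d2s[n] = max(visit(v) + delay[(n, v)] for v in succ[n])
        return d2s[n]

    for n in succ:
        visit(n)
    return d2s


def branch_slacks(succ, delay, d2s):
    """BS(u, v_i) = f_cost(u, v_i) - f_cost(u, v_{i+1}) (Equations 2.9, 2.10)."""
    bs = {}
    for u, successors in succ.items():
        def f_cost(v, u=u):
            return delay[(u, v)] + d2s[v]

        # Sort successors by descending cost; the first one lies on the
        # longest path from u to the sink.
        ordered = sorted(successors, key=f_cost, reverse=True)
        for v, v_next in zip(ordered, ordered[1:]):
            bs[(u, v)] = f_cost(v) - f_cost(v_next)
    return bs


# Hypothetical example graph (successor lists and edge delays).
succ = {"S": ["a", "b"], "a": ["c", "T"], "b": ["c"], "c": ["T"]}
delay = {("S", "a"): 2, ("S", "b"): 1, ("a", "c"): 3,
         ("a", "T"): 1, ("b", "c"): 2, ("c", "T"): 4}

d2s = delays_to_sink(succ, delay, "T")
bs = branch_slacks(succ, delay, d2s)
```

In this example the critical path S, a, c, T has a delay of 9. The smallest branch slack on it is BS(S, a) = 2, so the next longest path branches out at S and has a delay of 9 − 2 = 7, which is indeed the delay of S, b, c, T.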
Table 2.1 shows the corresponding execution trace of the k most critical paths algorithm for the first five iterations. Given are the determined path and its delay, the branch points with their branch slacks, and the next longest path delay of a path branching out from this path. The first path is P_crit with a path delay of 12. P_crit has two branch points: S with BS = 1 and node 6 with BS = 2. The branch points are sorted in non-decreasing order of branch slack. Hence, next_delay is 11 (= D(P_crit) − BS((S, 2))) and the corresponding path branches out from P_crit at S. To determine the path in the second iteration, the path with the largest next_delay is taken. In this case there is only one next_delay; hence, the path in the second iteration branches out from P_crit at S. The used branch point is crossed out (indicated by the arrow with the 2 on top, which stands for the iteration in which it is crossed out). A next_delay of 9 is computed for the second path, and a new next_delay must be calculated for the first path as well (indicated by the arrow with the 2 on top). The execution trace shows how the algorithm continues to determine the next three longest paths.

Algorithm 2: k most critical paths
    P_1 ← P_crit;
    prepare list[1] and calculate next_delay(P_1);
    k ← 1;
    while path enumeration not stopped yet do
        i ← path with longest next_delay;
        j ← first branch point in list[i];
        generate the next longest path P_{k+1} by branching out from the j-th node on path P_i;
        prepare list[k + 1] and calculate next_delay(P_{k+1});
        remove first element in list[i] and update next_delay(P_i);
        k ← k + 1;
    end
    return (P_1, P_2, ..., P_k)

Figure 2.7.: TG with branch slacks (arc between two edges) and delays to sink (number next to the node).

k | path (delay) | branch points (branch slack) | next_delay
1 | S, 2, 6, 7, 10, T (12) | S(1), 6(2) | 11 → 10
2 | S, 4, 6, 7, 10, T (11) | 6(2), S(2), 4(5) | 9 → 9
3 | S, 2, 6, 8, 10, T (10) | 8(2) | 8
4 | S, 1, 7, 10, T (9) | S(2) | 7
5 | S, 4, 6, 8, 10, T (9) | S(2), 8(2) | 7

Crossed-out branch points: S(1) of path 1 in iteration 2, 6(2) of path 1 in iteration 3, and 6(2) and S(2) of path 2 in iterations 4 and 5.

Table 2.1.: Execution trace of the k most critical paths algorithm for the five slowest paths.

The algorithm discussed so far is not only capable of enumerating all paths from S to T; it can also determine all paths from an arbitrary node to T. In order to enumerate all paths from the source node to an arbitrary node, the algorithm must be changed slightly. Most importantly, join slacks (JSs) must be introduced. Join slacks are quite similar to branch slacks: the join slack is the delay difference between two path segments from S to a given node.

In this thesis, the k most critical paths algorithm is required in Chapter 5. It is used to consider common edges when the possible critical paths of a circuit are identified and to determine whether a possible critical path of an aged circuit is sensitizable.

2.2. State of the art of aging analysis

Several tools have been published that analyze the circuit performance degradation caused by aging effects on circuit level as well as on gate level [Liu et al., 2006]. Tools that analyze the degradation caused by drift-related aging effects, such as NBTI and HCI, are discussed in the following. There are other tools as well that compute the impact on circuit reliability caused by electromigration (EM) [Blaauw et al., 2003] or radiation-induced soft errors [Miskov-Zivanov and Marculescu, 2008].

2.2.1.
Circuit level

The general flow of tools that analyze the performance degradation on circuit level can be divided into the following three steps:

1. The fresh circuit is simulated, and the current and voltage waveforms at the transistor terminals that are relevant for the prediction of the device degradation are stored.
2. Those waveforms are used to generate degraded device models for each individual device.
3. Finally, the degraded circuit performances are obtained by a second SPICE simulation with the aged device models.

The first published reliability simulator is called Berkeley reliability tools (BERT) [Tu et al., 1993]. BERT is able to determine the performance degradation caused by HCI. Besides that, BERT can compute the probability that a circuit fails due to time-dependent dielectric breakdown (TDDB) and EM. In the first step, BERT determines the drain current I_d(t), the gate current I_g(t) and the substrate current I_sub(t). In the second step, a parameter AGE is determined for every transistor from I_d(t), I_g(t) and I_sub(t). AGE quantifies the amount of degradation:

\[ \mathrm{AGE}_{\mathrm{NMOS}} = \int_0^{t_{\mathrm{life}}} \frac{I_d(t)}{W \cdot H_n} \left( \frac{I_{\mathrm{sub}}(t)}{I_d(t)} \right)^{m_n} dt \tag{2.12} \]

\[ \mathrm{AGE}_{\mathrm{PMOS}} = \int_0^{t_{\mathrm{life}}} \frac{1}{H_p} \left( \frac{I_g(t)}{W} \right)^{m_p} dt \tag{2.13} \]

H and m are determined experimentally for a given technology. W is the transistor width and t_life the lifetime. Of course, it is not possible to simulate the circuit for the entire lifetime t_life. Hence, the circuit is simulated for a shorter time interval, and AGE_NMOS and AGE_PMOS are extrapolated.

Two methods are implemented in BERT to determine the degraded device models: either the degraded device model is interpolated between degraded device model cards that were characterized for particular values of AGE, or the parameter degradation Δp of the aged device model card is obtained from functions of AGE:

\[ \Delta p = f(\mathrm{AGE}) \tag{2.14} \]

After generating the degraded device models, the degraded circuit performance can be simulated in the third step.
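The second step can be illustrated by a small numerical sketch of Equation 2.12. It is not BERT code: the function names, the trapezoidal integration, and all numerical values (currents, W, H_n, m_n) are hypothetical and chosen for illustration only. In particular, the linear extrapolation from the simulated interval to the lifetime is an assumption of this sketch; it presumes that the simulated interval is representative of the whole lifetime.

```python
def age_nmos(t, i_d, i_sub, width, h_n, m_n):
    """Trapezoidal approximation of Equation 2.12 over the sampled interval.

    t, i_d and i_sub are equally long lists of sample times and currents;
    i_d must be nonzero at every sample.
    """
    f = [idk / (width * h_n) * (isk / idk) ** m_n
         for idk, isk in zip(i_d, i_sub)]
    return sum(0.5 * (f[k] + f[k + 1]) * (t[k + 1] - t[k])
               for k in range(len(t) - 1))


def extrapolate_age(age_sim, t_sim, t_life):
    """Assumed linear extrapolation of AGE from t_sim to the lifetime t_life."""
    return age_sim * t_life / t_sim


# Hypothetical waveform samples: constant currents over 10 ns.
t = [0.0, 5e-9, 1e-8]            # s
i_d = [1e-4, 1e-4, 1e-4]         # A
i_sub = [1e-6, 1e-6, 1e-6]       # A

age_sim = age_nmos(t, i_d, i_sub, width=1e-6, h_n=1.0, m_n=3)
t_life = 10 * 365 * 24 * 3600    # 10 years in seconds
age_life = extrapolate_age(age_sim, t_sim=t[-1], t_life=t_life)
```

The extrapolated value `age_life` would then enter Equation 2.14 or the interpolation between model cards.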
Commercial reliability simulators, like RelXpert [Cadence, 2003], are already available, and the latest versions of HSPICE [Synopsys, 2008] and ELDO [Karam et al., 2001] come with an integrated reliability analysis. RelXpert can consider the impact of HCI and NBTI. ELDO is capable of determining the degraded device parameters iteratively. To this end, the specified lifetime is divided into n time intervals of equal length. Steps one and two are conducted in every time interval. That way, the impact of the degraded waveforms on the parameter drift can be considered. Maricau and Gielen [2010] analyze the combined impact of aging and process variation on circuit behavior. Like ELDO, they use an iterative approach, but the length of the time intervals is variable. In Section 4.5.1 it is shown by a simple experiment that such an iterative approach is not necessary, at least for digital circuits.

A drawback of commercial tools like RelXpert and ELDO is that the degradation equations are proprietary. Hence, the user has to trust the tool and cannot verify how the degradation is calculated. Kufluoglu et al. [2010] show that RelXpert only reaches an acceptable accuracy when the proprietary degradation equations are replaced by improved user-defined equations.

Reliability simulators on circuit level can be very accurate. However, a reliability simulation on circuit level is quite time-consuming, and realistic input vectors are required. For the first step of the aging analysis, input vectors are needed that cause a realistic or worst-case degradation of the circuit. The third step requires input vectors to measure the degraded circuit performances. In general, the input vectors in the first and third step are not equal. Like SPICE simulators for timing analysis (see Section 2.1), these tools are not capable of simulating complex digital circuits. Nevertheless, they can be used to verify the critical aged path determined by an aging-aware timing analysis on gate level.

2.2.2.
Gate level

Aged LUT-based gate models

Although reliability simulators on circuit level are not applicable to the timing analysis of complex digital circuits, they can be used to characterize aged gate models.

Figure 2.8.: Aged LUT-based gate model as proposed in [Chen et al., 2011].

Chen et al. [2011] propose a path-based analysis flow, although the gate model can also be used for a block-based approach. HSPICE [Synopsys, 2008] is used to generate several aged LUTs for different conditions like lifetime, temperature or signal probability. This approach results in a large number of LUTs, especially when the workload at the gate inputs is to be considered. If, for instance, LUTs are to be generated for five different signal probabilities, 5 LUTs are enough for a gate with one input (see Figure 2.8). A gate with three inputs already needs 125 (= 5 · 5 · 5) LUTs, and there are gates in a standard cell library that have even more inputs.

The aging-aware gate model GLACIER [Wu et al., 2000] considers HCI and defines a factor α as follows:

\[ \alpha(s_{\mathrm{IN}}, C_L, TD) = \frac{d_{\mathrm{aged}}}{d_{\mathrm{fresh}}} \tag{2.15} \]

The aged gate delay d_aged and the fresh gate delay d_fresh have to be simulated. d_fresh depends on the input slope s_IN and the output load C_L. d_aged additionally depends on the transition density TD at the input. For a multiple-input gate, d_aged depends on TD at every input. To reduce the complexity, it is assumed that the gate delay for each input can be calculated by considering the contributions from the switching of all gate inputs separately from one another as follows:

\[ \alpha = \left( \sum_{i=1}^{n} \alpha_i \right) - (n - 1) \tag{2.16} \]

where n is the number of transistors connected in series and α_i is the contribution of input pin i when just this input switches. However, this approach neglects the impact of the workload at the other inputs and of the internal gate structure on the parameter drift (see Section 4.3.3).
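Both counting arguments above can be checked with a few lines of Python. The function names and the α_i values are hypothetical and serve only as an illustration; the sketch assumes that the same number of signal probabilities is characterized for every input.

```python
def lut_count(num_inputs, num_signal_probabilities=5):
    """Number of aged LUTs if every input is characterized for the same
    number of signal probabilities (assumption of this sketch)."""
    return num_signal_probabilities ** num_inputs


def combined_alpha(alphas):
    """Equation 2.16: alpha = (sum of alpha_i) - (n - 1)."""
    n = len(alphas)
    return sum(alphas) - (n - 1)


luts_one_input = lut_count(1)      # 5
luts_three_inputs = lut_count(3)   # 125 = 5 * 5 * 5

# Hypothetical per-input factors of a gate with three series transistors.
alpha = combined_alpha([1.04, 1.02, 1.01])
```

Equation 2.16 is equivalent to α = 1 + Σ(α_i − 1): the relative degradations contributed by the individual inputs are simply added, here 4 % + 2 % + 1 % = 7 %.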
When a reliability simulator on circuit level is used to characterize a gate library, the gate models are valid for just one specific use profile; that is, the gate models depend on the use profile. If, for example, the specified lifetime changes, the entire library has to be re-characterized.

Aged gate delay as a function of parameter drift

All other proposed gate models have in common that they consider only NBTI and that d_aged is the sum of d_fresh and a degradation Δd(ΔV_th) that is a function of the threshold voltage drift caused by NBTI:

\[ d_{\mathrm{aged}} = d_{\mathrm{fresh}} + \Delta d(\Delta V_{th}) \tag{2.17} \]

The advantage of such a gate model is that it is independent of the use profile and the workload, because these only affect the parameter drift, and the drift is computed during the analysis and not in advance during the gate model characterization. As long as the parameter drift caused by aging is small enough, a linear approximation of the dependence of Δd on ΔV_th can be used (see Figure 2.9):

\[ d_{\mathrm{aged}} = d_{\mathrm{fresh}} + \frac{\partial d}{\partial V_{th}} \cdot \Delta V_{th} \tag{2.18} \]

Figure 2.9.: Gate delay degradation as a linear function of ΔV_th.

Paul et al. [2006] use the α-power law [Sakurai and Newton, 1990] to obtain the sensitivity ∂d/∂V_th:

\[ I_d \propto (V_{gs} - V_{th})^{\alpha} \tag{2.19} \]

It is assumed that the gate delay is solely determined by recharging the output load (no intrinsic gate delay):

\[ d = \frac{C_L \cdot V_{DD}}{I_d} = \frac{\mathrm{const.}}{(V_{gs} - V_{th})^{\alpha}} \tag{2.20} \]

Differentiating this expression with respect to V_th results in:

\[ \frac{\partial d}{\partial V_{th}} = \frac{\alpha \cdot d}{V_{gs} - V_{th}} \tag{2.21} \]

In contrast, Kumar et al. [2006] determine the dependence Δd(Δp) by simulation and store the results in LUTs. Kumar et al. [2006] also describe how to calculate the threshold voltage drift iteratively based on the reaction diffusion (RD) equations for NBTI (see Section 3.1.1). However, this involves solving an equation for every stress and recovery phase during the lifetime, which makes the calculation of the drift very inefficient, especially for long lifetimes.
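How well the linear approximation of Equations 2.18 and 2.21 matches the α-power-law delay of Equation 2.20 can be checked numerically. The following Python sketch is only an illustration: the function names and all parameter values (V_gs, V_th, α, ΔV_th) are hypothetical, the constant in Equation 2.20 is set to 1, and voltages are taken as magnitudes so that the same expressions can be read for a PMOS transistor.

```python
def delay_alpha_power(v_gs, v_th, alpha, const=1.0):
    """Equation 2.20: d = const / (V_gs - V_th)^alpha."""
    return const / (v_gs - v_th) ** alpha


def aged_delay_linear(d_fresh, v_gs, v_th, alpha, delta_vth):
    """Equation 2.18 with the sensitivity of Equation 2.21."""
    sensitivity = alpha * d_fresh / (v_gs - v_th)
    return d_fresh + sensitivity * delta_vth


# Hypothetical values (magnitudes): 1.2 V gate drive, 0.3 V threshold,
# alpha = 1.3 and a threshold voltage drift of 36 mV.
v_gs, v_th, alpha, delta_vth = 1.2, 0.3, 1.3, 0.036

d_fresh = delay_alpha_power(v_gs, v_th, alpha)
d_linear = aged_delay_linear(d_fresh, v_gs, v_th, alpha, delta_vth)
d_exact = delay_alpha_power(v_gs, v_th + delta_vth, alpha)

rel_linear = d_linear / d_fresh - 1.0   # relative degradation, linear model
rel_exact = d_exact / d_fresh - 1.0     # relative degradation, Equation 2.20
```

For these values the linear model yields a relative degradation of α · ΔV_th / (V_gs − V_th) = 5.2 %, slightly below the value obtained by evaluating Equation 2.20 directly, because the delay is a convex function of V_th.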
A third contribution is that arbitrary signals result in the same drift as periodic signals with the same signal probability and transition density. Hence, it is not necessary to know the exact waveforms of the gate input signals; it is sufficient to know their signal probabilities and transition densities (see Figure 2.10). Otherwise, aging analysis would not be feasible, since the exact input signals are unknown when a circuit is developed.

Figure 2.10.: Transformation of arbitrary signals into periodic signals with same signal probability and transition density.

Figure 2.11.: Drawing of an NBTI threshold voltage drift caused by consecutive stress and relaxation phases (thin black line) and the ΔV_th drift given by the long term prediction model (thick orange line).

Wang et al. [2007b] derive a closed-form equation to calculate the upper bound of the parameter drift caused by NBTI (see the long term prediction model in Figure 2.11). Hence, the drift does not have to be calculated iteratively. It is also shown that NBTI has a negligible impact on the clock distribution network of a sequential circuit. For sequential circuits it is important that the clock distribution network has the same delay to the sending and to the receiving FFs. Only then is it assured that the signals in the combinational logic have one full clock period to propagate from the sending to the receiving FFs. Wang et al. [2007b] argue that the clock period is unaffected by aging, because the clock signals to the sending and receiving FFs are delayed equally. However, clock gating is not considered. If the sending and receiving FFs are in separate clock domains, both clock signals can degrade differently. This would have to be considered during the analysis of sequential circuits.

The gate model by Luo et al. [2007b] is based on the α-power law as well. It considers different temperatures in active and standby mode.
In standby mode the transistors degrade as well, but due to the lower temperature and the exponential dependence of the parameter drift on temperature, the parameter drift is much smaller. In Section 4.3.3 it is shown how different temperatures can be considered in the gate model introduced in this thesis.

Luo et al. [2007a] introduce a model that takes the stacking effect into account. The stacking effect describes the fact that not all transistors in a transistor stack have V_DD as their gate-source voltage.

All gate models so far have in common that they use just one value for ΔV_th, although, in general, ΔV_th differs between the transistors of a gate. Either ΔV_th is calculated for every transistor and the maximum is taken, or the ΔV_th of the transistor with an input transition is taken. Kumar et al. [2007a] show that the parameter drift of a NOR gate with two inputs depends on the signal probability at both inputs. However, this is only shown by way of example, and no formal algorithm is derived to calculate the parameter drift of arbitrary logic gates as a function of the signal probabilities at their inputs.

Stempkovsky et al. [2009] do not propose a self-contained aging-aware gate model, but an algorithm to compute the time each individual transistor of a gate is in stress condition. It considers the signal correlation at the gate inputs. The model also takes into account that the supply voltage, which must be applied to the source and drain contacts of a PMOS transistor for it to be stressed by NBTI, can come from the drain or the source contact (see Section 4.4.2).

Aging effects are stochastic processes. NBTI, for instance, is caused by breaking Si-H bonds, and this happens with a certain probability. This results in a distribution of the threshold voltage drift. Hence, two identical transistors that are stressed identically do not have the same threshold voltage drift. Kang et al.
[2007] model the V_th variation of PMOS transistors and investigate its impact on SRAM cells and combinational logic. Lu et al. [2009] propose a statistical reliability analysis that jointly considers the impact of process variation and aging effects.

Table 2.2 compares all aging-aware gate models discussed so far and the proposed aging-aware gate model, AgeGate.

The first optimization methods to minimize the impact of NBTI have been published. This can, for instance, be done by pin reordering and logic restructuring [Wu and Marculescu, 2009] or by controlling the signals at internal nodes when the circuit is idle [Bild et al., 2009].

Gate model | Description | NBTI | HCI | Individual transistor drifts | Aged output slope | Use profile independent model
[Chen et al., 2011] | aged LUT | yes | yes | yes | no | no
[Wu et al., 2000] | aged LUT | no | yes | yes (a) | yes | no
[Paul et al., 2006] | α-power law | yes | no | no | no | yes
[Kumar et al., 2006] | simulated sensitivities | yes | no | no | no | yes
[Wang et al., 2007b] | closed form expression for parameter drift | yes | no | no | no | yes
[Luo et al., 2007b] | different temperature in active and standby mode | yes | no | no | no | yes
[Luo et al., 2007a] | considers stacking effect | yes | no | no | no | yes
[Kumar et al., 2007a] | individual transistor drifts considered | yes | no | yes (b) | no | yes
[Lu et al., 2009] | jointly considers aging effects and process variation | yes | no | no | no | yes
AgeGate | based on canonical gate model | yes | yes | yes | yes | yes

(a) Neglects the impact of the workload at other inputs and of the internal gate structure on the parameter drift.
(b) Does not describe a formal way to calculate individual transistor drifts.

Table 2.2.: Comparison of state-of-the-art gate models with the proposed aging-aware gate model AgeGate.

3. Aging effects and their impact on standard cells

The objective of this thesis is to provide methods to analyze the degradation of complex digital circuits due to aging. Prior to that, however, the aging effects and their impact on the performance of single gates are investigated.
Aging effects can be classified into effects that cause a catastrophic failure of a device and effects that cause a drift of device parameters over time. For the analysis of the circuit degradation, the drift-related aging effects have to be taken into account. In addition, it is investigated how large the gate performance degradation due to an aging effect is and on which factors it depends¹. This helps to decide which dependencies have to be modeled by the aging-aware gate model that is developed in Chapter 4.

To determine the impact of aging effects on the gate performance, the procedure is as follows (see Figure 3.1): the parameter drifts caused by aging effects and the sensitivity of a gate performance with respect to a parameter drift are obtained. Combining both pieces of information yields the degradation of the gate performance.

Finally, it is examined how the degradation due to aging evolves across process technologies. The parameter drifts due to HCI do not show a consistent trend, but it is shown that circuits are becoming more and more sensitive to a parameter drift because of the reduced supply voltage.

3.1. Aging effects

Aging effects change device parameters over time. One can distinguish between aging effects that lead to an abrupt, catastrophic failure and effects that lead to a device parameter drift. Representatives of effects that lead to a catastrophic failure are TDDB and EM. TDDB can be split into two phases [Lee et al., 2006]. The first phase is called soft breakdown (SBD). Over time, traps are generated in the gate oxide, and these traps eventually form a conducting path through the oxide. Once a conducting path has been established, new traps are generated due to thermal damage. The new traps result in higher currents, the temperature in the oxide increases further, and even more traps are formed. This condition is called thermal runaway; it finally leads to a hard breakdown (HBD), and the transistor suddenly fails.
The phenomenon that electrons carry metal atoms along a wire is called electromigration. EM causes shorts or opens in signal wires and especially in supply wires [Strong et al., 2009].

¹ E.g., the dependence of the gate delay degradation on temperature and supply voltage.

Figure 3.1.: 36 mV V_th drift due to NBTI at 1.2 V V_DD (a). Sensitivity of the gate delay degradation to a threshold voltage drift (b). Hence, NBTI causes about 10 % degradation of the output delay for a rising input transition.
(a) |ΔV_th| [V] over supply voltage V_DD [V]; 90nm; 10y; 125°C; W=10µm; Lmin. (b) Δdelay (falling input) [%] over |ΔV_th| [V]; INV; 90nm; 27°C; 1.2V.

Aging effects that cause a catastrophic failure have to be treated stochastically by computing a failure rate or a mean time to failure for a circuit. Aging effects that cause a parameter drift, on the other hand, can be treated deterministically. They cause a degradation of the transistor characteristics, which, in turn, leads to a degradation of the gate performance. This is the reason why drift-related aging effects have to be considered in an aging-aware timing analysis. The two dominant effects that cause a parameter drift are negative bias temperature instability (NBTI) and hot carrier injection (HCI). Both effects are described in detail in the following subsections.

Unfortunately, the classification into drift-related aging effects and aging effects that cause a catastrophic failure is not as unambiguous as described so far. For the latter, a parameter drift can be observed as well before the catastrophic failure takes place. In the case of electromigration, the resistance of a wire first increases before an open is generated. For TDDB, conducting paths lead to a gradual increase of the gate current during the SBD phase before the transistor actually fails.
If the time interval in which a parameter drift can be observed is short, this effect does not need to be considered in an aging-aware TA, because the device is going to fail within a short period of time anyway. Lee et al. [2006] show, however, that the time between an SBD and an HBD is significant in advanced technologies. A gate model for the SBD phase of TDDB has already been proposed in [Choudhury et al., 2010]. The equivalent circuit used there to model the impact of SBD on a transistor could also be used to incorporate SBD into the aging-aware gate model proposed in Chapter 4. EM does not affect the gate itself, but the delay of signal lines and the voltage drop across supply lines. Hence, if EM becomes relevant, it must be considered in the wire load model for timing analysis.

[Schematic: gate, source, drain, gate oxide and channel, with the Si, O and H atoms at the oxide interface.]
Figure 3.2.: Cross section of a PMOS transistor.

3.1.1. Negative Bias Temperature Instability

NBTI is regarded as the most severe aging effect nowadays. It has been a research topic for more than 40 years [Miura and Matukura, 1966] and has gained increased interest in the last decade due to the problems it causes in modern semiconductor technologies [Entner, 2007].

NBTI only affects PMOS transistors. The stress mode for NBTI is a gate terminal that is negatively biased with respect to source and drain. Hence, the transistor is in inversion. The main impact of NBTI on a PMOS transistor can be modeled by an increase of the absolute value of the threshold voltage. A (normally-off) PMOS transistor has a negative threshold voltage. Due to NBTI, the threshold voltage becomes more negative. It could be misleading to say that NBTI decreases the threshold voltage, because a reduction of Vth (for NMOS transistors) implies a performance increase. The convention in this thesis is therefore to say that NBTI increases (the absolute value of) the threshold voltage |Vth|.
As the name negative bias temperature instability implies, NBTI is accelerated by an increased temperature and an increased supply voltage.

Physical mechanism of NBTI

There is still no consensus on the physical mechanism of NBTI. One quite popular theory is the reaction-diffusion (RD) model. According to Alam et al. [2007], NBTI originates from broken Si-H bonds at the interface between the substrate and the gate oxide. Figure 3.2 shows a cross section of a transistor. The substrate consists of crystalline silicon (Si). To isolate the gate from the substrate, a layer of silicon dioxide (SiO2) is grown on the substrate. The gate itself consists of polycrystalline silicon. After the SiO2 layer has been processed, dangling bonds remain at the Si/SiO2 interface. A dangling bond is a Si atom with an unsatisfied valence. Dangling bonds are also called interface states. These states can capture charges and have a significant negative impact on the transistor performance. During the manufacturing of a chip, interface states are passivated by hydrogen atoms (H). These Si-H bonds can break up again during the NBTI stress mode. The generated interface states are responsible for the degradation of the transistor parameters.

There are contradictory opinions about what happens to the released H atoms. It is still under discussion whether there is a diffusion of neutral H atoms, a diffusion of H2 molecules, or a drift of H+ ions in the direction of the gate. Alam et al. [2007] argue that H atoms react to H2 and that H2 then diffuses.

The generation of the interface states and the diffusion of the hydrogen can be modeled by an RD system. In an RD system, two processes are involved: a local reaction and a diffusion (or drift) of the reaction products.
The rate of interface state generation due to NBTI is given by the following equation:

$$\frac{dN_{it}}{dt} = \underbrace{k_F (N_0 - N_{it})}_{\text{generation}} - \underbrace{k_R N_H(0) N_{it}}_{\text{annealing}} \tag{3.1}$$

N0 is the initial number of Si-H bonds, Nit is the number of interface states, and kF is the rate constant of broken bond creation (dissociation rate constant). NH(0) is the number of hydrogen atoms at the Si/SiO2 interface. The process of Si-H bond breaking can also be reversed. This is described by the second term. kR is the rate constant of the reverse reaction, in which a dangling bond and an H atom anneal to a Si-H bond. This annealing or recovery effect is a special property of NBTI. It means that the number of interface states decreases again when the stress is removed.

The creation of interface states is limited by the diffusion (or drift) of hydrogen. This is modeled by a second rate equation:

$$\frac{dN_{it}}{dt} = -D_H \frac{dN_H}{dx} + N_H \cdot \mu_H \cdot E_{ox} \tag{3.2}$$

DH is the diffusion coefficient, µH is the mobility, and Eox is the electrical field across the oxide. The second term can be neglected for neutral atoms or molecules. kF, kR and DH are temperature dependent. kF depends on the electrical field as well. This means that an electrical field is required for the generation of interface states, but not for the annealing and the diffusion.

Equations 3.1 and 3.2 form a system of partial differential equations. This system can either be solved numerically, or a closed-form equation can be derived if some justified assumptions are made:

$$N_{it}(t) = \sqrt{\frac{k_F N_0}{2 k_R}} \, (D_H t)^{1/4} \tag{3.3}$$

The assumptions are that the net rate of interface state generation is small and that Nit is much smaller than N0. The time exponent is 1/4 for H diffusion and 1/6 for H2 diffusion.

The dependence of Vth on Nit is given by [Schroder and Babcock, 2003]:

$$V_{th} \propto -\frac{q N_{it}(\Phi_S)}{C_{ox}} \tag{3.4}$$

q is the elementary charge, Cox is the oxide capacitance, and ΦS is the surface potential. By increasing Nit, the absolute value of Vth is increased. Other device parameters also change as a consequence of the Vth drift:
$$I_d \propto (V_{gs} - V_{th})^2 \tag{3.5}$$

$$g_m \propto (V_{gs} - V_{th}) \tag{3.6}$$

The drain current Id is important for the performance of digital circuits, and the transconductance gm is relevant for analog circuits. Figure 3.3 shows the output characteristic of a PMOS transistor for different values of ∆Vth.

[Plot: Id [A] over Vds [V] for |∆Vth| = 0 mV, 33 mV, 66 mV and 100 mV; the arrow marks the direction of degradation.]
Figure 3.3.: Output characteristic of a PMOS transistor for altered values of ∆Vth.

Unfortunately, the reaction-diffusion theory is not able to explain all properties of NBTI. The RD theory cannot model the temporal behavior of the recovery effect, the bias dependence of the recovery effect, and the dependency of the parameter drift on the duty cycle of the signal at the gate terminal [Grasser et al., 2009]. One attempt to explain these properties extends the RD model by a second component [Islam et al., 2007]: Besides the creation of interface states, hole trapping might be responsible for the threshold voltage drift as well. The holes are trapped by already existing traps in the oxide. Another explanation is a two-stage model based on E' centers [Grasser et al., 2009]. E' centers are a well-known defect in SiO2. In the first stage, the E' centers are charged and discharged. This explains the recovery effect. In the second stage, a dangling bond can be created at the Si/SiO2 interface by a positively charged E' center.

Modeling of NBTI

To compute the threshold voltage drift due to NBTI, degradation equations from an industry partner are used:

$$\Delta V_{th} = A \cdot \exp\left(\frac{E_a}{k_B \cdot T}\right) \cdot V_{gs}^{\,b} \cdot t_{stress}^{\,n} \cdot \left(1 + \frac{C}{W}\right) \tag{3.7}$$

The drift depends on the temperature T, the gate-source voltage Vgs, the time tstress the transistor is in NBTI stress mode, and the transistor width W. A, Ea, kB, b, n and C are constants.

[Plot: ∆Vth [mV] over lifetime [y], logarithmic axes; 90 nm, Vnom, 125 °C, W = 10 µm, Lmin.]
Figure 3.4.: Time dependence of Vth drift due to NBTI.

[Plot: |∆Vth| [V] over T [°C] for 1.08 V, 1.2 V and 1.32 V; 90 nm, 10 y, SP = 0 %, W = 10 µm, Lmin.]
Figure 3.5.: Temperature dependence of ∆Vth for altered values of Vgs.

The time dependence (exponent n) is shown in Figure 3.4. Values for n reported in the literature are between 0.15 and 0.30 [Massey, 2004]. This range is compatible with H as well as with H2 diffusion. ∆Vth increases monotonically with time (if recovery is not taken into account). For an aging-aware timing analysis, this means that it is sufficient to verify that a circuit is fast enough at the end of the specified lifetime. Due to the power law, the drift increases very fast at the beginning and flattens with time. Suppose n is 0.25. If a certain threshold voltage drift is reached after a time t1, it takes 16 · t1 (since 2^(1/0.25) = 16) to reach a threshold voltage drift twice as high.

The temperature dependence (see Figure 3.5) is modeled by the Arrhenius equation. The reported values for the activation energy Ea vary between 0.1 and 0.36 eV [Massey, 2004]. The voltage dependence is given by a power law. The higher the gate-source voltage, the higher the electrical field across the gate oxide and the resulting drift. For the drift, the temperature and voltage over the lifetime are important. From now on, they are referred to as effective temperature (Teff) and effective supply voltage (Veff), to distinguish them from the current temperature Tcurr and voltage Vcurr at the moment the circuit is analyzed. The current values of temperature and voltage define the sensitivities, as shown later in Section 3.2.1.

[Plot: ∆Vth [V] over width [µm] for 120 nm, 90 nm, 65 nm LP and 65 nm HP; Vnom, 125 °C, 10 y, SP = 0 % (wc), Lmin.]
Figure 3.6.: Transistor width dependence.
The minimal transistor width used in the standard cell libraries is marked.

Only a vertical electrical field and no lateral field exists during the homogeneous stress mode of NBTI. The creation of interface states is therefore uniformly distributed over the whole gate area, and a dependence on transistor sizes should not be observable. A dependence on transistor length for very short transistors is reported in the literature [Massey, 2004], but is not modeled in the degradation equations. A transistor width dependence for small transistors, however, is modeled by the degradation equations. Edge effects of some kind are assumed to be responsible for the dependence on transistor sizes. Figure 3.6 shows the transistor width dependence for different technologies. The minimal transistor widths used in the standard cell libraries are marked. One can see that for some technologies (65 nm LP) the transistor width actually affects the drift, whereas for other technologies (120 nm, 90 nm) the minimal transistor width used in the standard cell library is too large to have a significant effect on the drift.

NBTI strongly depends on the process technology as well. Process properties that have an impact on the NBTI drift are, for instance, the concentration of hydrogen, deuterium and nitrogen in the oxide, the gate material, and the initial quality of the Si/SiO2 interface [Schroder and Babcock, 2003].

NBTI is a statistical process [Schlünder et al., 2011]. A Si-H bond is broken with a certain probability. Hence, the threshold voltage drift for defined stress parameters follows a probability distribution. The degradation equations, however, only provide the mean value of the drift. Rauch III [2002] shows that the standard deviation of the threshold voltage drift depends on the transistor area:

$$\sigma(\Delta V_{th}) \propto \frac{1}{\sqrt{W \cdot L}} \tag{3.8}$$

It is also shown that ∆Vth due to aging and ∆Vth due to process variation are uncorrelated [Fischer et al., 2008].
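The scaling behavior described in this subsection can be checked with a few lines of Python. The sketch below is only an illustration and not part of the degradation equations of the industry partner: it assumes a pure power law ∆Vth ∝ tstress^n (Equation 3.7 with all other factors held constant) and the area scaling of Equation 3.8. The function names and the example transistor sizes are hypothetical.

```python
import math


def time_factor_for_drift_ratio(drift_ratio: float, n: float) -> float:
    """Factor by which the stress time must grow so that a drift that
    follows drift ∝ t**n grows by `drift_ratio` (assumption: pure power law)."""
    if drift_ratio <= 0 or n <= 0:
        raise ValueError("drift_ratio and n must be positive")
    return drift_ratio ** (1.0 / n)


def sigma_ratio(w_um: float, l_um: float, w_ref_um: float, l_ref_um: float) -> float:
    """Standard deviation of the drift of a W x L transistor relative to a
    reference transistor, using sigma ∝ 1/sqrt(W*L) as in Equation 3.8."""
    if min(w_um, l_um, w_ref_um, l_ref_um) <= 0:
        raise ValueError("all dimensions must be positive")
    return math.sqrt((w_ref_um * l_ref_um) / (w_um * l_um))


if __name__ == "__main__":
    # Time needed to double the drift for two time exponents
    # (1/6 and 1/4 are the exponents named for H2 and H diffusion).
    for n in (1.0 / 6.0, 0.25):
        print(f"n = {n:.3f}: doubling the drift takes "
              f"{time_factor_for_drift_ratio(2.0, n):.0f} x the stress time")
    # Hypothetical sizes: a transistor with a quarter of the reference area.
    print("sigma ratio:", sigma_ratio(0.25, 0.1, 1.0, 0.1))
```

With n = 0.25 this reproduces the factor of 16 stated above. For n = 1/6, doubling the drift would take 64 times the stress time, which illustrates how strongly the extrapolation to the end of the lifetime depends on the extracted exponent.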
[Plot: ∆Vth over time.]
Figure 3.7.: Drift over time for an AC stress.

NBTI (see footnote 2) is the only aging effect that shows a recovery effect. In the RD model, recovery can be explained by the second term in Equation 3.1. This term describes the reverse reaction, i.e., the annealing of dangling bonds to Si-H bonds. There is no consensus about whether the complete drift recovers or a permanent part remains [Massey, 2004]. What has been understood is that the recovery of a certain amount of drift takes substantially longer than the time needed to generate this drift. In [Grasser et al., 2009], a ratio of recovery time to degradation time of 2.5/1 on a logarithmic timescale is reported. This means, for instance, that when a threshold voltage drift is generated with 25 mV/decade, the recovery has a slope of 10 mV/decade.

The recovery effect makes it more difficult to characterize NBTI and complicates the analysis of a circuit as well. To extract the constants for the degradation equation, single transistors are stressed under defined conditions and the resulting drifts are measured. Before the drift can be measured, the stress has to be removed. Reisinger et al. [2007] argue that a conventional measurement setup takes up to 1 s to obtain the threshold voltage drift. Hence, the transistor has 1 s to recover before the drift is measured. The on-the-fly measurement proposed by Reisinger et al. takes just 1 µs, and it is shown that the drift recovers 50 % of its value in the interval between 1 µs and 1 s. How much of the drift is recovered before 1 µs is unknown. 1 µs already seems sufficiently fast, but in a circuit that is operated at 1 GHz the recovery time might be just 1 ns. Hence, the error between the real drift value and the measured, already recovered value might be larger than 50 %.

The degradation due to NBTI is frequency independent, but it strongly depends on the duty cycle of the signal at the transistor gate. NBTI is a static aging effect.
The drift is determined by the portion of the lifetime during which the gate voltage is negative with respect to source and drain, and not by the number of signal transitions (frequency). Although the degradation is frequency independent, a substantial difference between a DC and an AC stress is observed [Massey, 2004]. This is due to the recovery effect. For a DC stress, the drift cannot recover and increases monotonically. For an AC stress, the drift can recover in between the stress phases. This results in a sawtooth curve for the drift over time, as depicted in Figure 3.7. Because the drift builds up faster than it recovers, the mean of the drift increases monotonically.

Figure 3.8(a) shows the dependence of the drift on the stress duty cycle as modeled by the degradation equations. For a stress duty cycle of 100 %, the transistor is constantly stressed (DC stress) and the drift is maximal. For a stress duty cycle of 0 %, the transistor is never in stress mode and no drift is observable.

Footnote 2: apart from its counterpart, positive bias temperature instability (PBTI).

[Plots: (a) |∆Vth| [V] and ∆Vth [%] over stress duty cycle [%]; 90 nm, Vnom, 125 °C, 10 y, W = 10 µm, Lmin. (b) measured curve from [Baumann et al., 2010].]
Figure 3.8.: Duty cycle dependence of NBTI.

Unfortunately, the degradation equations used in this thesis do not take the recovery effect into account. Figure 3.8(b) shows a measured curve of the stress duty cycle dependence with recovery for a 40 nm technology [Baumann et al., 2010]. This curve has an S-shape, and the drift values for AC stress (stress duty cycle < 100 %) are far below the drift for DC stress.

Not being able to consider recovery affects the accuracy of the proposed aging analysis results (see footnote 3). However, because the recovery effect has an impact on the characterization as well, it is not certain whether the results are too pessimistic or too optimistic.
On the one hand, recovery is not taken into account for the dependency on the stress duty cycle. If it is assumed, for instance, that a transistor experiences a stress duty cycle of 50 %, the degradation equations that are used provide a drift of about 80 % of the maximal drift. If recovery is considered, the drift is only about 40 % of the maximal drift (16 mV/42 mV from Figure 3.8(b)). Hence, the analysis overestimates the drift by a factor of about two, i.e., the actual drift is about 50 % smaller than the analysis result. On the other hand, it must be considered that recovery makes the measurement of the drift more difficult. The drift values used to extract the parameters of the degradation equations were not determined with the on-the-fly measurement setup from [Reisinger et al., 2007]. This results in an error of at least 50 % as well. In this case, both errors would cancel each other out: The measurement underestimates the actual drift by 50 %, because the drift has already recovered from its initial value by the time the measurement starts, and the analysis overestimates the drift by a factor of two, because recovery is not taken into account for the stress duty cycle dependence.

Footnote 3: if the workload is taken into account.

Positive bias temperature instability

NBTI occurs only in PMOS transistors. A similar aging effect for NMOS transistors is called PBTI. The stress condition for PBTI is that the NMOS transistor is in inversion. Hence, the gate terminal is positively biased with respect to source and drain. Before high-k metal gates were introduced, PBTI could be neglected. Since then, degradation due to PBTI has been reported to be of the same order of magnitude as that due to NBTI [Tschanz et al., 2009]. The developed aging analysis is based on a 90 nm technology with SiO2 as gate dielectric. Hence, PBTI can be neglected. Nevertheless, there are no fundamental obstacles to considering PBTI in the proposed aging analysis methodology as well.

3.1.2. Hot Carrier Injection

Hot carrier injection (HCI) affects both NMOS and PMOS transistors. Carriers are accelerated until they have enough energy to overcome the potential barrier of the Si/SiO2 interface and leave the channel. A small number of those hot carriers damage the gate oxide and the interface, or get trapped in the oxide and form space charges. Both mechanisms lead to a degradation of the transistor characteristics. The rest of the carriers contribute to the gate current.

Hot carriers are holes or electrons that have gained a high kinetic energy from an electrical field. Secondary effects (e.g., electron-electron scattering) can increase their energy further [Strong et al., 2009]. They are called "hot" because their energy is substantially higher than their energy in thermal equilibrium. The carriers are accelerated by the drain-source voltage Vds across the inverted channel. In the drain region, the carriers have collected enough energy to overcome the potential barrier of the Si/SiO2 interface. Hence, HCI is an asymmetric aging effect that damages the drain region of a transistor.

Physical mechanism of HCI

Four different mechanisms for hot carrier generation and injection can be distinguished [Renesas, 2008]:

• Drain avalanche hot carrier (DAHC)
• Channel hot carrier (CHC)
• Secondary generated hot carrier (SGHC)
• Substrate hot carrier (SHC)

DAHC and CHC are the two major mechanisms and are discussed further below.

[Schematics: transistor cross sections with the terminals Vg, Vs, Vd (gate, source, drain) and the currents Ig and Id.]
Figure 3.9.: Drain avalanche hot carrier.
Figure 3.10.: Channel hot carrier.

Drain avalanche hot carrier

High-energy carriers collide with Si atoms and generate electron-hole pairs by impact ionization (see Figure 3.9). The generated carriers are themselves accelerated and can again cause impact ionization (avalanche multiplication). Some generated carriers are injected into the oxide or damage the interface. DAHC is maximal for Vds = 2 · Vgs.
Channel hot carrier

In this case, impact ionization is not the reason for carrier injection. For CHC (see Figure 3.10), the hot carriers themselves are injected into the oxide. They are accelerated in the direction of the gate by a high gate voltage. Some "lucky electrons" are able to overcome the potential barrier at the Si/SiO2 interface and enter the oxide. CHC is maximal for Vds = Vgs.

Modeling of HCI

HCI damage can be modeled by an increase of the absolute value of the threshold voltage Vth and a decrease of the mobility µ0 [Strong et al., 2009]. The degradation equations used in this thesis provide the relative reduction of the drain saturation current Ion:

$$\Delta I_{on} = \frac{I_{on,fresh} - I_{on,aged}}{I_{on,fresh}} = A \cdot e^{\frac{E_a}{k_B \cdot T}} \cdot V_{ds}^{\,b} \cdot t_{stress}^{\,n} \cdot L^{-m} \tag{3.9}$$

∆Ion depends on the effective temperature Teff, the effective drain-source voltage (Vds), the stress time (tstress), and the transistor length (L). Figure 3.11 shows the supply voltage, temperature, and lifetime dependence of HCI.

[Plots: ∆Ion [%] over supply voltage [V], over T [°C] and over lifetime [y] for PMOS and NMOS; 90 nm, 10 y, 1.32 V, 25 °C, DF = 100, W = 10 µm, Lmin.]
Figure 3.11.: Voltage, temperature and lifetime dependence of HCI.

The dependence on supply voltage and lifetime follows a power law. To determine the time the transistor is stressed, a duty factor DF is given. The stress time tstress is tlife/DF. A DF of 100 means that the transistor is stressed for 1/100 of its lifetime. Values for n reported in the literature are 0.25 for PMOS and 0.5 for NMOS transistors. Furthermore, a negative temperature dependence is reported. Hence, HCI is the only effect that gets worse when the temperature is decreased. This is explained by an increase of the mean free path of the hot carriers.
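The relation between lifetime, duty factor and stress time can be made concrete with a short sketch. It only uses tstress = tlife/DF and the power-law time dependence of Equation 3.9 with all other factors held constant. The constants A, Ea, b and m are deliberately left out because their values are not given here, so only ratios are computed. The function names and the example duty factors are hypothetical.

```python
def hci_stress_time(t_life_years: float, duty_factor: float) -> float:
    """Stress time t_stress = t_life / DF (DF = 100 means 1/100 of the lifetime)."""
    if duty_factor <= 0:
        raise ValueError("duty_factor must be positive")
    return t_life_years / duty_factor


def relative_degradation(t_stress: float, t_stress_ref: float, n: float) -> float:
    """Ratio of the Ion degradation at t_stress to that at t_stress_ref,
    assuming degradation ∝ t_stress**n and identical T, Vds and L."""
    if t_stress <= 0 or t_stress_ref <= 0:
        raise ValueError("stress times must be positive")
    return (t_stress / t_stress_ref) ** n


if __name__ == "__main__":
    t_ref = hci_stress_time(10.0, 100.0)   # 10 years lifetime, DF = 100
    t_busy = hci_stress_time(10.0, 25.0)   # same lifetime, four times the stress
    # Time exponents reported in the literature: 0.5 (NMOS), 0.25 (PMOS).
    for name, n in (("NMOS", 0.5), ("PMOS", 0.25)):
        print(name, round(relative_degradation(t_busy, t_ref, n), 3))
```

Under these assumptions, a four times longer stress time doubles the Ion degradation of an NMOS transistor (n = 0.5) but increases that of a PMOS transistor (n = 0.25) only by a factor of about 1.41.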
In the degradation equations used in this thesis, however, there is almost no temperature dependence for NMOS transistors, and for PMOS transistors the temperature dependence is positive.

∆Ion is, unlike ∆Vth, not a parameter of the transistor model. Hence, ∆Ion cannot be used directly to simulate a degraded transistor. However, there is an equivalent circuit for a transistor degraded by HCI (see Figure 3.12(a)). The equivalent circuit is used to simulate an aged transistor on circuit level. It maps ∆Ion to a threshold voltage drift ∆Vth and a mobility degradation ∆µ0. ∆Vth is realized by a voltage source VDeg, and a current-controlled current source IDeg is responsible for the mobility degradation. The values of VDeg and IDeg depend on ∆Ion.

3.1.3. Stress conditions in CMOS logic gates

Static CMOS logic is the primary design style used in digital integrated circuits, since it has a low static power consumption and is quite immune to noise [Uyemura, 2001]. Every CMOS logic gate consists of a pull-up and a pull-down network. These complementary networks represent two switches, with exactly one switch being closed for every input combination. The pull-up network, composed of PMOS transistors, is connected to the supply voltage VDD (logic "1"), and the pull-down network, composed of NMOS transistors, is connected to ground (logic "0"). The gate delay is determined by the time the pull-up or pull-down network takes to recharge the output capacitance.

Single-stage logic gates have no internal nets connected to gate terminals of transistors. Such single-stage gates can only represent inverting logic functions. For more complex, non-inverting logic functions, multi-stage gates have to be used, which are built by connecting single-stage gates in series.

[Schematic and plot: (a) transistor with terminals G, D and S, voltage source VDeg = f(∆Ion) and current source IDeg = f(id, ∆Ion). (b) Id [A] over Vds [V] for ∆Ion = 0 %, 5 %, 10 % and 20 %.]
Figure 3.12.: (a) HCI equivalent circuit for a degraded transistor.
VDeg and IDeg depend on ∆Ion. (b) Output characteristic of an NMOS transistor for altered values of ∆Ion.

[Schematic and waveform: inverter with input A and output Z; the waveforms of A and Z are divided into regions 1 to 4.]
Figure 3.13.: Inverter gate and waveform.

The simplest logic gate is the inverter. Its pull-up and pull-down networks each consist of just one transistor (see Figure 3.13). For NBTI, the gate terminal of the PMOS transistor has to be negatively biased with respect to source and drain. This is the case when a logic "0" is applied to the gate input. Then the gate-source voltage Vgs is −VDD, the transistor is in inversion, and the channel is conducting. Hence, the drain of the transistor is charged to VDD as well (Vds = 0 V). Whenever a logic "0" is applied, the PMOS transistor degrades due to NBTI. NBTI is frequency independent, and Kumar et al. [2006] have shown that any arbitrary signal can be converted into a periodic signal that causes the same NBTI drift as the original signal. Hence, it is sufficient to know the portion of the lifetime a signal is at logic "0". This can be expressed by 1 − SP. The static signal probability (SP) is a statistical signal property that is defined as the average fraction of time a signal is at logic "1".

[Schematic: NOR gate with the PMOS transistors MPB (input B) and MPA (input A) connected in series, the NMOS transistors MNB and MNA, and the output Z.]
Figure 3.14.: NOR gate with two inputs.

For more complex pull-up networks, it is more difficult to determine the time a transistor is stressed due to NBTI. The NOR gate in Figure 3.14 has two PMOS transistors connected in series, called a stack. For transistor MPB, the condition is the same as for the single transistor of an inverter (logic "0" at input B). For transistor MPA, a logic "0" at the gate terminal is again required, but that is not enough. To have the source of this transistor connected to VDD, a logic "0" has to be applied to input B as well [Kumar et al., 2007b]. Hence, whenever MPA is stressed due to NBTI, MPB is stressed as well, and the Vth drift of MPB is always equal to or larger than the Vth drift of MPA.
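The stress conditions for the inverter and for the NOR2 stack can be written down directly in terms of signal probabilities. The following sketch assumes that the inputs A and B are statistically independent, which is a simplification made here for illustration only; the formal method of Section 4.3.3 is not reproduced. The function names are hypothetical.

```python
def nbti_stress_inverter(sp_a: float) -> float:
    """Fraction of the lifetime the inverter PMOS is in NBTI stress.
    It is stressed whenever the input is logic 0, i.e. 1 - SP."""
    _check_probability(sp_a)
    return 1.0 - sp_a


def nbti_stress_nor2(sp_a: float, sp_b: float) -> dict:
    """NBTI stress fractions of the two stacked PMOS transistors of a NOR2.

    MP_B is connected to VDD and is stressed whenever B = 0.
    MP_A is only stressed if A = 0 and B = 0, because its source is
    connected to VDD only through the conducting MP_B.
    Assumption of this sketch: A and B are independent signals.
    """
    _check_probability(sp_a)
    _check_probability(sp_b)
    stress_b = 1.0 - sp_b
    stress_a = (1.0 - sp_a) * (1.0 - sp_b)
    return {"MP_A": stress_a, "MP_B": stress_b}


def _check_probability(p: float) -> None:
    if not 0.0 <= p <= 1.0:
        raise ValueError("signal probability must be in [0, 1]")


if __name__ == "__main__":
    print(nbti_stress_inverter(0.25))
    print(nbti_stress_nor2(0.5, 0.5))
```

For any pair of signal probabilities, the stress fraction of MP_B is at least as large as that of MP_A, which matches the observation that the drift of MPB is always equal to or larger than the drift of MPA.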
A formal method to calculate the portion of the lifetime a transistor is stressed, depending on the signal probabilities at the inputs and the internal gate structure, is derived in Section 4.3.3.

NBTI only affects PMOS transistors; hence, only the pull-up network is degraded. This directly increases the gate delay only for a falling input transition. The gate delay for a rising input transition degrades only indirectly: NBTI degrades the output slope as well, and the output slope serves as the input slope for succeeding gates. If the input slope of a gate degrades, its gate delay increases as well. Because of this, the gate delay for a rising input signal can also increase.

For HCI, a strong lateral electrical field is needed that accelerates the carriers in the channel. This is the case for the NMOS transistor of the inverter (see Figure 3.13) when a rising transition is applied to the inverter input. While the signal at the input is still logic "0", the NMOS transistor is in its non-conducting state and the PMOS transistor is in its conducting state. The output node, and thus the drain of the NMOS transistor, is at VDD, so the voltage drop and the electric field across the NMOS transistor are maximal. As soon as the NMOS transistor begins to conduct, hot electrons are generated, which damage the transistor. Vgs of the NMOS transistor is equal to the input voltage, and Vds is equal to the output voltage of the inverter. The conditions for which the hot carrier generation is maximal are a high Vds together with Vgs = Vds for CHC and Vgs = 1/2 · Vds for DAHC. However, at least the conditions for CHC are never met in an inverter (or any other logic gate), because Vds has already started to decrease when Vgs reaches its maximum (see the waveforms in Figure 3.13).

To account for the fact that the HCI stress in a logic gate is different from the DC stress of a single transistor, an empirical correction factor for the degradation equation is given. This correction factor is multiplied by the time the transistor is in stress.
It reduces the stress time depending on the signal slopes of Vds and Vgs. The considerations above are valid for a PMOS transistor as well. A PMOS transistor degrades due to HCI for a falling input slope.

Consider again the PMOS transistor stack in Figure 3.14. For transistor MPA to degrade due to HCI, a falling transition at input A is required but not sufficient. A strong lateral electrical field only exists if transistor MPB is in its conducting state as well. This results in a conducting path from VDD to the output capacitance. Current flows through transistor MPA until the output capacitance is recharged, and the number of hot carriers is proportional to the drain current. The same considerations apply to transistor MPB in the stack. A formal method to calculate the portion of the lifetime a transistor is stressed due to HCI, depending on the signal probabilities and transition densities at the gate inputs and the internal gate structure, is derived in Section 4.3.3.

Now the waveform in Figure 3.13 can be divided into regions in which a transistor is stressed due to NBTI or HCI. For a rising input slope (region 1), the NMOS transistor is degraded due to HCI. When the input signal is logic "1" (region 2), the NMOS transistor would be stressed due to PBTI (PBTI is not considered yet). For a falling input slope (region 3), the PMOS transistor is degraded due to HCI, and when the input signal is logic "0" (region 4), the PMOS transistor is in NBTI stress. NBTI increases the gate delay and output slope for a falling input transition; HCI increases delay and output slope for both transitions.

3.2. Impact on gate performance

So far, it was discussed which transistor parameters degrade and on which factors the parameter drift depends. Furthermore, the conditions that have to be fulfilled for a transistor of a logic gate to degrade were derived.
Now, the impact of such a parameter drift on the performance of logic gates and sequential cells is investigated.

3.2.1. Impact on combinational gates

The impact of aging on combinational gates is obtained via SPICE simulation by performing a parameter sweep over the parameter that drifts. This provides the sensitivity of a gate to a parameter drift. To determine the impact of other factors (e.g., temperature or supply voltage) on the sensitivity, these factors are varied while the sensitivities are determined. Unless stated otherwise, an inverter with a small driving strength from an industrial 90 nm cell library is chosen. The simulation conditions are: nominal supply voltage (1.2 V), 27 °C, and nominal process corner.

To compare different gate types and technologies, a fan-out-3 test structure (see Figure 3.15) is chosen. The input slope and output load of the device under test (DUT) are defined by the test structure. Such test structures are used, for instance, to evaluate and compare the performance of different standard cell libraries.

Figure 3.15.: Fan-out-3 structure: All gates in the test structure are identical to the DUT. The voltage source generates a step function. To have a realistic input signal at the DUT, the step function has to propagate through two gates before reaching the DUT. Those two gates and the DUT have to drive three gates.

Negative Bias Temperature Instability

A parameter sweep over the local threshold voltage is performed for NBTI. Only the performance degradation for a falling input slope is investigated.

Figure 3.16 shows a strong dependence of the gate delay sensitivity on the supply voltage. The sensitivity is given by the slope of the curve in Figure 3.16(a). The degradation of the gate delay ∆delay is the change of the gate delay normalized to the gate delay without a parameter drift. Figure 3.16(b) depicts the degradation of the gate delay over the supply voltage for a ∆Vth of 100 mV. The lower the supply voltage, the larger the sensitivity.

[Plots: (a) ∆delay (falling input) [%] over |∆Vth| [V] for 0.7 V, 0.9 V, 1.2 V and 1.5 V. (b) ∆delay/∆Vth (falling input) [%/100 mV] over VDD [V]. INV, 90 nm, 27 °C.]
Figure 3.16.: Supply voltage dependence.

The impact of temperature is much smaller (see Figure 3.17). Again, the sensitivity is increased by a lower temperature.

As described in Section 3.1.1, it is important to distinguish between the current and the effective temperature and supply voltage. The current value defines the sensitivity, and the effective value, which is the value over the lifetime, defines the parameter drift. The worst case is a high effective and a low current temperature and supply voltage. This is, for instance, the case for a circuit with a high performance mode and a low power mode. If the circuit is operated for a long time in the high performance mode, the transistors experience a large parameter drift. If the circuit is then switched into the low power mode, the circuit becomes very sensitive. Hence, a large degradation of the circuit performance can be observed.

[Plots: (a) ∆delay (falling input) [%] over |∆Vth| [V] for −40 °C, 27 °C, 85 °C and 125 °C. (b) ∆delay/∆Vth (falling input) [%/100 mV] over T [°C]. INV, 90 nm, 1.2 V.]
Figure 3.17.: Temperature dependence.

[Plots: ∆delay (falling input) [%] over |∆Vth| [V]. (a) INV, 90 nm, 1.2 V, 27 °C, curves A, B, C and D. (b) INV, NAND2, NOR2 and NOR3; 90 nm, 27 °C, 1.2 V, all PMOS transistors with identical ∆Vth.]
Figure 3.18.: Dependence on driving strength and gate type.

Figure 3.18(a) shows that the driving strength of a cell has almost no effect on the sensitivity.
The gate type, on the other hand, has an impact on the sensitivity (see Figure 3.18(b)). For the NAND and NOR gates, it is assumed that all PMOS transistors have the same threshold voltage drift. The NOR gates degrade much more strongly than the NAND gate and the inverter for the same ∆Vth. This is caused by the stacked PMOS transistors in a NOR gate. For a falling input signal, the output load is charged through two (NOR2) or three (NOR3) degraded PMOS transistors in series.

In Figures 3.19(a) and 3.19(b), the process corner (fast, nominal, slow) and the transistor type (low Vth, regular Vth, or high Vth) are altered, respectively. Both have only a minor impact on the sensitivity.

Figure 3.19.: Dependence on transistor type and process corner (INV; 90 nm; 27 °C; 1.2 V). (a) ∆delay (falling input) [%] over |∆Vth| [V] for the slow, nominal, and fast corner. (b) ∆delay (falling input) [%] over |∆Vth| [V] for regular-Vth, high-Vth, and low-Vth transistors.

To determine the impact of input slope and output load, a single gate is simulated and the input slope and output load are altered. Figure 3.20(a) gives the sensitivity for four different slope/load combinations (slow/fast input slope and small/large output load). The sensitivity stays almost constant except for the case with a slow input slope and a small output load, which shows a much higher sensitivity. Figure 3.20(b) depicts the degradation of the gate delay for a ∆Vth of 100 mV over the range of characterized input slope and output load pairs.

Figure 3.20.: Dependence on input slope and output load.

Besides the gate delay, the impact on the output slope is investigated as well. Figures 3.21(a) and 3.21(b) show the impact of supply voltage and temperature, respectively. The degradation of the output slope and the degradation of the gate delay (in terms of percentage) are about the same.
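The sensitivity extraction described above can be sketched in a few lines. This is only an illustration: the sweep values below are placeholders chosen to give round numbers, not simulation results from this work, and the function names are hypothetical. The sensitivity is taken as the least-squares slope through the origin of the relative delay degradation over |∆Vth|.

```python
def relative_degradation(delays):
    """Delay change in percent, normalized to the first (fresh) sample."""
    fresh = delays[0]
    return [100.0 * (d - fresh) / fresh for d in delays]


def sensitivity(dvth, degradation):
    """Least-squares slope through the origin, in percent per volt."""
    numerator = sum(x * y for x, y in zip(dvth, degradation))
    denominator = sum(x * x for x in dvth)
    return numerator / denominator


# Hypothetical sweep: |dVth| in volts and the simulated gate delay in ps.
# These numbers are placeholders, not data from the thesis.
dvth = [0.0, 0.02, 0.04, 0.06, 0.08, 0.10]
delays_ps = [20.0, 21.0, 22.0, 23.0, 24.0, 25.0]

degradation = relative_degradation(delays_ps)
sens_per_100mv = sensitivity(dvth, degradation) * 0.1  # percent per 100 mV
```

Repeating such a fit for each supply voltage or temperature would give one point of a curve like the ones in panel (b) of the figures above.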
Figure 3.21.: Dependence of output slope degradation on supply voltage and temperature. (a) INV; 90 nm; 27 °C: ∆slope_out (falling input) [%] over |∆Vth| [V] for VDD = 0.7 V, 0.9 V, 1.2 V, and 1.5 V. (b) INV; 90 nm; 1.2 V: ∆slope_out (falling input) [%] over |∆Vth| [V] for −40 °C, 27 °C, 85 °C, and 125 °C.

Hot Carrier Injection

To determine the impact of HCI on the sensitivities, the transistors of the logic gates are replaced by the HCI equivalent circuit (see Figure 3.12(a)) and a parameter sweep is performed. This time, ∆Ion is varied, which changes the values of the voltage source and the current-controlled current source of the equivalent circuit. Figure 3.22 shows the dependence of the sensitivity on supply voltage and temperature. It is similar to the dependence with respect to NBTI. For this reason, no further dependencies are given for HCI. By comparing the degraded transistor characteristics for NBTI and HCI (see Figures 3.3 and 3.12(b)), one can see that both aging effects have a similar impact on the transistor characteristics. This explains their similar impact on the sensitivities.

Figure 3.22.: Supply voltage and temperature dependence for HCI. (a) INV; 90 nm; 27 °C: ∆delay (falling input) [%] over ∆Ion [%] for VDD = 0.9 V, 1.2 V, and 1.5 V. (b) INV; 90 nm; 1.2 V: ∆delay (falling input) [%] over ∆Ion [%] for −40 °C, 27 °C, 85 °C, and 100 °C.

3.2.2. Impact on flip-flops

Sequential circuits consist of logic gates and storage elements. Most sequential circuits are synchronous designs and use edge-triggered flip-flops. In this section, the impact of aging on a master-slave flip-flop (MSFF) (see schematic in Figure 3.23), a commonly used flip-flop type, is investigated.

Figure 3.23.: Schematic of master-slave flip-flop.

Besides the gate delay (in this case the clock-to-q delay dCLK-to-Q) and the output slope, two other timing constraints, the setup time tSUP and the hold time tHLD, are important for sequential cells. In contrast to gate delay and output slope, tSUP and tHLD cannot be measured directly. Instead, they are obtained by solving an optimization problem: Optimize the time difference between data signal change and clock edge such that dCLK-to-Q is 110 % of the relaxed value.

Figure 3.24(a) shows the degradation of the setup time when the threshold voltage of the PMOS transistors drifts (this is the case for a degradation due to NBTI). The solid lines show the degradation when just one particular transistor degrades. Some transistors increase and others decrease the degradation of the setup time. The transistors with the highest positive or negative degradation are shown. The blue dashed line depicts the degradation if all transistors have the same Vth drift. The green dotted line shows the delay degradation of an inverter for comparison. One can see that the setup time degradation and the delay degradation are almost exactly the same. Figure 3.24(b) shows the same information for the hold time.

Figure 3.24.: Plot of sensitivities for setup and hold time. (a) Comparison of setup time and delay degradation: ∆tSUP (rising input) [%] over |∆Vth| [V] for PMOS_IV1, PMOS_IV8, PMOS_TG1, PMOS_TG2, all PMOS, and an inverter. (b) Comparison of hold time and delay degradation: ∆tHLD (rising input) [%] over |∆Vth| [V] for PMOS_IV8, PMOS_IV2, PMOS_IV3, all PMOS, and an inverter.

Figure 3.25.: Sequential circuit with setup and hold time.

From this study it can be seen that the sensitivities can be linearized well and that the degradation of tSUP and tHLD is of the same order as the degradation of gate delays. This has the following implications for the timing behavior of a sequential circuit:

• For a long timing path (e.g., the path ending at D1 in Figure 3.25), the setup time constraint is relevant. It is violated if the data signal arrives at the receiving flip-flop later than the setup time before the clock edge. Due to aging, the gate delays along the data path degrade and, therefore, the path delay increases. Whether tSUP increases or decreases depends on which transistors degrade the most. If tSUP decreases, this compensates for part of the slower data path. If tSUP increases, the timing problem due to the slower data path is amplified. One has to consider that a long data path consists of many gates and that the degradation of one gate delay is approximately as large as the setup time degradation. For the investigated MSFF, tSUP is of the same order of magnitude as the delay of combinational gates (several tens of picoseconds). Hence, the degradation of tSUP plays a minor role compared to the degradation of the gate delays along the path.

• For a short timing path (e.g., the path ending at D2 in Figure 3.25), the hold time constraint is relevant. It has to be ensured that the data signal at the MSFF does not change before the hold time has elapsed. This time, the data path consists of only a few gates or even none at all. If the path consists of a few gates, the degradation of the path delay and the hold time degradation can cancel each other out. This is not the case when there are no gates along the path. For the investigated MSFF, one has to consider that the nominal hold time is only a few picoseconds. This means that even a degradation of 150 % (as seen in Figure 3.24(b)) does not change the absolute value of the hold time much.
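The two cases above can be made concrete with a small numeric sketch. It assumes the usual single-clock setup and hold checks without clock skew, and all values (clock period, gate delay, setup and hold times, degradation percentages) are hypothetical round numbers chosen only to match the orders of magnitude mentioned in the text.

```python
def aged(value, degradation_percent):
    """Apply a relative degradation given in percent."""
    return value * (1.0 + degradation_percent / 100.0)


def setup_slack(t_clk, d_clk_to_q, d_path, t_setup):
    """Positive slack: data arrives early enough before the capturing edge."""
    return t_clk - (d_clk_to_q + d_path + t_setup)


def hold_slack(d_clk_to_q, d_path, t_hold):
    """Positive slack: data stays stable long enough after the clock edge."""
    return d_clk_to_q + d_path - t_hold


# Hypothetical values in picoseconds (not taken from the thesis).
T_CLK = 1500.0
D_CLK_TO_Q = 80.0
LONG_PATH = 20 * 50.0   # 20 gates with 50 ps each
T_SETUP = 40.0
T_HOLD = 4.0

# Long path: 10 % degradation of every delay and of the setup time.
loss_from_path = aged(LONG_PATH, 10) - LONG_PATH
loss_from_setup = aged(T_SETUP, 10) - T_SETUP
aged_setup_slack = setup_slack(
    T_CLK, aged(D_CLK_TO_Q, 10), aged(LONG_PATH, 10), aged(T_SETUP, 10)
)

# Short path without gates: hold time degrades by 150 %.
hold_increase = aged(T_HOLD, 150) - T_HOLD
aged_hold_slack = hold_slack(aged(D_CLK_TO_Q, 10), 0.0, aged(T_HOLD, 150))
```

With these numbers, the data path accounts for 100 ps of lost setup slack and the setup time for only 4 ps, while the 150 % hold time degradation amounts to just 6 ps in absolute terms.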
This argument shows that, for timing verification, modeling the gate delay degradation is more important than modeling the degraded setup and hold times. A long timing path consists of many gates, and the degradation of one single gate is comparable to the setup time degradation. For a short timing path without any gates, the degradation of the hold time can be relevant, but not for the investigated MSFF, because its hold time is only a few picoseconds.

3.2.3. Impact on power dissipation

There are three main factors of power dissipation in a CMOS gate [Chandrakasan and Brodersen, 1995]:

Switching power dissipation (Pswitching): Power is consumed by charging the output load. First, the load capacitance CL is charged to VDD by the pull-up network. At the next output signal transition, the charge stored on the capacitance flows through the pull-down network to ground. Pswitching is given by the following formula:

$P_{switching} = \frac{TD}{2} \cdot f_{CLK} \cdot V_{DD}^2 \cdot C_L$ (3.10)

Short-circuit power dissipation (Pshort-circuit): Power dissipation caused by a conducting path from VDD to ground that is formed when the NMOS and PMOS transistors are conducting simultaneously for a short period of time during a transition. Pshort-circuit is given by:

$P_{short-circuit} = I_{short-circuit} \cdot V_{DD}$ (3.11)

Leakage power dissipation (Pleakage): Leakage power originates from the leakage currents Ileakage of a transistor when it is in the off-state:

$P_{leakage} = I_{leakage} \cdot V_{DD}$ (3.12)

For sub-100 nm technologies, the gate tunneling current and the subthreshold leakage current are the two dominant factors [Piguet, 2005]. The gate tunneling current strongly depends on the oxide thickness, whereas the subthreshold leakage current depends, among other things, on the threshold voltage.

Pswitching and Pshort-circuit together form the dynamic power dissipation, and Pleakage is also known as static power consumption.
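As a quick numeric illustration of Equations 3.10 to 3.12, the three components can be evaluated as follows. The function names and all operating values are assumptions made for this sketch and are not taken from the cell library used in this work.

```python
def switching_power(td, f_clk, vdd, c_load):
    """Equation 3.10: TD/2 * f_CLK * VDD^2 * C_L, in watts."""
    return td / 2.0 * f_clk * vdd ** 2 * c_load


def short_circuit_power(i_short_circuit, vdd):
    """Equation 3.11: mean short-circuit current times supply voltage."""
    return i_short_circuit * vdd


def leakage_power(i_leakage, vdd):
    """Equation 3.12: off-state leakage current times supply voltage."""
    return i_leakage * vdd


# Hypothetical operating point (illustrative values only).
TD = 0.2          # transitions per clock period
F_CLK = 500e6     # clock frequency in Hz
VDD = 1.2         # supply voltage in V
C_LOAD = 5e-15    # load capacitance in F

p_switching = switching_power(TD, F_CLK, VDD, C_LOAD)
p_leakage = leakage_power(10e-9, VDD)  # assumed 10 nA leakage current
```

Note that the threshold voltage does not appear in `switching_power` at all, which is why a Vth drift leaves this component unchanged.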
For the fan-out-3 test structure, Pshort-circuit is responsible for about 10 % of the dynamic power. The portion of Pshort-circuit would be increased by a slower input transition or by a smaller output load.

To investigate the impact of aging on these components of power dissipation, Vth of the PMOS transistors is increased. Pswitching does not depend on the threshold voltage and stays constant. The same is true for the gate tunneling current. The subthreshold current depends exponentially on the gate-source voltage Vgs and the threshold voltage Vth:

$I_{ds} \propto e^{V_{gs} - V_{th}}$ (3.13)

Hence, by increasing the threshold voltage, the subthreshold component of Pleakage is strongly reduced (see Figure 3.26(b)).

Figure 3.26.: (a) Change of Pshort-circuit by altering Vth (INV; 90 nm; Vnom; 27 °C). Pshort-circuit decreases for a rising and a falling input transition. (b) Subthreshold current for a PMOS transistor (90 nm; 27 °C; with Vgs = 0 V and Vds = 1.2 V) for altered ∆Vth values.

The impact of an increased threshold voltage on Pshort-circuit is determined by simulation. The threshold voltage of the PMOS transistor of an inverter is altered, and the drain current Id of the transistor that is going from on- to off-state for the considered input transition is measured (see Figure 3.26(a)). One can see that for both transitions Pshort-circuit decreases when the threshold voltage drift is increased. By considering all important components of power dissipation, it can be seen that aging has almost no effect on power dissipation. If anything, power dissipation is slightly reduced by aging. Pswitching, which is responsible for a large part of the dynamic power dissipation, remains unaffected and Pshort-circuit slightly decreases.
The static power dissipation is slightly decreased as well [4]. Hence, it is justified that this thesis focuses on analyzing the impact of aging on timing and not on power dissipation.

[4] Pleakage caused by tunneling currents remains constant, but Pleakage caused by subthreshold currents is reduced.

3.3. Technology trend

So far, all investigations were done using a 90 nm technology. Now it is investigated how the drifts and the sensitivities evolve for different technologies. For that purpose, five technologies are compared: 120 nm, 90 nm, 65 nm LP (low power), 65 nm HP (high performance), and 45 nm LP [5]. The main difference between LP and HP technologies is the gate oxide thickness. Low power technologies have a thicker gate oxide, resulting in smaller leakage currents.

[5] For the 45 nm LP and the 65 nm HP technology, just the transistor models and the degradation equations were available, but no standard cell libraries. In order to have logic gates for those technologies, gates from the 65 nm LP technology were taken. The transistor types were replaced and the width and length of the transistors were adjusted.

The common opinion in the literature is that the Vth drift due to NBTI increases with newer technologies (e.g., see [Strong et al., 2009; Huard et al., 2009]). This is due to the strong dependence of the drift on the vertical electrical field. The electrical field increases because the transistor sizes are scaled more aggressively than the supply voltage. Figure 3.27 shows the increasing electrical field. This was calculated from data in the International Technology Roadmap for Semiconductors [ITRS, 2001, 2009].

Figure 3.27.: Vertical electrical field over technologies at nominal supply voltage (vertical E-field [V/m], scaled by 10^8, over the technology nodes 130 nm, 90 nm, 65 nm, and 45 nm).
The vertical electrical field is given by:

$E_{vertical} = \frac{V_{nom}}{t_{ox}}$ (3.14)

Vnom is the predicted nominal supply voltage for a technology and tox is the corresponding physical oxide thickness.

However, the degradation equations for the technologies do not show such a clear picture. The correlation between drift and Vnom can be seen by comparing Figure 3.28(a) and Figure 3.28(b). The 120 nm technology with a Vnom of 1.5 V has the largest drift, followed by 90 nm, 65 nm LP, and 45 nm LP (Vnom = 1.2 V). The 65 nm HP technology (Vnom = 1.0 V) shows the smallest drift over the lifetime. If the drifts over the lifetime are calculated for a VDD of 1.2 V for all technologies, the difference between the technologies (see Figure 3.28(b)) is less than 10 mV.

Figure 3.28.: Transistor drifts due to NBTI for different technologies at nominal supply voltage (a) and at a supply voltage of 1.2 V (b). Both panels: PMOS; 125 °C; W = 10 µm; Lmin; |∆Vth| [% of Vth0] over lifetime [y] on logarithmic axes for 120 nm, 90 nm, 65 nm LP, 65 nm HP, and 45 nm LP.

All five technologies still have a SiO2 gate dielectric; hence, the impact of high-k metal gates is not considered yet. With high-k metal gates, the gate dielectric becomes thicker again. This reduces the electrical field. However, it is observed that NMOS transistors experience a Vth drift due to PBTI of the same order of magnitude as the PMOS transistors due to NBTI.

In the last several years, the research focus was on the NBTI effect, but Huard et al. [2009] argue that HCI is no longer negligible due to a constant lateral field increase since the 120 nm technology node. As can be seen in Figure 3.29(a), the lateral electrical field increases as well with newer technologies (Elateral = Vnom/Lmin, with Lmin being the minimal gate length). Figure 3.29(b) shows the ∆Ion drift for PMOS and NMOS transistors due to HCI as calculated with the degradation equations. The PMOS transistors degrade more strongly than the NMOS transistors. The PMOS transistors show a clear technology trend: the drift increases with newer technologies. The only exception is the 45 nm LP technology, with a drift smaller than that of the 65 nm technologies.

Figure 3.29.: HCI over technology nodes. (a) Lateral electrical field over technologies at nominal supply voltage (lateral E-field [V/m], scaled by 10^7, over the technology nodes 130 nm, 90 nm, 65 nm, and 45 nm). (b) Transistor drifts due to HCI for different technologies at nominal supply voltage (DF = 100; Vnom; 25 °C; W = 10 µm; Lmin; ∆Ion [%] over lifetime [h] for PMOS and NMOS transistors of the 120 nm, 90 nm, 65 nm LP, 65 nm HP, and 45 nm LP technologies).

However, the parameter drift is only half the truth. It is equally important how the sensitivities evolve over the technologies. Figure 3.30(a) shows the sensitivity of the gate delay with respect to a Vth drift. The 120 nm technology, with 1.5 V nominal supply voltage, has the lowest sensitivity. The 65 nm HP technology (1.0 V nominal supply voltage) reveals the largest sensitivity. The other technologies have a nominal supply voltage of 1.2 V and lie in between. Hence, the sensitivities behave in exactly the opposite way to the drifts. The 65 nm HP technology, for instance, has the smallest drifts, but the highest sensitivity.

To compare the degradation of the gate delay for those five technologies, seven use profiles from the business units of an industry partner were chosen. Use profiles specify the operating conditions a circuit must be able to sustain during its lifetime.
A use profile consists, among other parameters, of a specified lifetime, a maximal supply voltage, and a temperature profile. The temperature is given either by a mean value, by intervals, or by a Gaussian distribution. The 65 nm HP technology shows the largest degradation of the gate delay for a falling input transition (see Figure 3.31), followed by the 90 nm and the 120 nm technologies. The low power technologies with a thick gate oxide show the lowest degradation.

Figure 3.30.: Sensitivity of the inverter delay for different technologies. (a) INV; Vnom; 27 °C: ∆delay (falling input) [%] over ∆Vth [V] for 120 nm, 90 nm, 65 nm LP, 65 nm HP, and 45 nm LP. (b) Sensitivities over technologies: ∆delay/∆Vth [%/100 mV] per technology node.

Figure 3.31.: Degradation of inverter delay for different technologies and use profiles (∆delay (falling input) [%] per technology for the profiles A to G).

3.4. Summary

For an aging-aware timing analysis, aging effects that cause a parameter drift are relevant. The two most severe drift-related aging effects nowadays are NBTI and HCI. In order to determine the degradation caused by aging effects, the parameter drifts due to a particular aging effect have to be considered as well as the sensitivity of a gate performance with respect to a parameter drift.

The physical mechanism behind NBTI is not yet completely understood. NBTI can best be modeled by an increased Vth of the PMOS transistors. A special characteristic of NBTI is that the Vth drift recovers when the transistor is no longer stressed. NBTI is strongly dependent on the supply voltage. As soon as high-k metal gates are used, PBTI must also be considered, because for such gates PBTI shows a drift of the same order of magnitude as NBTI.

HCI was the dominant aging effect until it was overtaken by NBTI. However, due to the constantly increasing lateral electric field in newer technology nodes, HCI is no longer negligible. HCI leads to a threshold voltage drift and to a mobility degradation. The supply voltage dependence of HCI is quite strong as well.

How sensitive a gate is to a parameter drift caused by NBTI or HCI is also strongly dependent on the supply voltage. In contrast to the parameter drifts, however, the sensitivity is increased by a lower supply voltage.

It is shown that modeling the degradation of the gate delay is more important than modeling the degradation of a flip-flop. A long timing path consists of many gates, and the degradation of one single gate is comparable to the setup time degradation. For short timing paths without any gates, the degradation of the hold time can be relevant, but not for the investigated master-slave flip-flop, because its hold time is only a few picoseconds.

Aging causes a circuit to slow down, but the power consumption is almost unaffected. It is even likely that the power consumption is slightly reduced. Pswitching stays constant and Pshort-circuit is slightly reduced. The subthreshold leakage current, one component of the static power consumption, is also reduced by an increased threshold voltage.

In theory, the degradation due to NBTI as well as HCI should increase in advanced technologies, due to increasing vertical and lateral electrical fields in a transistor. For HCI, the degradation equations represent this trend. However, this trend cannot be seen in the degradation equations for NBTI. One possible reason is that it is hard to compare the NBTI drifts for different technologies because NBTI strongly depends on several manufacturing steps. A clear trend can be seen for the sensitivity of a gate. It increases with newer technologies, due to the reduced supply voltage.
One can conclude that the degradation caused by HCI will increase in newer technologies. For NBTI, it depends on which predominates: the reduced drift or the increased sensitivity.

4. Aging-aware static timing analysis

For performing an aging-aware timing analysis on gate level, a gate model is required that provides the aged gate delay instead of the fresh one. This is the main difference compared to a traditional STA without aging. The proposed aging-aware gate model is called AgeGate [Lorenz et al., 2009a], and it has the following advantages compared to the state-of-the-art approaches discussed in Section 2.2.2:

Analyzing impact of NBTI and HCI: The proposed aging-aware gate model is not limited to just one aging effect. It analyzes the combined impact of NBTI and HCI. Of the aging-aware gate models that were already introduced, all except the LUT-based gate model in [Chen et al., 2011] consider just one aging effect. The results of our proposed approach show that the mean degradation of the circuit delay is 10.1 % for NBTI and 3.2 % for HCI. Hence, HCI cannot be neglected, although NBTI is the dominant aging effect for the investigated 90 nm technology and the chosen operating conditions.

Individual parameter drifts: The transistors of a gate degrade individually, because the time the transistors are in stress mode differs depending on the workload at the gate inputs and the internal gate structure. A formal way to calculate individual parameter drifts for every transistor is developed. A canonical gate model is used, which can consider the impact of the individual parameter drifts on the gate performances. The results show that the degradation is overestimated by 20 % without considering individual parameter drifts.

Degradation of the output slope: The proposed approach not only calculates an aged gate delay, but also determines an aged output slope. As in a traditional STA, signal waveforms are modeled as ramps.
The output slope of one gate determines the input slope of a succeeding gate and this, in turn, impacts the gate delay of the succeeding gate. The results show that the degradation of the circuit delay is underestimated by 24 % when the fresh output slope instead of an aged output slope is taken to calculate the gate delay.

Easy extensibility: The gate model considers two aging effects at the moment, but the approach can easily be extended. The proposed approach is based on calculating the transistor drifts and then computing the aged gate performances. For calculating the aged gate performances, the sensitivities of the gate performances with respect to a parameter drift are required. Other aging effects that cause a drift of transistor parameters can be taken into account if degradation equations are available and the sensitivities for this new aging effect are characterized. One effect that becomes important in technologies with a metal gate is positive bias temperature instability (PBTI). PBTI is the counterpart of NBTI and degrades NMOS transistors. Another effect that might become relevant in the future and must be modeled is TDDB. In [Choudhury et al., 2010] it is shown that TDDB also leads to a degradation of transistor characteristics before a catastrophic breakdown occurs. The degradation equations can also be replaced by more accurate ones to take the recovery effect for NBTI into account.

Independence of the use profile: Another advantage compared to aged LUT-based approaches, like Glacier [Wu et al., 2000], is that the operating conditions over lifetime and the workload just affect the degradation equations. Since the sensitivities, which are obtained when the cell library is characterized, are independent of the use profile, the library does not have to be re-characterized in case the use profile changes (e.g., the temperature over lifetime is not 125 °C but 110 °C).
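The separation between characterized sensitivities and use-profile-dependent drifts can be illustrated with a first-order sketch. It is not the AgeGate model itself, which is introduced in Section 4.3; it merely assumes that each transistor drift contributes linearly and independently to the gate delay. The transistor names, sensitivities, and drift values are hypothetical.

```python
def aged_performance(fresh, sensitivities, drifts):
    """First-order estimate: fresh value plus the sum of sensitivity * drift.

    sensitivities: performance change per unit drift, keyed by transistor.
    drifts: parameter drift per transistor for one use profile and workload.
    """
    return fresh + sum(sensitivities[name] * drifts[name] for name in sensitivities)


# Characterized once per cell (hypothetical values, in ps per volt of dVth).
SENSITIVITIES = {"MP1": 40.0, "MP2": 60.0}
FRESH_DELAY_PS = 20.0

# Drifts come from the degradation equations and change with the use profile.
drifts_hot_profile = {"MP1": 0.05, "MP2": 0.02}    # e.g. high temperature
drifts_mild_profile = {"MP1": 0.03, "MP2": 0.01}   # e.g. lower temperature

delay_hot = aged_performance(FRESH_DELAY_PS, SENSITIVITIES, drifts_hot_profile)
delay_mild = aged_performance(FRESH_DELAY_PS, SENSITIVITIES, drifts_mild_profile)
```

Switching from one profile to the other only replaces the drift dictionary; the characterized sensitivities are reused unchanged.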
The chapter is organized as follows: First, the complete aging analysis flow is introduced (Section 4.1), then it is explained how the workload can be determined (Section 4.2). In Section 4.3 the proposed aging-aware gate model is explained. The characterization of the standard cells is described in Section 4.4, and results for several benchmark circuits are given in Section 4.5.

4.1. Aging-aware STA flow

An aging-aware static timing analysis (ASTA) works similarly to a traditional STA (see Section 2.1). The main difference is that an aging-aware gate model is required to compute aged gate performances (gate delay and output slope) instead of fresh ones. Those aged gate performances depend on the use profile and the workload at the gate inputs. Figure 4.1 summarizes the ASTA flow:

Figure 4.1.: Aging analysis flow

1. The operating conditions over lifetime are specified by globally setting the supply voltage Veff and the temperature Teff. The approach could also be extended to take voltage drops and temperature gradients over a chip into account by having individual supply voltage and temperature values for every gate. That way, the accuracy of the aging analysis could be increased.

2. The workload at the gate inputs is required to calculate individual transistor drifts. The workload is defined by the gate input signals over lifetime, and it is described by two statistical parameters, signal probability (SP) and transition density (TD):

• SP and TD can be obtained by performing a logic simulation of the circuit. However, this requires typical input signals for the circuit and contradicts the fundamental idea of a static timing analysis, which is independent of input signals.

• Another approach to obtain SP and TD is to use probabilistic methods, which were developed for analyzing dynamic power dissipation. Probabilistic approaches just require the values for SP and TD at the primary inputs. These values are propagated through the circuit.
• If neither realistic input signals nor values for SP and TD at the primary inputs are available, a worst-case analysis can be performed. Worst-case values for SP and TD are specified and used for all nets of the circuit. By choosing 0 % for SP, it is guaranteed that all PMOS transistors are in inversion during the entire lifetime and the circuit degrades maximally due to NBTI. For TD, the specification of worst-case values is more difficult because, due to the delay of the gates, a signal may change several times before settling to its static value (such transitions are referred to as glitches).

In the proposed approach, a probabilistic method is used whenever the workload should be considered. Probabilistic methods are described in more detail in Section 4.2.

3. After the operating conditions and the workload are determined, the aged gate performances can be calculated by modifying the function update_edge from page 21 (see the function update_edge_aged below). First, the stress probabilities for the individual transistors of a gate are obtained. The stress probability is the percentage of time that a transistor is stressed by a particular aging effect during the lifetime. Next, the parameter drifts for the individual transistors are computed by means of degradation equations for NBTI and HCI. Finally, the aged gate delay is computed by adding up the fresh gate delay and the gate delay degradation.

Function update_edge_aged(u, v)
    /* Recursive function to update the gate delay of an edge (u, v) */
    if d_valid((u, v)) == False then
        /* Update gate delay based on input slope and output load */
        slope = get_slope_from_node(u);
        load = get_load_from_node(v);
        stress_probabilities = get_Pstress();
        drifts = get_drifts(use_profile, stress_probabilities);
        d_fresh((u, v)) = get_delay_from_LUT(slope, load);
        ∆d((u, v)) = get_degradation(slope, load, drifts);
        d_aged((u, v)) = d_fresh((u, v)) + ∆d((u, v));
        d_valid((u, v)) = True
    end
    return d_aged((u, v))

4.2. Workload determination

The degradation of a gate depends strongly on the gate input signals over lifetime. In Section 3.1.3 it is described that a logic “0” at a gate input results in a degradation of the PMOS transistors due to NBTI. The fraction of the lifetime a signal is at logic “1” is given by a statistical signal property called static signal probability. According to Najm [1994]:

Signal probability: The signal probability SP(x) of a node x is the average fraction of clock periods in which the signal is at logic “1”.

Hence, the probability that a signal is logic “0” is 1 − SP(x) and is from now on referred to as $\overline{SP}(x)$. For HCI, on the other hand, it is relevant how often a signal changes its logic state. The statistical signal property of interest is the transition density [Najm, 1994]:

Transition density: The transition density TD(x) of a signal x is the mean number of signal transitions per clock period.

Hence, to consider the impact of the workload on the degradation, the exact signal waveforms for all gate inputs are not required. It is enough to obtain the two statistical signal properties for all gate inputs.

One possibility to determine SP and TD is logic simulation. If typical or worst-case circuit input vectors are available, they can be used to simulate the circuit, store the signal waveforms at every gate input, and compute SP and TD.
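How the two statistical properties follow from a stored waveform can be sketched as below. The sketch assumes that the simulator provides one settled logic value per clock period, so glitches within a period are not counted and TD cannot exceed 1; the function name and the example waveform are hypothetical.

```python
def signal_statistics(samples):
    """Return (SP, TD) for one net.

    samples: settled logic values (0 or 1), one per clock period.
    SP is the fraction of periods at logic 1, TD the mean number of
    value changes between consecutive periods.
    """
    if len(samples) < 2:
        raise ValueError("need at least two clock periods")
    sp = sum(samples) / len(samples)
    transitions = sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)
    td = transitions / (len(samples) - 1)
    return sp, td


# Hypothetical waveform of one gate input over eight clock periods.
sp, td = signal_statistics([0, 1, 1, 0, 1, 0, 0, 1])
```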
An approach based on logic simulation with given input vectors is called strongly input pattern dependent because typical circuit input waveforms are required. The remaining approaches discussed here only require values for $SP$ and $TD$ to be specified at the primary inputs. These approaches are called weakly input pattern dependent.

Xakellis and Najm [1994] generate random signal vectors for the primary inputs, which have the specified $SP$ and $TD$. Then, logic simulation is used to obtain the signal waveforms at every gate input, and $SP$ and $TD$ for those signals are computed. This is repeated until the stopping criterion is reached. At every iteration a new mean value for $SP$ and $TD$ at every gate input is calculated. The stopping criterion is fulfilled when all mean values are within a confidence interval specified in advance. Although this approach is very accurate, it is quite time-consuming.

The following approaches are called probabilistic methods because they propagate the statistical signal properties directly from the primary inputs into the circuit. The approaches differ in how accurately they consider the spatial and temporal dependence of signals. Spatial and temporal dependence are defined as follows:

Spatial dependence: Two signals may depend on one another. For instance, both signals cannot be logic “0” at the same point in time. Spatial dependence arises when a circuit has feedback (sequential circuits) and for a signal that splits and reconverges again. In general, probabilistic methods assume spatially independent signals at the primary inputs.

Temporal dependence: The logic values of a signal at two points in time may be interdependent. A clock signal, for instance, is logic “1” during half a clock period and logic “0” in the succeeding half.

Cirit [1987] computes the signal probability at a gate output $y = f(x_1, x_2, \ldots, x_n)$ from the signal probabilities at the gate inputs by the following recursive formula:

$$SP(y) = SP(x_1) \cdot SP(f_{x_1}) + \overline{SP}(x_1) \cdot SP(f_{\overline{x_1}}) \tag{4.1}$$

$f_{x_1}$ and $f_{\overline{x_1}}$ are the cofactors of $f$ with respect to $x_1$. For a NAND gate ($y = \overline{x_1 \cdot x_2}$) the signal probability at the output is:

$$\overline{SP}(y) = SP(x_1) \cdot SP(x_2) \tag{4.2}$$

If temporal independence is assumed, the transition density is easily calculated by ([Yeap, 1998] p. 64):

$$TD(y) = 2 \cdot SP(y) \cdot \overline{SP}(y) \tag{4.3}$$

By propagating not just the signal probability but also the transition density through the circuit, temporal dependence can be taken into account. Najm [1993] propagates $TD$ by the following formula:

$$TD(y) = \sum_{i=1}^{n} SP\left(\frac{\partial y}{\partial x_i}\right) \cdot TD(x_i) \tag{4.4}$$

$\partial y / \partial x_i$ is the Boolean difference ($\partial y / \partial x := y_x \oplus y_{\overline{x}}$). The signal probability of the Boolean difference is the probability that the gate is sensitized and the transition at input $x_i$ is observed at the gate output. It is assumed that the signals at the inputs $x_1$ to $x_n$ are spatially independent.

[Figure 4.2: An example on calculating signal probabilities, panels (a) and (b).]

Alternatively, $SP$ and $TD$ at the nets can be computed directly from the statistical signal properties at the primary inputs instead of propagating $SP$ and $TD$ from the gate inputs to the gate outputs. In this case no internal spatial independence has to be assumed. It is only assumed that the signals at the primary inputs are independent. A binary decision diagram (BDD) is used to express the logic function of a signal in terms of the primary inputs. The cofactors can then easily be calculated by following the true and the false branch of the particular node of the BDD. The Boolean difference can be computed with a BDD as well. Hence, Equations 4.1 and 4.4 can be used to compute $SP$ and $TD$ for a signal directly from $SP$ and $TD$ at the primary inputs.
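Equations 4.1 and 4.4 can be illustrated with a small sketch. Instead of a BDD it enumerates the truth table of the logic function, which is only feasible for a handful of inputs and is chosen here purely for brevity. The function names and the representation of a gate as a Python callable are assumptions for illustration.

```python
from itertools import product


def signal_probability(func, sp):
    """P(func == 1) for spatially independent inputs with probabilities sp."""
    total = 0.0
    for bits in product((0, 1), repeat=len(sp)):
        if func(*bits):
            weight = 1.0
            for bit, p in zip(bits, sp):
                weight *= p if bit else 1.0 - p
            total += weight
    return total


def transition_density(func, sp, td):
    """Equation 4.4: sum of SP(Boolean difference) * TD(x_i) over all inputs."""
    result = 0.0
    for i in range(len(sp)):
        def boolean_difference(*bits, i=i):
            high = bits[:i] + (1,) + bits[i + 1:]
            low = bits[:i] + (0,) + bits[i + 1:]
            return func(*high) != func(*low)
        result += signal_probability(boolean_difference, sp) * td[i]
    return result


def and2(a, b):
    return a & b


def and3(a, b, c):
    return a & b & c


# Gate-by-gate propagation: x = a AND b, y = b AND c, z = x AND y.
sp_x = signal_probability(and2, [0.5, 0.5])
td_x = transition_density(and2, [0.5, 0.5], [1.0, 1.0])
sp_z_gate = signal_probability(and2, [sp_x, sp_x])
td_z_gate = transition_density(and2, [sp_x, sp_x], [td_x, td_x])

# Direct computation from the primary inputs: z = a AND b AND c.
sp_z_direct = signal_probability(and3, [0.5, 0.5, 0.5])
td_z_direct = transition_density(and3, [0.5, 0.5, 0.5], [1.0, 1.0, 1.0])
```

With signal probability 0.5 and transition density 1 at all primary inputs, the two ways of computing the values at `z` disagree because the gate-by-gate variant treats `x` and `y` as independent although both depend on `b`.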
The difference between propagating $SP$ from the gate inputs to the gate outputs and computing $SP$ directly from the primary inputs is illustrated by the following example (see Figure 4.2). All three primary inputs have a signal probability of 0.5 and a transition density of 1. $SP$ and $TD$ at the internal nets $x$ and $y$ are the same for both approaches:

$$SP(x) = SP(a) \cdot SP(x_a) + \overline{SP}(a) \cdot SP(x_{\overline{a}}) = SP(a) \cdot SP(b) + \overline{SP}(a) \cdot 0 = 0.25$$

$$TD(x) = SP\left(\frac{\partial x}{\partial a}\right) \cdot TD(a) + SP\left(\frac{\partial x}{\partial b}\right) \cdot TD(b) = SP(b) \cdot TD(a) + SP(a) \cdot TD(b) = 1$$

$$SP(y) = SP(b) \cdot SP(c) = 0.25$$

$$TD(y) = 1$$

First, $SP$ and $TD$ at $z$ are computed from $SP$ and $TD$ of the internal nets:

$$SP(z) = SP(x) \cdot SP(y) = SP(a) \cdot SP(b)^2 \cdot SP(c) = 0.0625$$

$$TD(z) = SP(y) \cdot TD(x) + SP(x) \cdot TD(y) = 0.5$$

When the BDD is used and $SP(z)$ and $TD(z)$ are computed directly from $SP$ and $TD$ at the primary inputs, the result is:

$$SP(z) = SP(a) \cdot SP(b) \cdot SP(c) = 0.125$$

$$TD(z) = SP\left(\frac{\partial z}{\partial a}\right) \cdot TD(a) + SP\left(\frac{\partial z}{\partial b}\right) \cdot TD(b) + SP\left(\frac{\partial z}{\partial c}\right) \cdot TD(c)$$

$$= SP(b \cdot c) \cdot TD(a) + SP(a \cdot c) \cdot TD(b) + SP(a \cdot b) \cdot TD(c) = 0.75$$

The difference in $SP(z)$ and $TD(z)$ results from the fact that the first approach does not consider the spatial correlation of the reconvergent paths starting at $b$. Unfortunately, circuits of industrial complexity are too large to set up the BDD for the entire circuit. For this reason, the statistical signal properties are propagated from the gate inputs to the gate outputs in this thesis and the resulting accuracy penalty is accepted. In [Najm, 1993] a compromise is proposed: the circuit is partitioned and a BDD is generated for each partition. That way the spatial correlation within a partition is kept and the correlation is lost only at the partition borders.

4.3. AgeGate: Aging-aware gate model

After the operating conditions over lifetime have been specified and the values for $SP$ and $TD$ at the gate inputs have been obtained, the aged gate performances can be computed. AgeGate consists of three fundamental parts:

• A canonical gate model
• Technology specific degradation equations
• Information about the internal gate structure

The canonical gate model provides the aged gate performances dependent on the parameter drifts of the single transistors. These drifts are calculated by technology specific degradation equations. The workload has an essential impact on the parameter drifts, since it defines the fraction of the lifetime a transistor is actually stressed by a particular aging effect. To determine this impact, information about the internal gate structure is required.

4.3.1. Canonical gate model

The canonical gate model corresponds to a first-order Taylor series approximation at the nominal gate performance $q_{fresh}$:

$$q_{aged} = q_{fresh} + \Delta q = q_{fresh} + \sum_{m \in G} \sum_{p \in P} \chi^{q}_{m,p} \cdot \Delta p_m \tag{4.5}$$

The aged gate performance ($q_{aged}$) is the sum of the fresh gate performance ($q_{fresh}$) and the degradation of the gate performance ($\Delta q$). $G$ is the set of all transistors of the gate and $P$ is the set of all parameters that drift due to aging effects. $\chi^{q}_{m,p}$ are the sensitivity coefficients and $\Delta p_m$ is the parameter drift of a particular transistor $m$. The sensitivity coefficients are defined as:

$$\chi^{q}_{m,p} = \left. \frac{\partial q}{\partial \Delta p_m} \right|_{\Delta p_m = 0} \tag{4.6}$$

This is the partial derivative of $q$ with respect to a drift $\Delta p_m$ at the nominal parameter value ($\Delta p_m = 0$). For the aged gate delay $d_{aged}$ this results in:

$$d_{aged} = d_{fresh} + \sum_{m \in G} \left( \frac{\partial d}{\partial V_{th,m}} \cdot \Delta V_{th,m} + \frac{\partial d}{\partial I_{on,m}} \cdot \Delta I_{on,m} \right) \tag{4.7}$$

The sensitivity coefficients $\partial d / \partial V_{th,m}$ and $\partial d / \partial I_{on,m}$ are obtained together with the fresh gate delay $d_{fresh}$ when the gate is characterized. The drifts $\Delta V_{th,m}$ and $\Delta I_{on,m}$ are computed during aging analysis by degradation equations.
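The evaluation of Equation 4.5 is a plain weighted sum, as the following sketch shows. The nested-dictionary layout, the names `aged_performance`, `dVth` and `dIon`, and all numbers are hypothetical and only serve to illustrate the structure of the canonical model. In AgeGate the fresh value and the sensitivities would come from LUTs indexed by input slope and output load.

```python
def aged_performance(q_fresh, sensitivities, drifts):
    """First-order model: q_aged = q_fresh + sum of sensitivity * drift."""
    delta_q = 0.0
    for transistor, per_parameter in sensitivities.items():
        for parameter, chi in per_parameter.items():
            # A missing drift entry is treated as "no drift" (assumption).
            delta_q += chi * drifts.get(transistor, {}).get(parameter, 0.0)
    return q_fresh + delta_q


# Hypothetical inverter: delay in ps, dVth in mV, dIon in percent.
sensitivities = {
    "MP": {"dVth": 0.2, "dIon": 1.5},  # ps per mV, ps per percent
    "MN": {"dIon": 1.0},               # ps per percent
}
drifts = {
    "MP": {"dVth": 50.0, "dIon": 2.0},
    "MN": {"dIon": 4.0},
}
d_aged = aged_performance(100.0, sensitivities, drifts)
print(d_aged)
```

Because the model is linear in the drifts, the contribution of every transistor and every parameter can be inspected separately.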
The aged output slope is modeled similarly to the aged gate delay.

Figure 4.3 shows that only a small error is introduced by linearizing the dependence of the gate performance on a parameter drift. The degradation of an inverter delay for a drift of $V_{th}$ and $I_{on}$ is shown. The dependencies are both simulated on transistor level and calculated by means of the sensitivities. The comparison shows a good match up to 10 % degradation of $I_{on}$ and 50 mV $V_{th}$ drift. Such drift values are reached only for very demanding operating conditions over lifetime (10 y, 125 °C, and 110 % $V_{nom}$). Hence, the linearized sensitivities in the canonical gate model are justified. Should the parameter drifts become too large and the error of the linear model no longer be acceptable in future technologies, it is possible to move to a quadratic gate model as proposed in [Zhang et al., 2005] for statistical static timing analysis (SSTA).

4.3.2. Degradation equations

In order to compute the aged gate performances with the canonical gate model, the parameter drifts of all transistors are required. These drifts are calculated by degradation equations. For NBTI a threshold voltage drift ($\Delta V_{th}$) is provided, and for HCI the degradation equation yields a drift of the drain saturation current in percent ($\Delta I_{on}$):

$$\Delta V_{th} = f_1(V_{eff}, T_{eff}, t_{stress}, L) \tag{4.8}$$

$$\Delta I_{on} = f_2(V_{eff}, T_{eff}, t_{stress}, W) \tag{4.9}$$

The equations are discussed in Sections 3.1.1 and 3.1.2. The drifts depend on the effective supply voltage over lifetime $V_{eff}$, the effective temperature over lifetime $T_{eff}$, the time $t_{stress}$ and the transistor sizes $W$ and $L$. The time $t_{stress}$ states for how long the transistor is stressed by an aging effect during the lifetime $t_{life}$. The stress time can be expressed as:

$$t_{stress} = P_{stress} \cdot t_{life} \tag{4.10}$$

[Figure 4.3: Degradation of inverter delay (falling input transition) by $\Delta I_{on}$ and $\Delta V_{th}$, respectively. Solid lines show dependencies calculated with sensitivities and dotted lines show dependencies simulated on transistor level. Analyzing conditions are 27 °C, 1.2 V and 15 pF capacitive load. Axes: inverter delay degradation [%] over $\Delta I_{on}$ [%] and over $\Delta V_{th}$ [mV].]

$P_{stress}$ is the probability that a transistor is stressed during $t_{life}$. $P_{stress}$ differs for the individual transistors of a gate. The individual stress probability depends on the workload at the gate inputs and on the internal gate structure. The gate terminal of a transistor must be negatively biased with respect to source and drain in order to degrade due to NBTI (see example in Figure 4.4). For transistor $M_{PC}$ this is the case when a logic “0” is applied to the input C. Hence, the stress probability of $M_{PC}$ depends only on the signal probability at C. For transistor $M_{PB}$, on the other hand, a logic “0” must be applied to B and also to input C. Otherwise the gate terminal is not negatively biased with respect to the source node of the transistor. Hence, the stress probability of $M_{PB}$ depends on the workload at inputs B and C and in addition on the internal structure of the gate [Kumar et al., 2007b]. More precisely, it depends on the position of the transistor in the PMOS stack.

The challenge in determining individual transistor drifts is to obtain the stress probabilities for all transistors of a gate from the values for $SP$ and $TD$ at the gate inputs and the internal gate structure.

4.3.3. Calculation of Stress Probabilities

To calculate the parameter drifts of a transistor, the stress probabilities $P_{stress,NBTI}$ and $P_{stress,HCI}$ have to be obtained. Generally applicable methods, which can easily be automated, are presented to calculate the two stress probabilities for every transistor of a gate.

[Figure 4.4: NOR gate with three inputs A, B and C, with the signal probabilities annotated at the inputs.]

Stress Probability for NBTI

A PMOS transistor $M$ is in stress condition when it is in inversion. Hence, $M$ degrades due to NBTI when its gate terminal is negatively biased with respect to its source and drain terminals. This can be expressed by the following two conditions:

A: logic “0” applied to the gate terminal of $M$
B: logic “1” applied to the source or drain terminal of $M$

For the calculation of the stress probability for HCI, which is also introduced in Section 4.3.3, the probability that an NMOS transistor is conducting is needed as well. For that reason, a new probability $P_{on}$ is introduced. For a PMOS transistor, $P_{on}$ is the probability that the gate terminal is at logic “0”, given by $1 - SP$ at the gate terminal. For an NMOS transistor, $P_{on}$ is the probability that the gate terminal is at logic “1”, which equals $SP$ at the gate terminal. Hence, the probability for Condition A is equal to $P_{on}$ of $M$:

$$P(A) = P_{on,M} \tag{4.11}$$

For NBTI the gate terminal must be negatively biased with respect to its source and drain terminals. However, Condition B only considers the logic value at one of the two terminals. The reason is that when the transistor is conducting (Condition A fulfilled) it is enough to have a logic “1” at the source (drain) terminal, since the drain (source) terminal will be charged to the same value. Condition B is fulfilled if a conducting path exists between the supply voltage $V_{DD}$ and the source or drain terminal of the transistor $M$ [Stempkovsky et al., 2009]. Hence, all PMOS transistors along the conducting path must have a logic “0” applied to their gate terminals as well. There might be multiple paths from $V_{DD}$ to the source or drain terminal of a transistor. In this case $P(B)_i$ is calculated separately for every path $PATH_{NBTI,i}$:

$$P(B)_i = \prod_{t \in PATH_{NBTI,i}} P_{on,t} \quad \text{, if signals are independent} \tag{4.12}$$

$PATH_{NBTI,i}$ is the set of all transistors along a conducting path. How those paths are determined is explained in Section 4.4.2. For independent signals at the gate inputs, the probabilities can simply be multiplied. The overall probability $P(B)$ is the probability that at least one path is conducting¹:

$$P(B) = 1 - \prod_{i} \left(1 - P(B)_i\right) \tag{4.13}$$

However, if the signals are dependent, this has to be taken into account when the probability for Condition B is calculated. For transistor $M_{PA}$ (see Figure 4.4), $PATH_{NBTI}$ consists of $M_{PB}$ and $M_{PC}$. Both gate terminals have a signal probability of 0.5. If the signals are independent, $P(B)$ is $0.5 \cdot 0.5 = 0.25$. If the signals B and C are dependent (see Figure 4.5), it is not that easy to calculate $P(B)$ for transistor $M_{PA}$. In the first case (signals C and B), both signals are never logic “0” at the same time. Hence, both transistors will never be in inversion at the same time and therefore $P(B)$ is 0. In the second case (signals C and B′), the signals are always logic “0” at the same time and $P(B)$ is 0.5.

[Figure 4.5: Example explaining the signal dependence (signals C, B and B′).]

The larger the probability for Condition B, the larger the transistor drift and the increase of the gate delay. A worst-case assumption for Condition B is that all transistors of a path tend to be in inversion at the same time.
In this case, the minimum of the probabilities $P_{on}$ of all transistors in $PATH_{NBTI,i}$ limits $P(B)_i$:

$$P(B)_i = \min_{t \in PATH_{NBTI,i}} \left(P_{on,t}\right) \quad \text{, worst-case assumption if signals are dependent} \tag{4.14}$$

If there is more than one path, a worst-case assumption for Condition B is that only one path is conducting at a time, so that the probabilities $P(B)_i$ can simply be added:

$$P(B) = \min\left(\sum_{i} P(B)_i,\ 1\right) \quad \text{, worst-case assumption if signals are dependent} \tag{4.15}$$

When the workload is to be considered in the aging analysis, a probabilistic method is used to compute the signal probabilities at the gate inputs. The probabilistic method assumes independent input signals, and the dependence of reconvergent signals is lost as well. Hence, the worst-case assumption is used in order to obtain a conservative result. Finally, $P_{stress,NBTI}$ is the probability that both Conditions A and B are fulfilled:

$$P_{stress,NBTI} = P(A \wedge B) = \begin{cases} P(A) \cdot P(B) & \text{, if signals are independent} \\ \min\left(P(A), P(B)\right) & \text{, if signals are dependent} \end{cases} \tag{4.16}$$

For $P_{stress,NBTI}$ it also has to be taken into account whether independent signals are assumed or not. For illustration, the probability $P_{stress,NBTI}$ for transistor $M_{PA}$ in Figure 4.4 is calculated. For independent signals, $P_{stress,NBTI} = (1 - 0.5) \cdot (1 - 0.5) \cdot (1 - 0.3) = 0.175$. Otherwise, $P_{stress,NBTI}$ is the minimum of the three $\overline{SP}$ values of the transistors in the stack, hence $P_{stress,NBTI} = \min(0.5, 0.5, 0.7) = 0.5$.

¹ This is 1 minus the probability that no path is conducting at all.

Stress Probability for HCI

A transistor degrades due to HCI when carriers are accelerated and injected into the gate oxide. The required electric field along the channel exists when the transistor switches from its non-conducting (off) to its conducting (on) state. For an NMOS (PMOS) transistor this implies a rising (falling) signal transition at the gate terminal.
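Before the path condition for HCI is introduced, the NBTI stress probability of Equations 4.11 to 4.16 can be made concrete with a short sketch. The function names are hypothetical, and each conducting path is assumed to be given simply as a list of the $P_{on}$ values of its transistors.

```python
from math import prod


def path_probability(p_on_path, independent):
    """P(B)_i for one path: product (Eq. 4.12) or minimum (Eq. 4.14)."""
    if not p_on_path:
        return 1.0  # terminal tied directly to VDD (assumption of this sketch)
    return prod(p_on_path) if independent else min(p_on_path)


def combine_paths(path_probabilities, independent):
    """P(B) over all paths: Eq. 4.13 or the worst-case sum of Eq. 4.15."""
    if independent:
        return 1.0 - prod(1.0 - p for p in path_probabilities)
    return min(sum(path_probabilities), 1.0)


def p_stress_nbti(p_on_m, paths, independent):
    """Eq. 4.16: probability that Conditions A and B hold together."""
    p_b = combine_paths(
        [path_probability(path, independent) for path in paths], independent
    )
    return p_on_m * p_b if independent else min(p_on_m, p_b)


# Numbers of the illustration above: P_on = 0.7 for the considered
# transistor, one path through two transistors with P_on = 0.5 each.
independent_case = p_stress_nbti(0.7, [[0.5, 0.5]], independent=True)
worst_case = p_stress_nbti(0.7, [[0.5, 0.5]], independent=False)
print(independent_case, worst_case)
```

The two calls reproduce the values 0.175 and 0.5 of the illustration, up to floating-point rounding in the first case.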
Furthermore, the degradation depends on the charge that flows through a transistor. The output load is recharged only if all other transistors along a path $PATH_{HCI}$ from supply voltage/ground to the cell output are in inversion. Otherwise only internal capacitances are recharged, which are substantially smaller and are neglected in the proposed approach. For HCI, two stress conditions have to be fulfilled for a transistor $M$:

C: transition from off- to on-state at transistor $M$
D: conducting path from supply voltage/ground to output load

$TD$ at the gate terminal of $M$ is a measure for the number of transitions (Condition C). Furthermore, all other transistors along the path $PATH_{HCI,i}$ must be in inversion to form a conducting path (Condition D):

$$P(D)_i = \begin{cases} \prod\limits_{t \in PATH_{HCI,i} \setminus \{M\}} P_{on,t} & \text{, if signals are independent} \\ \min\limits_{t \in PATH_{HCI,i} \setminus \{M\}} \left(P_{on,t}\right) & \text{, worst-case assumption if signals are dependent} \end{cases} \tag{4.17}$$

$$P(D) = \begin{cases} 1 - \prod\limits_{i} \left(1 - P(D)_i\right) & \text{, if signals are independent} \\ \min\left(\sum\limits_{i} P(D)_i,\ 1\right) & \text{, worst-case assumption if signals are dependent} \end{cases} \tag{4.18}$$

The considered transistor $M$ itself is excluded from the path $PATH_{HCI}$, because $M$ does not have to be in inversion. As for $P(B)$, independent and dependent signals have to be distinguished for $P(D)$.

[Figure 4.6: OR gate with three inputs and an internal signal int.]

To obtain $P_{stress,HCI}$, the time $t_{stress,HCI}$ has to be computed first. The number of transitions from off- to on-state during the whole lifetime is $TD/2 \cdot f_{CLK} \cdot t_{life}$, with $f_{CLK}$ being the clock frequency. This number multiplied by $P(D)$ is the number of effective transitions. The number of effective transitions times the input slope is $t_{stress,HCI}$.
Hence, $P_{stress,HCI}$ is:

$$P_{stress,HCI} = \frac{t_{stress,HCI}}{t_{life}} = TD/2 \cdot f_{CLK} \cdot P(D) \cdot s_{IN} \tag{4.19}$$

Multi-stage gates

The aging-aware gate model described so far is capable of determining the aged gate performances of single-stage gates. Single-stage gates have no internal nets that are connected to gate terminals of transistors. Examples of single-stage gates are inverters, NAND and NOR gates. A simple buffer, however, is already a multi-stage gate, because its two inverters are connected via an internal net. For multi-stage gates the following problems arise when the stress probabilities are calculated [Lorenz et al., 2010d]:

1. The values for $SP$ and $TD$ of internal signals (e.g., int in Figure 4.6) are unknown. These values are necessary to calculate $P_{stress,NBTI}$ and $P_{stress,HCI}$ for the transistors $M_{PZ}$ and $M_{NZ}$.
2. The transition time $s_{IN}$ of internal signals is unknown as well. $s_{IN}$ is required to compute $P_{stress,HCI}$ with Equation 4.19.

To obtain the statistical signal properties, the probabilistic method from [Najm, 1991] is used. Probabilistic methods can be used to propagate $SP$ and $TD$ not only from the gate inputs to the gate output but also to internal nets. To do so, the logic function of the internal signal is determined when the gate is characterized.

The transition time $s_{IN}$ of internal signals needed in Equation 4.19 is obtained during the characterization of the gate. Like the output slope $s_{OUT}$, it is characterized dependent on the input slope at the gate input and the output load at the gate output.

Consideration of temporal variation of temperature and voltage

So far, an identical temperature $T_{eff}$ and supply voltage $V_{eff}$ have been assumed for all gates and over the entire lifetime. Section 3.1.1 discusses how temperature and voltage differences across the chip can be taken into account by having an individual $T_{eff}$ and $V_{eff}$ value for every gate. In this section two methods are proposed to determine the parameter drifts when $T_{eff}$ and $V_{eff}$ change during the lifetime, that is, when a temperature and/or voltage profile exists. Such a profile has to be defined in the specifications for a circuit. A temperature profile could, for instance, look as shown in Table 4.1.

Percentage of lifetime    Temperature
20 %                      125 °C to 150 °C
60 %                      60 °C to 85 °C
20 %                      −20 °C to 27 °C

Table 4.1.: An example for a temperature profile. The lifetime is 10 y and $V_{eff}$ is $V_{nom}$.

To ensure a conservative result, the upper bounds of the temperature intervals are taken. This results in the following time-temperature tuples $(t_i, T_i)$:

$$(2\,\text{y}, 150\,°\text{C}),\ (6\,\text{y}, 85\,°\text{C}),\ (2\,\text{y}, 27\,°\text{C}) \tag{4.20}$$

The first proposed method only works for temperature profiles. The basic idea is to determine the effective temperature $T_{eff}$ that results in the same drift as the temperature profile over the same time. In both degradation equations for NBTI and HCI the temperature dependence is given by the Arrhenius equation:

$$\Delta V_{th}, \Delta I_{on} \propto e^{-\frac{E_a}{k_b T}} \tag{4.21}$$

$E_a$ is the activation energy (e.g., 0.16 eV for NBTI) and $k_b$ is the Boltzmann constant. The time dependence is modeled for both effects as follows:

$$\Delta V_{th}, \Delta I_{on} \propto t^{n} \tag{4.22}$$

with $n$ being a constant (e.g., 0.23 for NBTI). First, an arbitrary reference temperature $T_{ref}$ is chosen and the times of the time-temperature tuples are adjusted such that the degradation stays the same, $(t_i, T_i) \rightarrow (t_{i,ref}, T_{ref})$:

$$t_i^{n} \cdot e^{-\frac{E_a}{k_b T_i}} \overset{!}{=} t_{i,ref}^{n} \cdot e^{-\frac{E_a}{k_b T_{ref}}} \tag{4.23}$$

Solving Equation 4.23 for $t_{i,ref}$ results in:

$$t_{i,ref} = t_i \cdot \left( e^{-\frac{E_a}{k_b}\left(\frac{1}{T_i} - \frac{1}{T_{ref}}\right)} \right)^{1/n} \tag{4.24}$$

When this is done for the example above and NBTI, the following times are calculated:

$$t_{1,ref} = 2\,\text{y},\quad t_{2,ref} = 0.19\,\text{y} \quad \text{and} \quad t_{3,ref} = 7\,\text{h} \tag{4.25}$$

$t_1$ equals $t_{1,ref}$, because $T_1$ was chosen as the reference temperature. It can be seen that the first tuple with the high temperature dominates the degradation (the drift after 2 y at 27 °C equals the drift after 7 h at 150 °C). Because of the identical reference temperature, the times can now be added:

$$t_{tot} = \sum_{i=1}^{n} t_{i,ref} \tag{4.26}$$

The tuple $(t_{life}, T_{eff})$ is calculated from the tuple $(t_{tot}, T_{ref})$ by setting $t_{tot}$ to $t_{life}$ and adjusting the temperature such that the drift stays the same:

$$T_{eff} = \left( \frac{1}{T_{ref}} - \frac{k_b \cdot n}{E_a} \cdot \ln\left(\frac{t_{tot}}{t_{life}}\right) \right)^{-1} \tag{4.27}$$

In the example the effective temperature is 119 °C. This results in a threshold voltage drift of 50 mV due to NBTI.

The second method works for temperature as well as voltage profiles. The drift for every time interval is first calculated separately and then the drifts are combined. In the example above the following threshold voltage drifts $\Delta V_{th,i}$ can be computed:

(2 y, 150 °C): $\Delta V_{th,1}$ = 49 mV
(6 y, 85 °C): $\Delta V_{th,2}$ = 28 mV
(2 y, 27 °C): $\Delta V_{th,3}$ = 7 mV

The drifts cannot simply be added because the nonlinear time dependence has to be taken into account:

$$\Delta V_{th} = \left( \Delta V_{th,1}^{1/n} + \Delta V_{th,2}^{1/n} + \Delta V_{th,3}^{1/n} \right)^{n} \tag{4.28}$$

This method also results in a degradation due to NBTI of 50 mV.

4.4. Characterizing the standard cells

The completely automated characterization of the standard cells collects all the information required for an aging analysis on gate level. For a traditional STA using a LUT-based delay model without considering aging, the fresh delay and output slope of every timing arc are required. Delay and slope are stored in two-dimensional LUTs dependent on input slope and output load. To calculate aged gate performances, additional information is necessary for AgeGate:

• Sensitivities $\partial q / \partial \Delta p$ of the gate performances with respect to a parameter drift for the canonical gate model (Equation 4.5).
• The conducting paths $PATH_{NBTI,i}$ and $PATH_{HCI,i}$ for all transistors of a gate. This information enables the calculation of the probabilities $P(B)$ (Equation 4.13) and $P(D)$ (Equation 4.18).
• The logic function and signal slope for all internal signals of multi-stage gates. These are required to calculate the stress probabilities.

The determination of sensitivities and paths is discussed in the next subsections. The logic function is obtained by a structural recognition algorithm developed at the EDA institute at TUM. The algorithm is based on a structural recognition algorithm for analog circuits [Massier et al., 2008]. It analyzes the pull-up as well as the pull-down network of the single gate stages and generates the logic function of all internal nodes and the output node. The slope of internal signals is determined when the delay and output slope of the gate are characterized. It is stored in two-dimensional LUTs dependent on gate input slope and output load.

4.4.1. Obtaining the sensitivities

The sensitivities for the canonical gate model are obtained together with the fresh gate performances $q_{fresh}$. The adjoint sensitivity analysis [Pillage et al., 1995, chap. 9], integrated in the SPICE simulator, is used for this purpose. It is a very efficient approach, much faster than using finite differences for the sensitivities. For NBTI the sensitivity of $q$ with respect to a drift of the threshold voltage $\Delta V_{th}$ is required:

$$\chi^{q}_{\Delta V_{th},n} = \frac{\partial q}{\partial \Delta V_{th,n}} \tag{4.29}$$

For HCI the sensitivity of $q$ with respect to $\Delta I_{on}$ is needed. Unfortunately, it cannot be determined directly by means of the adjoint sensitivity analysis, because $\Delta I_{on}$ is, unlike $\Delta V_{th}$, not a transistor parameter. However, there is an equivalent circuit for a transistor degraded by HCI (see Figure 3.12(a)). The equivalent circuit can be used to simulate an aged transistor on circuit level. It maps $\Delta I_{on}$ onto a threshold voltage drift $\Delta V_{th}$ and a mobility degradation $\Delta \mu_0$. $\Delta V_{th}$ is realized by a voltage source $V_{Deg}$, and a current controlled current source $I_{Deg}$ realizes the mobility degradation.
This equivalent circuit can be used to calculate the sensitivity $\chi^{q}_{\Delta I_{on}}$ by means of the chain rule:

$$\chi^{q}_{\Delta I_{on},n} = \frac{\partial q}{\partial \Delta I_{on,n}} = \frac{\partial q}{\partial V_{Deg,n}} \cdot \frac{\partial V_{Deg,n}}{\partial \Delta I_{on,n}} + \frac{\partial q}{\partial I_{Deg,n}} \cdot \frac{\partial I_{Deg,n}}{\partial \Delta I_{on,n}} \tag{4.30}$$

The partial derivatives $\partial q / \partial V_{Deg}$ and $\partial q / \partial I_{Deg}$ are obtained by replacing all transistors by their equivalent circuit and using the adjoint sensitivity analysis. The remaining partial derivatives can be derived from the equations for $V_{Deg}$ and $I_{Deg}$.

4.4.2. Obtaining the internal gate structure

The internal gate structure determines $PATH_{NBTI}$ and $PATH_{HCI}$, which are necessary for calculating $P(B)$ (Equation 4.13) and $P(D)$ (Equation 4.18). $PATH_{NBTI}$ is a path from the source or drain terminal of a considered PMOS transistor to the supply voltage. For the OR3 gate in Figure 4.6, $PATH_{NBTI}$ for transistor $M_{PC}$ consists of the transistors $M_{PA}$ and $M_{PB}$. To determine $PATH_{NBTI}$, only the pull-up network is considered. A breadth-first search, starting at the source terminal of the considered transistor, is performed to find $V_{DD}$. It is important that the search algorithm does not stop when $V_{DD}$ is reached, because there might be more than one path in a gate. An example of multiple paths is the complex gate with the logic function $z = \overline{a \cdot (b + c)}$ shown in Figure 4.7. There exist two paths for transistor $M_{PC}$. If transistor $M_{PB}$ is in on-state, then the source of $M_{PC}$ is connected to $V_{DD}$. Hence, $PATH_{NBTI,1}$ consists only of transistor $M_{PB}$. The second path $PATH_{NBTI,2}$ consists of transistor $M_{PA}$, since the drain of $M_{PC}$ is connected to $V_{DD}$ if $M_{PA}$ is conducting.

[Figure 4.7: Complex gate implementing the logic function $z = \overline{a \cdot (b + c)}$.]

$PATH_{HCI}$ is required for HCI. It leads from $V_{DD}$/ground via the considered transistor to the output of the gate. $PATH_{HCI}$ is determined by performing two breadth-first searches.
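The search for $PATH_{NBTI}$ described above can be sketched as a breadth-first enumeration over the pull-up network. The encoding of the network as a dictionary from transistor name to its two terminal nodes, the node names `n1` and `n2`, and the function name are assumptions made for illustration. Unlike the description above, the sketch starts from both terminals of the considered transistor at once, so that paths reaching either source or drain are found in one pass.

```python
from collections import deque


def nbti_paths(pull_up, transistor, supply="VDD"):
    """Enumerate transistor paths from either terminal of `transistor` to VDD."""
    # Adjacency list of the pull-up network without the considered transistor.
    adjacency = {}
    for name, (node_a, node_b) in pull_up.items():
        if name == transistor:
            continue
        adjacency.setdefault(node_a, []).append((name, node_b))
        adjacency.setdefault(node_b, []).append((name, node_a))

    terminals = pull_up[transistor]
    start_visited = frozenset(terminals)
    queue = deque((terminal, (), start_visited) for terminal in terminals)
    paths = []
    while queue:
        node, path, visited = queue.popleft()
        if node == supply:
            paths.append(path)  # record the path, but keep searching for others
            continue
        for name, neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                queue.append((neighbor, path + (name,), visited | {neighbor}))
    return paths


# Pull-up network assumed for the complex gate: MPA in parallel with MPB-MPC.
complex_gate = {"MPA": ("VDD", "Z"), "MPB": ("VDD", "n1"), "MPC": ("n1", "Z")}
# Pull-up stack assumed for a three-input NOR gate with MPC next to VDD.
nor3 = {"MPC": ("VDD", "n1"), "MPB": ("n1", "n2"), "MPA": ("n2", "Z")}

paths_mpc = nbti_paths(complex_gate, "MPC")
paths_mpa = nbti_paths(nor3, "MPA")
print(paths_mpc, paths_mpa)
```

An empty path in the result means that one terminal of the considered transistor is tied directly to $V_{DD}$, in which case Condition B is always fulfilled.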
For a PMOS transistor, again the pull-up network is taken into account, and for an NMOS transistor the pull-down network. The first search starts at the source terminal of the considered transistor and looks for $V_{DD}$ or ground, respectively. The second search starts at the drain terminal of the considered transistor and looks for the gate output. It is again possible to have multiple paths. In the example with the complex gate (Figure 4.7) two paths exist for the transistor $M_{PA}$. The first path consists of the transistors $M_{PA}$ and $M_{PB}$ and the second path consists of the transistors $M_{PA}$ and $M_{PC}$.

So that $P(B)$ and $P(D)$ can be calculated during aging analysis, all paths $PATH_{NBTI,i}$ and $PATH_{HCI,i}$ for all transistors of a gate are determined during the characterization of the gate and stored in the gate model.

4.4.3. Simplification of the gate model

This section is about reducing the gate model by removing LUTs that can be neglected because they have (almost) no effect on the aged gate performance. The advantages of such a simplified gate model are that it needs less storage space and that the aging analysis is accelerated, because when a sensitivity is removed the corresponding parameter drift does not have to be calculated either.

There are four LUTs for every timing arc in a traditional LUT-based gate model (one LUT each for delay and output slope, for rising and falling input transition). A NAND gate with two inputs, for instance, has eight LUTs.

AgeGate has additional LUTs for the sensitivities $\chi^{q}_{m,p}$ and the signal slopes of internal nets. For every nominal LUT, one sensitivity per NMOS transistor ($\chi^{q}_{m,\Delta I_{on}}$) and two sensitivities per PMOS transistor ($\chi^{q}_{m,\Delta V_{th}}$ and $\chi^{q}_{m,\Delta I_{on}}$) are required. Hence, the AgeGate model for a two-input NAND gate has 56 LUTs (8 nominal LUTs plus $8 \cdot (2 \cdot 1 + 2 \cdot 2) = 48$ sensitivity LUTs), and an OR gate with four inputs even has 408 LUTs. LUTs of sensitivities that have (almost) no impact on the degradation of the gate performance can be removed.
This impact is given by:

$$\Delta q_{m,p} = \chi^{q}_{m,p} \cdot \Delta p_m \tag{4.31}$$

For every sensitivity it is checked whether $\Delta q_{m,p}$ is smaller than a specified limit. For this purpose the drift $\Delta p_m$ must be specified as well. For instance, the threshold voltage drift of a PMOS transistor has no noteworthy impact on the gate delay for a rising input transition, because in this case the pull-down network has to recharge the output load. If 0.1 % of the nominal gate performance is chosen as the limit and the drifts are 100 mV for $\Delta V_{th}$ and 20 % for $\Delta I_{on}$ (these drift values are much larger than what can be observed in reality), the LUTs of the NAND gate are reduced from 56 to 39 and the LUTs of the OR gate are reduced from 408 to 168.

4.5. Results

4.5.1. Waveform dependence of parameter drift

Transistor parameter drifts and aged signal slopes are mutually dependent. A small experiment shows whether it is justified to calculate the parameter drifts in the proposed approach from fresh output slopes or whether an iterative approach is beneficial. For this purpose a NOR2 ring oscillator is simulated with RelXpert (65 nm LP, 1.7 V, 145 °C, 700 h). In a first run, the fresh waveforms are used to degrade the transistors. In a second run, the aged waveforms after 700 h are used. The aged waveforms are obtained by simulating the degraded ring oscillator from the first run. The true value lies between those two simulations, since in reality the waveform degrades continuously within the 700 h, affecting the parameter drift, and the drift in turn affects the signal waveform. The degradation of the oscillator frequency is 5.35 % for fresh slopes and 5.43 % for aged slopes (see Figure 4.8). An iterative approach would give a value in between. Hence, there is no significant advantage of an iterative approach. This can be explained by the fact that NBTI is a static effect and the slope of the waveform has no impact on the degradation caused by it.
Only the degradation caused by HCI depends on the time the signal is in transition. However, as can be seen later in Section 4.5.3, NBTI is the dominant aging effect.

[Figure 4.8: Ring oscillator waveforms of fresh (leading waveform in magenta) and aged (shifted waveforms in red and blue) simulations. The transistor drifts for the aged simulations were determined once by the fresh waveform and once by the aged waveform. Independent of which waveform was taken to determine the drifts, the aged waveforms are almost indistinguishable.]

4.5.2. Comparison of AgeGate, circuit-level simulation and measurements

Before the ISCAS’85 test circuits are analyzed, the accuracy of AgeGate is investigated. In Figure 4.9 the degradation of a ring oscillator is determined by measurement, by simulation on circuit level and by the proposed aging analysis approach. For the simulation on transistor level, the transistors in the transistor level netlist are replaced by the equivalent circuits (see Figure 3.12(a)), the same parameter drifts as for the aging analysis on gate level are applied and a SPICE simulation is performed. The upper diagram shows the degradation when the device under test did not oscillate during stress. During this static stress the device is only affected by NBTI. In the lower diagram, the device was oscillating during stress. In this case both aging effects are relevant. Simulation and aging analysis match quite well. Measurement results were only available for the upper case. They show a mismatch compared to the aging analysis and the simulation. It can be assumed that a large part of the error is caused by inaccurate degradation equations. The degradation determined with the proposed aging analysis is slightly smaller than the degradation simulated on transistor level. This can be explained by the linearization of the sensitivities.
As can be seen in Figure 4.9, the degradation calculated with linearized sensitivities is smaller than the degradation simulated on transistor level.

Figure 4.9.: Frequency degradation [%] of a 65 nm inverter ring oscillator stressed for 500 h at defined stress conditions, plotted over stress duration (5 h, 144 h, 500 h) for measurement, simulation and aging analysis. Upper diagram: no oscillation during stress. Lower diagram: oscillation during stress.

4.5.3. Aging analysis results

For evaluation purposes, an industrial 90 nm standard cell library is characterized. The following use profile was chosen for the aging analysis: a lifetime of 10 y, a temperature $T_{eff}$ of 125 °C, and a supply voltage $V_{DD}$ of 1.32 V.

Figure 4.10 shows how the arrival times at the primary outputs of the benchmark circuit c880 increase over lifetime. SP and TD values are determined by the probabilistic method for SP = 0.2 and TD = 0.2 at the primary inputs. The figure indicates that it is not enough to consider only the most critical nominal path during aging analysis, because the order of the arrival times can change over lifetime (signals 866 and 874).

It is difficult to compare AgeGate to the different state-of-the-art aging-aware gate models, because the published results are based on different technologies. Hence, especially the degradation equations are different. Instead, it is shown how the accuracy of the aging analysis is increased by the special features of the proposed aging-aware gate model. The special features are: consideration of NBTI and HCI, computation of aged output slopes and calculation of individual parameter drifts.

In Table 4.2 the path delay degradation ∆delay of the critical path is depicted for a worst-case analysis of the ISCAS'85 benchmark circuits. The nominal path delays without aging (NOM) are given as a reference.
The degradation due to both effects (BOTH) as well as due to just one effect (NBTI, HCI) is analyzed. When both effects are considered, the degradation of the critical path delay is between 12.0 % and 15.4 %. The dominant aging effect for this technology and the chosen use profile is NBTI, with a performance degradation of up to 12.3 %. In the last column (NO_SLP), values for ∆delay are given if no aged output slope is computed. Comparing ∆delay with and without the aged output slope shows that neglecting aged output slopes underestimates the degradation by 24 % on average. For the column BOTH, the run time on an Opteron workstation with 2.4 GHz and 2 GB RAM is also given in parentheses. It can be seen that the proposed model can be evaluated quickly.

Figure 4.10.: The five slowest output arrival times over lifetime for ISCAS'85 circuit c880. Individual workloads for the gates were obtained for SP = 0.2 and TD = 0.2 at primary inputs. Signals 866 and 874 change order with time.

Circuit   NOM [ns]   HCI [%]   NBTI [%]   BOTH [%]   (run time [s])   NO_SLP [%]
c17       0.18       4.7       8.4        13.0       (2.57)           9.8
c432      2.30       4.0       10.9       15.4       (4.19)           11.1
c499      1.51       3.8       11.3       15.2       (6.52)           11.4
c880      1.88       3.4       8.5        12.0       (6.62)           10.0
c1355     1.81       4.1       9.8        13.4       (8.61)           10.2
c1908     2.50       3.2       10.2       13.8       (9.96)           10.0
c2670     2.87       2.6       10.1       12.8       (17.86)          10.2
c3540     3.45       2.7       9.9        13.0       (19.61)          10.2
c5315     3.12       2.6       10.0       12.8       (26.97)          10.4
c6288     8.88       1.7       12.3       14.2       (29.35)          9.0
c7552     2.61       3.0       9.7        12.9       (34.07)          9.9
Ø         2.83       3.2       10.1       13.5       (15.12)          10.2

Table 4.2.: Degradation of critical path delays for different analyzer settings.

Figure 4.11.: Comparison of analysis with and without individual transistor drifts.

For the diagram in Figure 4.11, an aging analysis with individual transistor drifts is compared to an aging analysis where it is assumed that all transistors of a gate degrade as much as the worst amongst them.
The diagram shows the benefit of calculating individual transistor drifts. Without individual transistor drifts the mean degradation is overestimated by 20 %.

4.6. Summary

An aging analysis flow on gate level capable of determining the impact of the two dominant drift-related aging effects on circuit timing was introduced. The developed aging-aware gate model, AgeGate, consists of a canonical gate model, technology-specific degradation equations, and information about the internal gate structure. What distinguishes AgeGate from existing aging-aware gate models is that it considers the aged output slope, it takes NBTI and HCI into account, and it calculates individual transistor drifts. The results show that both aging effects are relevant, that not calculating an aged output slope underestimates the performance degradation by 24 %, and that not computing individual transistor drifts overestimates the degradation by 20 %.

5. Identifying possible critical paths in aged circuits

When the operating conditions over lifetime and the individual workloads of the gates are known, the degraded circuit delay and the critical path causing this delay can be determined by the aging-aware timing analysis described in Chapter 4. If the operating conditions and workload are not (exactly) known, only a worst-case analysis can be performed (see Section 4.1). Due to this uncertainty, multiple possible critical paths (PCPs) may exist. This chapter is about identifying these PCPs.

A PCP is a path that is the critical path of a circuit for a specific combination of operating conditions and workload. However, it would be too complex and inefficient to identify PCPs with this definition. Hence, a weakened definition is used instead: A possible critical path (PCP) is a path that cannot be excluded from the paths that become the critical path of a degraded circuit for a specific combination of temperature $T_{eff}$, supply voltage $V_{eff}$, workload of the input signals and lifetime $t_{life}$.
This definition reflects how PCPs are determined: those paths are identified that are certainly not PCPs, and the remaining paths are considered PCPs. Several applications arise from knowing the PCPs:

• The TG of a combinational circuit can be reduced until it contains only PCPs. This reduced TG can be used as a timing model for modules, such as adders or multipliers. Since such a timing model is generated once and can be used whenever the module is instantiated in a more complex hierarchical design, it accelerates the aging-aware timing verification of complex digital circuits compared to an analysis on gate level.

• PCPs can also be utilized to monitor a system during its lifetime. The delay of the PCPs is determined at periodic intervals and countermeasures are taken if the path delay is no longer within the safe operating range. Such an adaptive system can react, for instance, by reducing the clock frequency of the aged circuit.

• PCPs are also beneficial for optimizing a circuit to minimize the circuit performance loss due to aging. Existing optimization approaches [Wu and Marculescu, 2009; Wang et al., 2009a,b; Bild et al., 2009] depend on knowing in advance the gates that degrade the most. Hence, the operating conditions and the workload of a circuit must be known. If the operating conditions and the workload are unknown, the PCPs can nevertheless help to optimize a circuit. They indicate which gates might become critical. By combining this information with the information which gates have a large impact on circuit performance, those gates can be identified that should be protected from excessive degradation.

• Finally, PCPs are required by already published approaches. Chen et al. [2011] propose a path-based aging-aware timing analysis. Wang et al. [2008] introduce node criticality computation. By protecting the identified critical nodes the delay degradation can be reduced.
Both approaches need PCPs. However, they simply take the top X % (e.g., 10 %) of the paths with the longest aged delay. This is quite inaccurate, and the number of PCPs may be either over- or underestimated.

The remainder of this chapter is organized as follows: The next section describes prerequisites for the proposed approach. Then the method to identify the PCPs is introduced (see Section 5.2). This method is extended in Section 5.3 by considering that some paths must degrade even if the workload is unknown. In Section 5.4 it is described how process variations and variations of the operating conditions can be considered as well. Section 5.5 introduces two applications of PCPs, the generation of aging-aware timing models and the benefit of PCPs for testing aged circuits. Results follow in Section 5.6 and the chapter is summarized in Section 5.7.

5.1. Prerequisites

Without exact operating conditions and workloads, the degraded gate delay cannot be determined exactly. However, it is possible to determine an interval for the gate delay. The lower bound is the fresh gate delay ($d_{fresh}$), since aging always increases the gate delay. The upper bound is the maximal aged gate delay ($d_{aged}$). To determine the upper bound, a validity region must be defined by specifying maximal values for the effective temperature $T_{eff}$, the effective supply voltage $V_{eff}$ and the lifetime itself.

The foundation for identifying the PCPs is a timing graph as described in Section 2.1.2. However, this time the edge weights, which are given by the gate delays, are not deterministic quantities but intervals. For this reason, all other timing quantities $q$ (e.g., AT, D2S, SLACK, ...) are intervals as well. The intervals are stored as tuples. The first element of the tuple is the fresh value $q_{fresh}$ and the second element is the aged value $q_{aged}$. An example of a TG with annotated nodes and edges is given in Figure 5.1.
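The tuple representation can be tried out on a toy graph. The following is only a minimal sketch, assuming that the fresh and aged bounds are propagated independently (element-wise addition and maximum); the names `Interval`, `add`, `upper`, `EDGES`, `TOPO_ORDER` and `arrival_times`, as well as the graph and its delay values, are hypothetical and not taken from the thesis.

```python
from typing import Dict, List, NamedTuple, Tuple


class Interval(NamedTuple):
    """Timing quantity stored as a (fresh, aged) tuple."""
    fresh: float
    aged: float


def add(a: Interval, b: Interval) -> Interval:
    """Add the fresh bounds and the aged bounds separately."""
    return Interval(a.fresh + b.fresh, a.aged + b.aged)


def upper(a: Interval, b: Interval) -> Interval:
    """Element-wise maximum of two intervals."""
    return Interval(max(a.fresh, b.fresh), max(a.aged, b.aged))


# Hypothetical timing graph: edge (u, v) -> gate delay interval.
EDGES: Dict[Tuple[str, str], Interval] = {
    ("S", "a"): Interval(1.0, 1.5),
    ("S", "b"): Interval(2.0, 2.2),
    ("a", "T"): Interval(3.0, 4.0),
    ("b", "T"): Interval(1.0, 1.1),
}
# Nodes in topological order, source first (assumed to be known).
TOPO_ORDER: List[str] = ["S", "a", "b", "T"]


def arrival_times(edges, order) -> Dict[str, Interval]:
    """Forward pass: AT(v) is the maximum of AT(u) + d((u, v)) over incoming edges."""
    at = {order[0]: Interval(0.0, 0.0)}
    for node in order[1:]:
        incoming = [add(at[u], d) for (u, v), d in edges.items() if v == node]
        result = incoming[0]
        for candidate in incoming[1:]:
            result = upper(result, candidate)
        at[node] = result
    return at


if __name__ == "__main__":
    for name, interval in arrival_times(EDGES, TOPO_ORDER).items():
        print(name, interval)
```

The delay to sink (D2S) could be obtained in the same way by a backward pass over the reversed graph.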
The timing quantities that are stored at the nodes and edges can change during the computation of the PCPs because elements of the timing graph that do not belong to a PCP are removed. An incremental timing analysis, as described in Section 2.1.3, is used to update the changed timing quantities whenever they are read. Every tuple has a valid flag. When the timing quantity is read, it is first checked whether it is valid. If not, the timing quantity is updated and stored, and the valid flag is set again.

Figure 5.1.: TG annotated with arrival time and delay to sink at every node.

5.2. Identification of PCPs

The timing graph is annotated with tuples for all required timing quantities. These tuples represent the intervals the timing quantities are in for the specified validity region. The reduction steps introduced in this section determine the PCPs. The following interval operations are required for that purpose:

$$\mathrm{sum}(a, b) = \mathrm{sum}([a_{fresh}, a_{aged}], [b_{fresh}, b_{aged}]) := [a_{fresh} + b_{fresh},\; a_{aged} + b_{aged}] \qquad (5.1)$$

$$\max(a, b) := [\max(a_{fresh}, b_{fresh}),\; \max(a_{aged}, b_{aged})] \qquad (5.2)$$

$$a < b := a_{aged} < b_{fresh} \qquad (5.3)$$

Operations for subtraction, min and greater than (>) can be defined correspondingly.

Even though the goal is to determine the PCPs, it is crucial that the reduction steps do not depend on enumerating every single path in the timing graph and deciding whether it is a PCP or not. This would make it impossible to determine the PCPs for circuits of industrial relevance, since the number of paths increases exponentially with the number of nodes. Two criteria are used to determine whether a path is a PCP or not:

Criterion 1: A path must have a maximal aged path delay $D_{aged}$ greater than the critical path delay of the fresh circuit $D(P_{crit})_{fresh}$ (or just $D_{crit,fresh}$). Otherwise it is not a PCP because $P_{crit}$ will always have a greater path delay.
Criterion 2: Even if a path A has an aged path delay greater than $D_{crit,fresh}$, it might not be a PCP. If there is another path B that has a greater path delay than A for all possible operating conditions and workloads, path A is not a PCP.

5.2.1. Slack reduction step

The slack reduction step checks the first criterion. A positive slack (see Equation 2.6) at a node $n$ indicates that the signal arrives soon enough at $n$ to arrive at $T$ before the specified required time REQT($T$). By setting the required time at $T$ to $D_{crit,fresh}$, a node $n$ with a positive aged slack (SLACK($n$)$_{aged}$) indicates that all aged paths through $n$ arrive at $T$ before $D_{crit,fresh}$. Hence, no path through this node $n$ is a PCP and the node can be removed from the timing graph. This is checked for all nodes of the timing graph.

Algorithm 3: Slack reduction step
    /* Remove nodes with positive aged slack */
    foreach node in TG do
        if SLACK(node)_aged > 0 then
            clean_remove_node(node)
        end
    end

The slack reduction step (see Algorithm 3) has a time complexity of $O(n)$, with $n$ being the number of nodes in the timing graph. The nodes are not simply removed from the timing graph; instead the function clean_remove_node is called. This function checks whether additional nodes and edges can be removed from the timing graph and ensures that the remaining graph is a valid TG (see Section 5.2.6).

5.2.2. Path delay reduction step

The second reduction step also checks whether the aged delay of a path is less than $D_{crit,fresh}$ (Criterion 1). This time it is checked whether an edge, and not a node, can be removed. The largest path delay from $S$ to $T$ along a given edge $(u, v)$ can be calculated as follows:

$$D = \mathrm{AT}(u) + d((u, v)) + \mathrm{D2S}(v) \qquad (5.4)$$

AT($u$) gives the maximal delay of all paths to the node $u$, $d((u, v))$ is the edge delay and D2S($v$) is the maximal delay of all paths from $v$ to $T$.
If the path delay interval $D$ is less than $D_{crit,fresh}$, this edge can be removed, because no aged path through this edge is slower than $D_{crit,fresh}$. This is checked for all edges of the timing graph.

An example is given in Figure 5.2. The path delay interval of path $P$ evaluates to [6, 9] with Equation 5.4. The delay of the critical fresh path is [10, 12]. Hence, $P$ is not a PCP and the edge $(b, d)$ can be removed. This reduction step has a time complexity of $O(e)$, with $e$ being the number of edges in the timing graph. The pseudo code is given in Algorithm 4.

Algorithm 4: Path delay reduction step
    /* Remove edges with max path delay along this edge less than required time */
    foreach node in TG do
        foreach suc in successors(node) do
            /* checked edge is (node, suc) */
            maxPathDelayOverEdge ← AT(node) + d((node, suc)) + D2S(suc) ;
            if maxPathDelayOverEdge < REQT(T) then
                clean_remove_edge(node, suc) ;
            end
        end
    end

Figure 5.2.: Illustration of path delay reduction step. Edge $(b, d)$ can be removed because the delay of path $P$ is less than the delay of path $P_{other}$.

5.2.3. Arrival time reduction step

This reduction step checks for the conditions described in Criterion 2. The arrival time at a node $v$ is determined by computing the arrival times along all incoming edges and calculating the maximum of them. If the arrival time interval along an edge $(u, v)$, i.e., AT($u$) + $d((u, v))$, is smaller than the arrival time after the max-operation (AT($v$)), then signals along $(u, v)$ never determine the arrival time at $v$ and the edge can be removed. This is done for all edges in the graph.

This can also be explained by longest path intervals (see Figure 5.3). This interpretation is beneficial for the common edge reduction step discussed in Section 5.2.5. Every fresh or aged arrival time (or delay to sink) is determined by a longest path segment. Two longest path segments are given. Path segment $V$ (dashed line) is the path to $v$ that determines the fresh arrival time (AT($v$)$_{fresh}$) and path segment $U$ (solid line) is the path to $v$ that determines the maximal aged arrival time at $v$ along the edge $(u, v)$. These path segments can easily be obtained because for each arrival time it is stored from which edge it results. If the path delay interval of segment $U$ is less than the interval of segment $V$, then the edge $(u, v)$ can be removed.

Figure 5.3.: Illustration of arrival time reduction step. Edge $(d, e)$ can be removed because the arrival time interval along edge $(d, e)$ is less than the arrival time at $e$ after the max-operation.

The arrival time reduction step (see Algorithm 5) has a time complexity of $O(e)$, with $e$ being the number of edges in the timing graph.

Algorithm 5: Arrival time reduction step
    /* Remove edges that do not contribute to atime at a node */
    foreach node in TG do
        foreach pre in predecessors(node) do
            atimeOverPre ← AT(pre) + d((pre, node)) ;
            if AT(node) > atimeOverPre then
                clean_remove_edge(pre, node) ;
            end
        end
    end

5.2.4. Delay to sink reduction step

This reduction step is almost equivalent to the arrival time reduction step. This time it is not the delay from $S$ to a node that is considered, but the delay from the node to $T$ (D2S). D2S is determined by computing the delay to $T$ for all outgoing edges of a node $u$ and computing the maximum of them. If the delay to $T$ along an edge $(u, v)$ is less than the delay to $T$ at $u$, the edge can be removed (see Algorithm 6).

Algorithm 6: Delay to sink reduction step
    /* Remove edges that do not contribute to delay to sink at a node */
    foreach node in TG do
        foreach suc in successors(node) do
            d2sinkOverSuc ← d((node, suc)) + D2S(suc) ;
            if D2S(node) > d2sinkOverSuc then
                clean_remove_edge(node, suc) ;
            end
        end
    end

Figure 5.4.: Example for the common edge reduction step.

5.2.5. Common edge reduction step

In Criterion 2 two paths are compared to each other.
For paths that share common edges, this comparison of two path intervals is too pessimistic, as the following example illustrates (see Figure 5.4): Path $V$ (dashed line) has a delay of [7, 13] and the delay of path $U$ (solid line) is [4, 8]. $V$ is a PCP, and $U$ is also considered a PCP because the interval of $U$ is not less than the interval of $V$ (the upper bound of $U$ is not less than the lower bound of $V$). Paths $V$ and $U$ have a common edge $(a, b)$. For the calculation of the lower bound of path $V$ all the fresh gate delays along the path are added, and for the calculation of the upper bound of path $U$ all aged gate delays are added. This means that for the common edge $(a, b)$ the upper bound of the gate delay is assumed in one case and the lower bound in the other. This is impossible. Although the actual gate delay is unknown during the PCP identification, a timing arc must have the same delay independent of the path that is investigated. By assuming an identical delay for common edges (the aged delay of common edges is set to the fresh edge delay, resulting in a fixed edge delay and not an interval), the new path delays are $D(V) = [7, 10]$ and $D(U) = [4, 5]$. The path delay interval of $U$ is now less than the interval of $V$; hence, $U$ is not a PCP. This example shows that whenever two paths that share common edges are compared, an identical edge delay has to be assumed for the common edges in order not to be overly pessimistic.

But how can common edges be taken into account? An exact method is the following: First, all paths that have an aged path delay greater than $D_{crit,fresh}$ are enumerated. Then, the delays of two paths are compared. For common edges, the fresh edge delay is assumed for the aged edge delay as well. If one path interval is less than the other interval, this path is not a PCP and can be removed. This exact method has an exponential time complexity and cannot be used for complex circuits.
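The pairwise comparison in the example can be replayed with a few lines of code. This is a minimal sketch, not the implementation used in the thesis: the path totals [7, 13] and [4, 8] are taken from the example above, whereas the split into a common edge $(a, b)$ with delay [2, 5] and one lumped private segment per path (`v_rest`, `u_rest`) is an assumption chosen so that the totals match. All identifiers are hypothetical.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Tuple

Edge = Tuple[str, str]


class Interval(NamedTuple):
    fresh: int
    aged: int


def less(a: Interval, b: Interval) -> bool:
    """Interval comparison as in Equation 5.3: a < b iff a_aged < b_fresh."""
    return a.aged < b.fresh


def path_delay(path: List[Edge], delays: Dict[Edge, Interval],
               common: FrozenSet[Edge] = frozenset()) -> Interval:
    """Sum the edge delays along a path (Equation 5.1).

    For edges in `common`, the fresh delay is used for the aged bound as well.
    """
    fresh = aged = 0
    for edge in path:
        d = delays[edge]
        fresh += d.fresh
        aged += d.fresh if edge in common else d.aged
    return Interval(fresh, aged)


# Assumed decomposition: (a, b) is the shared edge, the rest of each path is lumped.
DELAYS: Dict[Edge, Interval] = {
    ("a", "b"): Interval(2, 5),
    ("b", "v_rest"): Interval(5, 8),
    ("b", "u_rest"): Interval(2, 3),
}
PATH_V: List[Edge] = [("a", "b"), ("b", "v_rest")]
PATH_U: List[Edge] = [("a", "b"), ("b", "u_rest")]
COMMON: FrozenSet[Edge] = frozenset(PATH_V) & frozenset(PATH_U)

if __name__ == "__main__":
    # Plain intervals: U cannot be excluded.
    print(less(path_delay(PATH_U, DELAYS), path_delay(PATH_V, DELAYS)))
    # Identical delay on common edges: U is excluded.
    print(less(path_delay(PATH_U, DELAYS, COMMON), path_delay(PATH_V, DELAYS, COMMON)))
```

The exact method would repeat this comparison for every pair of enumerated paths, which is where the exponential complexity comes from.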
Baba and Mitra [2009] propose a more efficient method to consider common edges. This method extends the arrival time reduction step. Hence, it is block-based, not path-based like the exact method. The arrival time reduction step removes an edge $(u, v)$ if the arrival time at $v$ along this edge is less than the resulting arrival time at $v$ after the max-operation. As shown in the arrival time reduction step, this can be interpreted as comparing two path segments. The longest path segment $V$ consists of all edges that determine the fresh arrival time at $v$ after the max-operation, and the longest path segment $U$ determines the aged arrival time at $v$ along edge $(u, v)$. If $V$ and $U$ have common edges, then the same edge delay must be assumed for the common edges. However, just adding up the updated edge delays along the path segments is not enough. By changing the common gate delays, $U$ and $V$ themselves could have changed. In the example (see Figure 5.4), by setting the aged delay of $(a, b)$ to 2, the aged arrival time at $b$ is now determined by the second incoming edge $(f, b)$. In [Baba and Mitra, 2009] this is solved by setting the aged delay of common edges to the fresh value and running the STA again to determine the changed arrival times. Hence, whenever common edges are detected, the STA is performed again with changed edge delays to decide whether an edge can be removed or not.¹ This takes a lot of time, as can be seen from the results of [Baba and Mitra, 2009].

In the proposed approach it is not necessary to run the STA again. This is possible because the join-slacks (see Section 2.1.5) indicate how far the gate delay can be decreased before the arrival time at a node is determined by another edge. In the example, the aged join-slack between edge $(a, b)$ and edge $(f, b)$ indicates that when the aged gate delay is reduced by more than 2 time units, the arrival time is determined by edge $(f, b)$.
¹ After that, the common edge delays have to be reset and the STA has to be run once more to get back the original state of the timing graph.

For the path segment $U$ two path delay intervals are calculated: the path segment delay $D(U)$ when common edges are not considered, and the path segment delay $\tilde{D}(U)$ when the fresh gate delay is used for common edges. To decide whether an edge $(u, v)$ can be removed, the following cases have to be distinguished (see Figure 5.6):

1. If $D(U) < D(V)$, then remove common edge: Even without considering common edges, the edge $(u, v)$ can be removed.

2. If $\tilde{D}(U) \not< D(V)$, then do not remove common edge: The edge cannot be removed, because even if the fresh delay is used for common edges the path delay is still too large.

3. Else ($\tilde{D}(U) < D(V)$ and $D(U) \not< D(V)$), it depends on $U'$: If $D(V)_{fresh}$ is between $\tilde{D}(U)_{aged}$ and $D(U)_{aged}$, it depends on the path segment $U'$ with the next smaller delay than $U$ whether $(u, v)$ can be removed or not.

   a) If $D(U') < D(V)$, then remove common edge: This is like case 1. The decision whether $(u, v)$ can be removed cannot be made from $U$, but from $U'$.

   b) If $\tilde{D}(U') \not< D(V)$, then do not remove common edge: This is like case 2. The decision whether $(u, v)$ can be removed cannot be made from $U$, but from $U'$.

   c) If $\tilde{D}(U') < D(V)$ and $D(U') \not< D(V)$, it depends on the next $U'$: Like case 3. Look at the $U'$ with the next smaller delay.

   d) If there is no $U'$, then remove common edge: If there is no path segment $U'$ with a smaller delay than $U$, then the delay $\tilde{D}(U)$ can be assumed and the edge is removed.

The pseudo code for this reduction step is given in Algorithm 7. The only difference compared to Algorithm 5 for the arrival time reduction step is the function edge_can_be_removed, which checks the different cases given above.
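The case distinction can be written as a short loop over the candidate segments. The sketch below is an illustration under assumptions, not the implementation of the thesis: it assumes that the segments $U, U', \ldots$ are already available in order of decreasing delay, each with its interval without common-edge handling and its interval with fresh delays on common edges. How these segments are obtained (via join-slacks) is not modeled. The signature of `edge_can_be_removed` and all values are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    fresh: float
    aged: float


def less(a: Interval, b: Interval) -> bool:
    """Interval comparison as in Equation 5.3: a < b iff a_aged < b_fresh."""
    return a.aged < b.fresh


def edge_can_be_removed(d_v: Interval,
                        candidates: List[Tuple[Interval, Interval]]) -> bool:
    """Decide whether edge (u, v) can be removed.

    d_v        -- delay interval of the longest segment V
    candidates -- (plain, with_common) interval pairs for U, U', ...,
                  ordered by decreasing delay; `with_common` uses the
                  fresh delay on common edges for the aged bound
    """
    if not candidates:
        raise ValueError("at least the longest segment U is required")
    for plain, with_common in candidates:
        if less(plain, d_v):
            return True        # case 1 / 3.a: removable even without common edges
        if not less(with_common, d_v):
            return False       # case 2 / 3.b: too slow even with fresh common edges
        # case 3 / 3.c: undecided, continue with the next smaller segment
    return True                # case 3.d: no further segment U' exists


if __name__ == "__main__":
    d_v = Interval(7, 13)
    print(edge_can_be_removed(d_v, [(Interval(4, 8), Interval(4, 5))]))
```

A bound on the number of inspected segments could be added as an extra parameter that returns False once it is exceeded, which keeps the decision conservative.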
Algorithm 7: Common edge reduction step
    /* Remove edges that do not contribute to atime at a node (common edges considered) */
    foreach node in TG do
        foreach pre in predecessors(node) do
            atimeOverPre ← AT(pre) + d((pre, node)) ;
            if edge_can_be_removed(pre, node) then
                clean_remove_edge(pre, node) ;
            end
        end
    end

The delay to sink reduction step can be extended in a similar way to take common edges into account. In this case the branch-slacks are required to iterate over the path segments from a node to $T$.

With the exact method more edges can be removed, because all paths that share common edges are compared to each other. The example in Figure 5.5 illustrates the difference: Assume path $A$ (solid line) and path $B$ (dashed line) are PCPs, but path $C$ (dotted line) is not a PCP because the path delay interval of $C$ is smaller than the interval of $A$ when common edges are considered. However, the interval of path $C$ is not less than that of path $B$. When just the longest path segments at $x$ are compared to each other, $A$ and $C$ are never compared and path $C$ is not removed from the PCPs.

Figure 5.5.: Example that shows the difference between the proposed and the exact method for common edges.

Whenever case 3.c is detected, the next longest path segment $U'$ must be determined. In order not to have a worst-case time complexity dependent on the number of paths in the timing graph, a maximal number $N$ of path segments $U'$ that should be determined is specified. If the considered edge cannot be removed after $N$ path segments are checked, then it is assumed that the edge cannot be removed. This way the worst-case time complexity just depends on the number of edges $e$ in the timing graph; hence, it is $O(e)$.

5.2.6. Removing edges and nodes

When a reduction step detects that a node or an edge can be removed, often additional nodes and edges have to be removed from the graph as well to have a valid timing graph again.
Whenever an edge $(u, v)$ is removed by clean_remove_edge, it is checked whether the node $u$ has any additional successors. If not, this node is removed as well by calling clean_remove_node. It is also checked whether the node $v$ has any additional predecessors. If not, $v$ is removed as well by calling clean_remove_node. The function clean_remove_node removes not just the node itself, but additionally all the edges heading to or leaving from this node. By removing the edge $(u, v)$ in Figure 5.3, the node $u$ can also be removed, because $u$ has no additional successors. When $u$ is removed, all its incoming edges are removed as well; hence, $(b, u)$ is removed.

Figure 5.6.: Graphical representation of the common edge reduction step cases. Edge $(u, v)$ can be removed if the aged delay of path $U$ is smaller than the fresh delay of path $V$.

5.3. Realistic aged path delays

So far, intervals are used for the gate delay, because the specific delay of an aged gate is unknown (since operating conditions and workload over lifetime are unknown). An interval for the path delay is calculated by adding up the gate delays along the path. But is it really possible that along a path all gates degrade maximally (upper bound of the path delay interval) or that all gates do not degrade at all (lower bound of the path delay interval)? In this section it is investigated whether it is justified to use intervals for gate and path delays or not.² In the following, it is shown that intervals for the gate delay are justified.
It is also shown that the upper bound of the path delay interval is realistic as long as a given path is statically sensitizable. The lower bound of a path delay can also be reached as long as just one input transition is considered. But the lower bound of the interval is often too pessimistic if, for a given path, the maximum of the delays for a rising and a falling input transition is considered. This can be used to further reduce the number of PCPs.

5.3.1. Gate delay interval

The degradation of a gate strongly depends on the workload. NBTI only degrades PMOS transistors. Hence, only the gate delay for a falling input transition degrades. For NBTI the workload impact is defined by the signal probability at the gate inputs. If SP is 0 at an inverter input, the inverter delay degrades maximally. On the other hand, if SP is 1, the delay does not degrade at all. A NOR or NAND gate also degrades maximally when SP at the inputs is 0 and does not degrade if SP is 1. Hence, it is justified to use an interval for the gate delay because both the lower and the upper bound of the interval can be reached.

5.3.2. Realistic aged path delays for an inverter chain

Before investigating a general path, an inverter chain is considered. Figure 5.7 shows the dependence of the delay of an inverter chain on the signal probability $SP_{IN}$ at the input IN. The aged path delays for a rising input transition $D(P_r)_{aged}$ and a falling input transition $D(P_f)_{aged}$ are shown (solid lines). $P_f$ degrades the most when $SP_{IN}$ is 0. Then, SP at all gates with a falling input transition is 0 and the gates degrade maximally. On the other hand, if $SP_{IN}$ is 1, the aged path delay is the delay of the fresh inverter chain, because SP is 1 for all gates with a falling input transition. For $P_r$ it is the exact opposite: no degradation when $SP_{IN}$ is 0 and maximal degradation when $SP_{IN}$ is 1.
The path delay of interest is the maximum of the path delays for a rising and a falling transition:

$$\max(D(P_r)_{aged},\; D(P_f)_{aged}) \qquad (5.5)$$

The inverter chain can still degrade maximally (for $SP_{IN} = 1$ or $SP_{IN} = 0$). However, it is no longer possible that the path does not degrade at all. The gate delays for one transition do not degrade when the delays for the opposite transition degrade the most and vice versa. The inverter chain now degrades minimally for $SP_{IN} = 0.5$ (intersection of solid lines), but the minimal degradation is already 85 % of the maximal degradation.

Figure 5.7.: Path delay of an inverter chain (10 inverters) with respect to SP at the input.

² For the investigation of the realistic aged path delay, it is assumed that just the workload is unknown and (at least lower bounds for) $T_{eff}$, $V_{eff}$ and $t_{life}$ are known. Otherwise the lower bound of the path delay interval is of course equal to the fresh path delay, because a lifetime of 0 could be assumed if $t_{life}$ is unknown. At the moment just NBTI is considered. The reason is that NBTI depends on the static signal probability and, although the actual SP is unknown, it must be between 0 % and 100 %. Furthermore, the results in Chapter 4 show that NBTI is the dominant aging effect. Nevertheless, it should be possible to consider HCI as well. To consider HCI, an upper bound for TD has to be defined. For a glitch-free circuit, for instance, the upper bound for TD is 1.

5.3.3. Maximal aged path delay of a general path

A (general) path (see Figure 5.8) consists of an input, an output, the gates along the path and side inputs. Side inputs are gate inputs that are not on the path. Only single-stage gates (e.g., inverter, NAND and NOR) are considered. This is no limitation, because complex gates are built from those basic gates. The signal probabilities of the gates along a path are interdependent.
They depend on the logic interconnection and the SP at the PIs. Like an inverter chain, a general path can degrade maximal if it can be sensitized statically. A path is statically sensitizable if at least one input vector exists that sets all side inputs of the gates to their non-controlling value. A path can be specified by the nodes along the path in the timing graph (0, 1, . . . , m). 97 5. Identifying possible critical paths in aged circuits 1 0 Figure 5.8.: A general path Hence, the input and output of a gate are two consecutive nodes i and i+1. fi denotes the logic function dependent on the primary inputs of a node i is fi . The static sensitization condition of a logic gate is given by the Boolean difference: ∂fi+1 = fi+1fi ⊕ fi+1f i ∂fi (5.6) The sensitization condition specifies the input vectors for which a transition at the gate input propagates to the gate output. A path is statically sensitizable if all the gates along the path fulfill the sensitization condition: m−1 Y i=0 ∂fi+1 =1 ∂fi (5.7) If a path is statically sensitizable it behaves like an inverter chain and it is possible that SP is 0 for every on-path gate input with a falling input transition. For a NOR gate the non-controlling value is logic “0”. Hence, all side inputs are at logic “0” as well and the gate degrades maximal (this is necessary due to the serial connection of the PMOS transistors in a NOR gate). But for a NAND gate all side inputs are forced to logic “1” to statically sensitize it. Nevertheless, the timing arc of the NAND gate that is on the path still degrades maximal because of the parallel connection of the PMOS transistors. Hence, if the path is statically sensitizable, the upper bound of a path delay interval is realistic. 5.3.4. 
Minimal aged path delay for a general path

As for an inverter chain, it is not possible for a general path that the gates do not degrade when both input transitions are considered simultaneously, because the aged path delays for a rising and a falling input transition compete with each other (unless the path consists only of NAND gates). To determine the minimal delay of an aged path, an optimization problem is formulated. The task is to minimize the maximum of both path delays for a rising ($P_r$) and a falling ($P_f$) input transition:

\[ \min_{SP}\; \max\left(D(P_r),\; D(P_f)\right) \tag{5.8} \]

The first constraint for the optimization is that the signal probabilities are between 0 % and 100 %:

\[ \text{s.t.} \quad 0 \le SP \le 1 \tag{5.9} \]

An exact approach to this minimization problem would be to use the signal probabilities at the PIs as free variables. The aged delay of the considered path depends on the signal probabilities of the on-path gate inputs and the off-path gate inputs (side inputs). The relation between the signal probabilities at the gate inputs and the signal probabilities at the PIs is given by the logic interconnection. Considering this during the optimization would lead to a complex nonlinear optimization problem with multiple local minima.

To find the global minimum efficiently, the problem is simplified such that a valid lower bound for the minimization problem is obtained. It is assumed that the signal probabilities at the side inputs can be chosen such that the aged gate delay becomes minimal, without considering the logic interconnection. Only the logic interconnection of the path itself is considered; the logic interconnection of the rest of the circuit is neglected. Because this relaxation only enlarges the set of admissible signal probabilities, the aged path delay can be minimized further than would be possible without the simplification. Hence, one gets a valid lower bound of the minimal aged path delay. The free variables SP are now only the signal probabilities at the on-path gate inputs.
Hence, the path delay as a function of SP is required. First, the gate delay degradation $\Delta d$ of an edge $(i, o)$ for a falling input transition depends on $SP_i$ of the on-path gate input $i$:

\[ \Delta d((i,o))_f = k_i \cdot (1 - SP_i)^n \tag{5.10} \]

$n$ is the time exponent given in the degradation equation 3.7. The factor $k_i$ combines the other dependencies (operating conditions, lifetime), which are fixed for this optimization. The path delays for a rising and a falling input transition can now be written as:

\[ D(P_r) = D(P_r)_{fresh} + \sum_{l \in N_r} k_l \cdot (1 - SP_l)^n \tag{5.11} \]

\[ D(P_f) = D(P_f)_{fresh} + \sum_{l \in N_f} k_l \cdot (1 - SP_l)^n \tag{5.12} \]

$SP_l$ is the signal probability at node $l$. $N_r$ ($N_f$) is the set of gate inputs along the path that have a falling input transition for a rising (falling) transition at the path input.

Additional constraints consider that the values for SP cannot be chosen freely, since the signal probability at a gate output depends on the signal probabilities at the gate inputs. For an inverter, the signal probability at the output $SP_o$ is given by $1 - SP_i$, where $SP_i$ is the signal probability at the input. For a NOR gate with two inputs, the signal probability at the output $SP_o$ depends on both inputs $SP_i$ and $SP_j$:

\[ SP_o = (1 - SP_i) \cdot (1 - SP_j) \tag{5.13} \]

Let us assume that $i$ is the on-path input and $j$ is the side input. $SP_j$ is not a free variable for the optimization, but it affects $SP_o$, which is again a free variable for the optimization. This can be taken into account by solving (5.13) for $SP_j$:

\[ SP_j = 1 - \frac{SP_o}{1 - SP_i} \tag{5.14} \]

Figure 5.9.: Graphical representation of the constraints for the gate types.
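The relation between $SP_i$, $SP_j$ and $SP_o$ in Equations 5.13 and 5.14, and the admissible regions sketched in Figure 5.9, can be explored numerically. The following is a minimal sketch, not part of the original tool flow. All function names are hypothetical, the inputs are assumed to be statistically independent, and the two-input NAND expression $SP_o = 1 - SP_i \cdot SP_j$ is the standard formula for independent inputs, used here as an assumption because the text defers the NAND derivation to Appendix A.

```python
from typing import Tuple


def inverter_output_sp(sp_in: float) -> float:
    """Inverter: the output is high exactly when the input is low."""
    return 1.0 - sp_in


def nor2_output_sp(sp_i: float, sp_j: float) -> float:
    """Two-input NOR (Equation 5.13), assuming independent inputs."""
    return (1.0 - sp_i) * (1.0 - sp_j)


def nand2_output_sp(sp_i: float, sp_j: float) -> float:
    """Two-input NAND, assuming independent inputs (assumption, see lead-in)."""
    return 1.0 - sp_i * sp_j


def nor2_side_input_sp(sp_i: float, sp_o: float) -> float:
    """Equation 5.14: side-input SP that yields sp_o for a given on-path sp_i."""
    if sp_i >= 1.0:
        raise ValueError("SP_i = 1 forces SP_o = 0; SP_j is then undetermined")
    return 1.0 - sp_o / (1.0 - sp_i)


def output_sp_range(gate: str, sp_i: float) -> Tuple[float, float]:
    """Interval of reachable SP_o when the side input SP varies freely in [0, 1]."""
    if gate == "inv":
        return (1.0 - sp_i, 1.0 - sp_i)
    if gate == "nor":
        return (0.0, 1.0 - sp_i)
    if gate == "nand":
        return (1.0 - sp_i, 1.0)
    raise ValueError(f"unknown gate type: {gate}")


if __name__ == "__main__":
    sp_i = 0.25
    for gate in ("inv", "nor", "nand"):
        print(gate, output_sp_range(gate, sp_i))
    # Round trip through Equations 5.13 and 5.14 for a NOR gate
    sp_o = nor2_output_sp(sp_i, 0.5)
    print(sp_o, nor2_side_input_sp(sp_i, sp_o))
```

Sweeping the side-input SP from 0 to 1 traces out exactly the interval returned by `output_sp_range`, which is why the side input can be eliminated from the optimization and replaced by a bound on $SP_o$.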
By taking into account that $SP_j$ is between 0 and 1, the following two relations between $SP_i$ and $SP_o$ can be obtained:

\[ 0 \le 1 - \frac{SP_o}{1 - SP_i} \tag{5.15} \]

\[ 1 \ge 1 - \frac{SP_o}{1 - SP_i} \tag{5.16} \]

From (5.15), which gives the upper bound, and (5.16), which gives the lower bound, the following constraint for a NOR gate can be derived:

\[ 0 \le SP_o \le 1 - SP_i, \quad \text{if } (i,o) \text{ is a NOR gate} \tag{5.17} \]

For NAND gates and inverters similar constraints arise:

\[ 1 \ge SP_o \ge 1 - SP_i, \quad \text{if } (i,o) \text{ is a NAND gate} \tag{5.18} \]

\[ SP_o = 1 - SP_i, \quad \text{if } (i,o) \text{ is an inverter} \tag{5.19} \]

In Appendix A it is shown how the constraint for the NAND gate is derived and that these constraints are also valid if a NAND or NOR gate has more than two inputs. The diagram in Figure 5.9 shows the constraints for the gate types graphically.

The optimization tries to choose the SPs in such a way that the gates degrade as little as possible. Hence, the SP at the gate input should be 1. The signal probability at the gate output should also be 1, because the gate output is the input of the succeeding gate. However, for an inverter, an SP of 1 at the input means that the SP at the output is 0. The inverter does not degrade, but the succeeding gate degrades maximally (this increases the path delay for the opposite transition at the input). The same is true for a NOR gate: if the SP at the input is 1, then the SP at the output is 0. Only for a NAND gate is it possible to have an SP of 1 at both the input and the output.

The equality and inequality constraints (5.9, 5.17, 5.18, 5.19) are linear, but the cost function (5.8, 5.11, 5.12) is nonlinear. Unfortunately, this nonlinear optimization problem still has multiple local minima. Therefore, the optimization problem was transformed into a linear optimization problem by setting the time exponent $n$ to 1:

\[ D(P_r) \approx D(P_r)_{fresh} + \sum_{l \in N_r} k_l \cdot (1 - SP_l) \tag{5.20} \]

\[ D(P_f) \approx D(P_f)_{fresh} + \sum_{l \in N_f} k_l \cdot (1 - SP_l) \tag{5.21} \]

To linearize the max operation in (5.8), a slack variable $s$ is introduced:

\[ \min_{SP}\; \max\left(D(P_r),\; D(P_f)\right) = \min_{s,\,SP}\; s \tag{5.22} \]

s.t.
\[ s \ge D(P_r), \qquad s \ge D(P_f) \]

Now the minimization problem can be solved efficiently. The solution of this linear problem is still a valid lower bound for the minimal path delay. This can be seen by looking once again at Figure 5.7, which shows the exact path delays (solid lines) as well as the linearized path delays (dashed lines). The intersection of the two dashed lines is the minimum of the maximum of both path delays. The minimal aged path delay degradation after linearization is 50 % of the maximal aged path degradation, compared to 85 % in the exact case. Hence, it is a valid lower bound.

The degradation of the gate delay in Equation 5.10 depends only on the signal probability of the on-path input (switching input). This is correct for an inverter because it has just one input. For a NAND gate it is correct as well, since the delay degradation of the timing arc from the switching input to the output (almost) entirely depends on the signal probability at the switching input due to the parallel connection of the PMOS transistors. However, for a NOR gate this is not the case. The PMOS transistors are connected in series. If the PMOS transistor that is nearest to the supply voltage is connected to an input with an SP of 1, then the gate does not degrade. In the path delay equations (5.20, 5.21) this is considered by removing those NOR gates from the sets $N_r$ and $N_f$ where the PMOS transistor of the switching input is not directly connected to the supply voltage. In those gates, a side input is connected to the PMOS transistor that is directly connected to the supply voltage, and the signal probability of this input can be chosen freely since the interconnection of the rest of the circuit is neglected.

5.3.5. Minimal aged circuit delay

The minimal aged path delay can now be used to determine a minimal aged circuit delay. However, it is not enough to determine the minimal aged path delay just for the path with the largest maximal aged path delay.
Another path could have a larger minimal aged path delay. An exact method would be to determine the minimal aged path delay for all paths with a maximal aged path delay greater than $D(P_{crit})_{fresh}$. It is sufficient to consider these paths, because paths with a smaller maximal aged path delay will never have a minimal aged path delay greater than $D(P_{crit})_{fresh}$. The number of paths that have to be considered in the exact method might be too large. Instead, again a lower bound for the minimal aged circuit delay is determined by obtaining the minimal aged path delay of the $N$ slowest paths and taking the maximum of them.

5.3.6. Use of minimal aged circuit delay in reduction steps

The minimal aged circuit delay can now be used to reduce the number of identified PCPs. Criterion 1 says that a path is only a possible critical path if its aged path delay is not less than $D(P_{crit})_{fresh}$. This is a necessary condition, but the criterion can be refined:

Criterion 1*: A path must have a maximal aged path delay $D_{aged}$ greater than the minimal aged circuit delay. Otherwise it is not a PCP, because the path defining the minimal aged circuit delay will always have a greater path delay.

The minimal aged circuit delay is then used in the slack and path delay reduction step instead of $D(P_{crit})_{fresh}$.

5.3.7. Wrap-up

This section investigated whether intervals for the gate and path delays are justified. It was shown that an interval for the aged gate delay is justified and that the upper bound of the aged path delay interval is justified as well if the path is statically sensitizable. The lower bound of the aged path delay interval is equal to the fresh path delay. However, for many paths the minimal aged path delay differs from the fresh path delay if the maximum of the path delay for both input transitions is considered.
This is because, due to the inverting characteristic of CMOS logic, it is not possible that both a gate and its succeeding gate do not degrade (if the gate is an inverter or a NOR gate). An optimization problem was formulated to obtain the minimal aged path delay. The optimization problem was too complex to be solved exactly. By simplifying (the SPs at the side inputs of the considered path can be chosen independently of one another) and linearizing the optimization problem, it could be solved efficiently. The solution is still a valid lower bound for the minimal aged circuit delay. From the minimal aged path delay a minimal aged circuit delay can be determined. This minimal aged circuit delay can be used to further reduce the number of PCPs by refining Criterion 1.

5.4. Considering process variations

This section shows how SSTA and aging-aware timing analysis can be combined to consider the impact of process variation on PCPs.

Besides aging, process variation is another limiting factor for circuit reliability. So far, process variation was not considered and deterministic values were assumed for the gate delays. However, due to process variation even the fresh gate delays cannot be determined exactly. Until recently, global process variations and uncertainties of the current operating conditions ($T_{curr}$ and $V_{curr}$) were considered by corner cases. Corner cases can also be used to take global process variation and uncertainties into account when the PCPs are determined: the PCPs are obtained for each corner case, and the union of these sets is taken. Due to ongoing miniaturization, local process variation within a single chip has increased so much that it can no longer be neglected. Local variations cannot be considered with corner cases. For that purpose, SSTA has been developed, which models timing quantities (e.g., delay, arrival time, slack) as probability distributions.
Figure 5.10 illustrates the idea of how aging analysis and SSTA can be combined. In the nominal case, timing quantities are deterministic values. On the one hand, there is aging. Due to aging, those deterministic timing quantities become time dependent. To identify the PCPs, an interval is considered for each timing quantity. On the other hand, there is process variation. It results in a probability distribution for timing quantities. Combining aging and process variation results in an interval with random variables as lower and upper bounds.

5.4.1. Block-based statistical static timing analysis

The approach is based on the block-based SSTA by Visweswariah et al. [2006], which is briefly summarized here. All timing quantities are represented in the canonical first-order form:

\[ \hat{a} = a_0 + \sum_{i=1}^{n} a_i x_i + a_r x_r \tag{5.23} \]

$a_0$ is the nominal value, $x_i$ represents the variation of the $n$ global sources, and $x_r$ is a random variable modeling the purely random effect of process variation. $a_i$ and $a_r$ are the sensitivities of the timing quantity to $x_i$ and $x_r$, respectively. $a_i$ and $a_r$ are scaled such that $x_i$ and $x_r$ are Gaussian distributed with zero mean and unit variance ($N(0, 1)$).

For a block-based timing analysis two operations are required: sum and max. The sum $\hat{s}$ of the random variables $\hat{a}$ and $\hat{b} = b_0 + \sum_{i=1}^{n} b_i x_i + b_r x_{r,b}$ is obtained by adding the nominal values ($s_0 = a_0 + b_0$) and the coefficients of the global variation ($s_i = a_i + b_i$). The independent variation $a_r x_{r,a} + b_r x_{r,b}$ is replaced by $s_r x_{r,s}$, where $s_r$ is determined by matching the variance of $a_r x_{r,a} + b_r x_{r,b}$ and $s_r x_{r,s}$.

To compute the maximum $\hat{m} = \max(\hat{a}, \hat{b})$, the tightness probability $T_a$ is required. $T_a$ is the probability that $\hat{a}$ is greater than $\hat{b}$:

\[ T_a = P(\hat{a} > \hat{b}) = \Phi\left(\frac{a_0 - b_0}{\theta}\right) \tag{5.24} \]

$\Phi$ is the cumulative distribution function of the standard normal distribution and $\theta = \sqrt{\sigma_a^2 + \sigma_b^2 - 2\,\mathrm{cov}}$. $\sigma_a^2$ and $\sigma_b^2$ are the variances and cov is the covariance of $\hat{a}$ and $\hat{b}$. The tightness probability $T_b$ is $(1 - T_a)$.
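The sum operation and the tightness probability of Equation 5.24 can be sketched in a few lines. This is an illustrative sketch only, not the implementation used in this work: the class name `Canonical` and all function names are hypothetical, and the purely random parts of two different canonical forms are assumed to be independent, so that only the global sources contribute to the covariance.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Canonical:
    """Canonical first-order form a0 + sum(a_i * x_i) + a_r * x_r (Equation 5.23)."""
    nominal: float
    global_sens: Tuple[float, ...]
    random_sens: float

    def variance(self) -> float:
        # All x_i and x_r are N(0, 1), so the variance is the sum of squared sensitivities.
        return sum(s * s for s in self.global_sens) + self.random_sens ** 2


def covariance(a: Canonical, b: Canonical) -> float:
    """Only the shared global sources correlate; the random parts are independent."""
    return sum(x * y for x, y in zip(a.global_sens, b.global_sens))


def add(a: Canonical, b: Canonical) -> Canonical:
    """Sum of two canonical forms; s_r matches the variance of the two random parts."""
    return Canonical(
        nominal=a.nominal + b.nominal,
        global_sens=tuple(x + y for x, y in zip(a.global_sens, b.global_sens)),
        random_sens=math.hypot(a.random_sens, b.random_sens),
    )


def std_normal_cdf(x: float) -> float:
    """Cumulative distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tightness_probability(a: Canonical, b: Canonical) -> float:
    """P(a > b) according to Equation 5.24."""
    theta_sq = a.variance() + b.variance() - 2.0 * covariance(a, b)
    if theta_sq <= 0.0:
        # Degenerate case: a - b is deterministic, so compare the nominal values.
        return 1.0 if a.nominal > b.nominal else 0.0
    return std_normal_cdf((a.nominal - b.nominal) / math.sqrt(theta_sq))


if __name__ == "__main__":
    a = Canonical(10.0, (1.0, 0.0), 3.0)
    b = Canonical(10.0, (0.0, 1.0), 4.0)
    print(add(a, b))
    print(tightness_probability(a, b))
```

Representing the sensitivities as a fixed-length tuple keeps the global sources aligned by position, which is what makes the covariance a simple dot product.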
Figure 5.10.: Basic idea for combining aging effects and process variations.

$\mu$ and $\sigma^2$ of $\hat{m}$ can be computed as follows:

\[ \mu_m = T_a a_0 + T_b b_0 + \theta \cdot \varphi\left(\frac{a_0 - b_0}{\theta}\right) \tag{5.25} \]

\[ \sigma_m^2 = T_a \cdot (\sigma_a^2 + a_0^2) + T_b \cdot (\sigma_b^2 + b_0^2) + (a_0 + b_0) \cdot \theta \cdot \varphi\left(\frac{a_0 - b_0}{\theta}\right) - \mu_m^2 \tag{5.26} \]

Here $\varphi$ denotes the probability density function of the standard normal distribution. Now, the maximum $\hat{m}$ can again be written in canonical form:

\[ \max(\hat{a}, \hat{b}) = m_0 + \sum_{i=1}^{n} m_i x_i + m_r x_{r,m} \tag{5.27} \]

$m_0$ is the mean $\mu_m$ of $\hat{m}$, $m_i$ is $T_a \cdot a_i + (1 - T_a) \cdot b_i$, and $m_r$ is obtained by matching the variance of the canonical form to the variance $\sigma_m^2$.

The result of the sum and the max of two canonical forms is again a canonical form. Hence, all timing quantities of the timing graph can be expressed as canonical forms.

5.4.2. Representation of timing quantities

Without considering process variation, a timing quantity $q$ is an interval with the fresh value $q^{fresh}$ as lower bound and the maximal aged value $q^{aged}$ as upper bound. To consider process variation, those two bounds become random variables and are represented as canonical forms:

\[ \hat{q}^{fresh} = q^{fresh} + \sum_{i=1}^{n} q_i x_i + q_r x_r \tag{5.28} \]

\[ \hat{q}^{aged} = q^{aged} + \sum_{i=1}^{n} q_i x_i + q_r x_r \tag{5.29} \]

For $\hat{q}^{aged}$ the impact of aging and the impact of process variation can be added because they are independent (see [Fischer et al., 2008]).

For intervals with random variables, the operations sum, max and greater than (“>”) have to be defined as well. For sum, the lower bounds and the upper bounds are added as in Equation 5.1. For max, the maximum of the lower bounds and the maximum of the upper bounds are calculated as in Equation 5.2. For intervals with random variables, the greater-than operation returns a probability:

\[ P(a > b) = P(\hat{a}^{fresh} > \hat{b}^{aged}) \tag{5.30} \]

However, a binary decision is required to decide whether a node or an edge in a timing graph can be removed.
The solution is to introduce a threshold $\delta$. If the probability is greater than $\delta$, then the element can be removed:

\[ a > b \;:=\; P(a > b) > \delta \;\Leftrightarrow\; P(\hat{a}^{fresh} > \hat{b}^{aged}) > \delta \tag{5.31} \]

The combined statistical and aging-aware timing analysis is used for, but not limited to, determining the PCPs. It can also be used independently of that for an aging-aware SSTA. What is not taken into account so far is that the transistor parameter drifts are also probability distributions. Aging effects are statistical processes: two identical transistors under identical conditions age differently. To consider this, the nominal aged value $a^{aged}$ would also have to be a random variable.

5.5. Applications

After introducing the methods to identify the PCPs, this section presents two applications of PCPs.

5.5.1. Aging-aware timing model for modules

To keep pace with the unabatedly growing complexity of integrated circuits, circuit design begins at higher and higher levels of abstraction. Furthermore, performance degradation due to aging effects can no longer be neglected [Austin et al., 2008]. Hence, timing models at higher abstraction levels are required that accurately describe the impact of aging.

A module is a circuit with a well-defined function and interface (e.g., adders, multipliers, memory blocks, or even whole processors). The advantage of modules is that they are designed once and can easily be reused. The timing model describes the maximal delay of a module. A single value is enough to specify the timing of a module if aging is not considered (e.g., adder with delay = 1 ns). When performance degradation due to aging is considered, the aged circuit delay depends on the operating conditions over lifetime and on the workload. The aged circuit delay is defined by the critical path, and there is more than one possible critical path for an aged circuit.
An aging-aware timing model at higher abstraction levels enables one to:
• consider the impact of aging on a system early in the design process,
• determine the system performance quickly at the system level,
• perform an extensive exploration of the design space.

Timing models that take process variation into account have already been published [Garg and Marculescu, 2007; Li et al., 2009]. To the best of my knowledge, this is the first aging-aware timing model above gate level.

Such models can, for instance, be used in high-level synthesis (HLS). One important step in HLS is scheduling. During scheduling, arithmetic/logical operations are mapped onto time slots of duration $T_0$ (see Figure 5.11). For this purpose, a pre-characterized library with different implementations of modules is required. The individual implementations differ in their characteristics (delay, area, power). The schedule is generated by choosing optimal implementations from the pre-characterized library [Coussy and Morawiec, 2008].

Figure 5.11.: The dotted circles indicate the aged performances. The circuit fails because the second adder needs the result before the first adder has finished its calculation.

Figure 5.12.: The timing graph of the ISCAS’85 circuit c17 is shown in (a). This is a simplified TG because each net is represented by just one node. An example of a reduced TG is shown in (b).

When a module ages, its delay increases. If this is not taken into account during synthesis, it is possible that the system fails before the end of its specified lifetime because the time for performing a calculation is no longer sufficient. When a module is characterized for the library, it is unknown how the module will be utilized. Therefore, a timing model is needed which provides the delay of a module dependent on operating conditions over lifetime and workload.
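The failure mechanism of Figure 5.11 can be made concrete with a small check of a schedule against the slot duration $T_0$. This is only a toy sketch: the module names, the delays, the 8 % degradation and the single scalar degradation factor per module are hypothetical placeholders standing in for a real aging-aware timing model.

```python
from typing import Dict, List


def aged_delay(fresh_delay_ns: float, degradation: float) -> float:
    """Aged module delay for a relative degradation (e.g. 0.08 for 8 %)."""
    return fresh_delay_ns * (1.0 + degradation)


def failing_operations(
    schedule: Dict[str, str],
    fresh_delay_ns: Dict[str, float],
    degradation: Dict[str, float],
    slot_ns: float,
) -> List[str]:
    """Return the operations whose module no longer fits into one time slot T0."""
    failing = []
    for operation, module in schedule.items():
        delay = aged_delay(fresh_delay_ns[module], degradation[module])
        if delay > slot_ns:
            failing.append(operation)
    return failing


if __name__ == "__main__":
    # Hypothetical schedule: each operation is bound to one library module.
    schedule = {"add1": "fast_adder", "add2": "small_adder"}
    fresh = {"fast_adder": 0.80, "small_adder": 0.95}
    slot_ns = 1.00

    no_aging = {"fast_adder": 0.0, "small_adder": 0.0}
    after_aging = {"fast_adder": 0.08, "small_adder": 0.08}

    print(failing_operations(schedule, fresh, no_aging, slot_ns))
    print(failing_operations(schedule, fresh, after_aging, slot_ns))
```

With the fresh delays both operations fit into their slots, whereas the aged delay of the slower adder exceeds $T_0$. A schedule chosen from fresh delays alone would therefore fail before the end of the lifetime.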
The fundamental idea is to use a strongly reduced TG as a timing model. The maximal aged circuit delay is determined by a PCP. Hence, it is not necessary to consider the complete timing graph of a module (see Figure 5.12(a)); it is enough to consider only the part of the TG that consists of edges belonging to PCPs (see Figure 5.12(b)). The timing model is characterized by generating the reduced timing graph that contains only edges that are part of PCPs. When an aging-aware timing analysis is performed and the aged delay for the module is needed, the reduced timing graph is evaluated. First, the workload from the module inputs has to be propagated to the nodes of the timing graph. Then, the delays of the remaining edges of the reduced timing graph can be computed by means of AgeGate.

The timing model [Lorenz et al., 2010a,b] is a gray-box model, because it takes the internal structure of the module into account. It is as accurate as an aging analysis on gate level, but it is much faster. The speed-up of the timing model depends on how far the timing graph could be reduced. The results show a mean speed-up of 30 ×.

Figure 5.13.: Distribution of circuit delay degradation for 1000 workload samples (counts over circuit delay degradation [%]). The dotted line is the worst-case degradation when it is assumed that all PMOS transistors degrade maximally.

5.5.2. Monitoring of aging circuits

A circuit gets slower and may fail when it ages. The amount of degradation depends on the operating conditions over lifetime and on the workload. If those conditions are not (precisely) known in the design phase, then the exact degradation of the circuit cannot be determined and the circuit may fail.
The histogram in Figure 5.13 shows the distribution of the delay degradation of circuit c7552 from the ISCAS’85 benchmark circuits for 1000 random workload samples. The degradation was obtained by an aging analysis on gate level. The degradation of the circuit delay ranges from just 3.0 % up to 8.6 % of the fresh circuit delay for a 90 nm technology.

If the workload of a circuit is known in advance, the circuit can be optimally designed. Otherwise, if the workload is not (exactly) known, a worst-case design must be chosen to be certain that the circuit doesn’t fail during its specified lifetime. This makes the product less competitive since area, power and performance are wasted. For instance, c7552 from Figure 5.13 might be designed for a worst-case degradation of 9 % (dotted line), but the actual degradation might be just 4 %.

A smarter way to deal with uncertain workload conditions was first proposed in [Agarwal et al., 2007]. The actual degradation of the circuit is periodically monitored during its lifetime, and the system can take countermeasures if it detects that the degradation is too large. This facilitates a better-than-worst-case design style with smaller guard bands. It makes the product more competitive, since it need not be assured at design time that the circuit works correctly for rare workload conditions that it will quite likely not experience during its specified lifetime.

Delay fault testing

Existing methods to measure circuit degradation rely on single transistors [Reisinger et al., 2006], a generic test structure [Tschanz et al., 2009], or replicas of critical paths [Hofmann et al., 2010]. They have the drawback of not considering the actual workload of the circuit. If, on the other hand, razor flip-flops [Das et al., 2009] are used to detect delay faults, it is not assured that the current critical path is sensitized before it has degraded too much and the circuit fails.
The only reliable way to determine the degradation of the circuit is to measure the path delay of the current critical path. However, the current critical path is unknown, since it depends on the workload (e.g., c7552 had 46 different critical paths for 1000 workload samples). Therefore, the path delay of all PCPs must be determined.

A delay fault test is used to test the combinational logic of a sequential circuit. It consists of two vectors $V_1$ and $V_2$. An enhanced-scan test [Bushnell and Agrawal, 2000] is assumed (see Figure 5.14), which allows two arbitrary vectors to be applied to the combinational logic. For this purpose, a normal-scan circuit has to be equipped with additional hold latches after the scan flip-flops. These latches hold vector $V_1$ while $V_2$ is read in by the scan chain. Only in this way can $V_1$ and $V_2$ be applied directly one after another at speed (at the operating frequency of the circuit). The purpose of $V_1$ is to sensitize the path under test (set the side inputs to non-controlling values). The transition from $V_1$ to $V_2$ activates the path by initiating the appropriate transition at the beginning of the path under test. One clock period later, the resulting output of the combinational logic is stored in the receiving flip-flops and compared to the target value to check for a delay fault.

For this application it is not enough to know whether a path still fulfills the timing specification or not. It is crucial to know in advance that a path will shortly fail. Therefore, the delay fault test must be performed not at speed, but at a slightly shorter clock period. The shorter clock period for the test depends on how often the PCPs are tested and on the desired guard band between the detection of a degraded path and an actual failure of the circuit.

How the system can react

A controller initiates the delay test and reacts when a path degrades too much. It can be implemented in software, since the circuit has to be tested every several weeks at most.
Hence, the proposed aging monitor requires relatively little area overhead, especially when an enhanced-scan design is already available. Although the focus of this chapter is on identifying PCPs, here are several alternatives for how a system can react when a circuit degrades too much:
• The system reduces the clock frequency or increases the supply voltage to compensate for the degraded circuit delay [Mintarno et al., 2010].
• The degraded circuit can be replaced by another equivalent circuit. Due to the recovery of the threshold voltage drift caused by NBTI, the degraded circuit may also be used again after some recovery time [Sylvester et al., 2006].
• Sometimes it is enough to know that a circuit may fail. In probabilistic CMOS [Chakrapani et al., 2006], a recent research area, faults in CMOS circuits are accepted. For instance, a degraded processor core can be used for less critical tasks (audio and video applications are quite fault-tolerant).
• By testing all PCPs, the system not only knows that a circuit degrades but also which paths of that circuit are too slow. Hence, the circuit is too slow only for certain input vectors, which have to be avoided.

Figure 5.14.: Enhanced-scan design. The standard scan design is extended by hold latches. Thereby, the first delay test vector $V_1$ is latched by the hold latches while the second delay test vector $V_2$ is read into the scan chain.

Extensions for the methods to identify PCPs

When the PCPs are used for delay tests, some differences compared to the method introduced in Section 5.2 have to be considered. First, the PCPs must be sensitizable, and no path may be removed from the PCPs due to another PCP that is not sensitizable.
Second, unlike for timing model generation, the clock period of the circuit is known and can be used to reduce the number of PCPs further. Third, a final reduction step is introduced that enumerates the paths. Those three differences are discussed in more detail in the following three subsections.

Testability of paths

When nodes or edges are removed, it is crucial that all remaining PCPs are testable; otherwise it might not be possible to determine the degradation of the current critical path. This has to be considered in the arrival time and in the delay to sink reduction step. To remove an edge from the timing graph, it is checked whether a path segment A has a larger delay than a path segment B. An edge is only removed if the path segment with the greater delay is statically sensitizable. In fact, to remove the edge it is enough if there is a path segment which is testable and has a greater delay than path segment A. It does not necessarily have to be the path segment B.

Considering the clock period of the circuit

In contrast to the timing model generation for modules, the specified clock period is known for this application. The clock period determines the required time at T (used in the slack and path delay reduction step). If the determined required time is larger than the minimal aged circuit delay obtained in Section 5.3.4, the number of PCPs can be further reduced. If, for instance, the clock period is set to 150 % of the critical path delay of the fresh circuit, there won’t be any paths that must be tested, because for the technologies and aging effects that are considered the path delay does not degrade by more than 50 %.

Path-based reduction step

In Section 5.2, it was argued that path-enumerative methods are too inefficient to handle complex digital circuits. However, if the remaining PCPs are too many to enumerate
them all, they are too many to test them all as well. Therefore, at least the final reduction step, which works on an already reduced TG, may be path-based. This final reduction step doesn’t remove any nodes or edges; it just identifies those paths in the reduced TG that have a path delay greater than the fresh critical path delay. This is done by enumerating all paths in descending order of path delay. The enumeration is stopped when the first path has a maximal aged path delay less than the required time at T (see Figure 5.15).

Figure 5.15.: Path-based reduction step

State of the art of path enumeration techniques

Finding paths for the purpose of testing a circuit has been a research topic for quite some time. The first approaches just considered the nominal gate delay [Li et al., 1989; Sharma and Patel, 2002]. Their goal was to identify paths to test all gates for delay faults. Later, process variation was considered [Lu et al., 2005; Zolotov et al., 2010] by identifying critical paths for all process space conditions.

Two other publications are concerned with testing aged circuits. Wang et al. [2007a] introduce path-enumerative methods to identify paths which exceed the clock period under worst-case aging conditions. An optimization problem is set up to obtain these maximal aged path delays. However, a mistake was made by not considering that NBTI degrades only every other gate (those with a falling input transition). Without this mistake, the maximal aged path delay would be equal to the upper bound of the path delay intervals, as discussed in Section 5.3.3.

Baba and Mitra [2009] proposed a method to identify the paths of an aged circuit that must be tested. Gate delay intervals are defined and methods are introduced to remove nodes and edges. The method proposed here significantly improves on their approach in the following points:
• The impact of process variation is considered when the PCPs are determined (Section 5.4).
• The correlation of gate delays along a path is taken into account (Section 5.3). In [Baba and Mitra, 2009] path delays are always intervals.
• Baba and Mitra [2009] determine the PCPs first, and a separate step checks whether those paths are sensitizable or not. If a path is detected not to be sensitizable, it must be checked whether the removal of other paths from the PCPs was unjustified.
• An aging-aware STA has to be performed only once. In [Baba and Mitra, 2009] an STA has to be run again whenever a common edge is detected (Section 5.2.5).
• The final reduction step to determine the PCPs for testing aged circuits is enumerative, which allows the removal of some additional paths from the already reduced TG.
• In the results section it is shown that, by calculating the number of PCPs for different lifetimes, the number of paths that must be tested at the beginning of the lifetime can be reduced significantly.

5.6. Results

The proposed approach is tested with ISCAS’85 and ITC’99 benchmark circuits. The circuits are synthesized with an industrial 90 nm cell library. To generate the aging-aware gate models, single-stage gates (inverters, NOR and NAND gates with 2 to 4 inputs) from the library are characterized. The operating conditions are 1.32 V, 125 °C and a specified lifetime of 10 years. Those harsh conditions result in a large maximal threshold voltage drift (17 % of the nominal threshold voltage) and, therefore, lead to large intervals for the gate delays.

The benchmark circuits are used for the following investigations:
• The minimal aged circuit delay $D_{aged,min}$ is determined.
• It is checked how far the timing graph can be reduced. This is relevant for the aging-aware timing model.
• The number of PCPs for the circuits is obtained. The PCPs are important for testing the circuits during the lifetime.

5.6.1.
Minimal aged delay

Table 5.1 shows the fresh circuit delay, the maximal aged circuit delay and the minimal aged circuit delay of the ISCAS'85 circuits. The maximal aged delay $D_{aged}$ is simply obtained by adding up the maximal aged gate delays. The minimal aged path delay ($D_{aged,min}$) is the result of a minimization over the 1000 slowest aged paths of the circuit, as discussed in Section 5.3.4.

Circuit   D_fresh [s]   D_aged [s]   D_aged,min [s]   ΔD_aged,min [%]
c17       1.16e-10      1.27e-10     1.16e-10         0
c432      1.17e-09      1.28e-09     1.17e-09         0
c499      1.27e-09      1.39e-09     1.29e-09         17.3
c880      9.12e-10      9.93e-10     9.19e-10         8.63
c1355     1.43e-09      1.57e-09     1.44e-09         7.03
c1908     1.88e-09      2.06e-09     1.88e-09         0
c2670     9.83e-10      1.06e-09     9.83e-10         0
c3540     1.63e-09      1.78e-09     1.66e-09         19.3
c5315     1.8e-09       1.97e-09     1.82e-09         9.96
c6288     4.93e-09      5.37e-09     4.98e-09         13.3
c7552     1.68e-09      1.83e-09     1.69e-09         8.8

Table 5.1.: Minimal aged circuit delay

The minimal delay degradation is defined as follows:

\[ \Delta D_{aged,min} = \frac{D_{aged,min} - D_{fresh}}{D_{aged} - D_{fresh}} \tag{5.32} \]

A $\Delta D_{aged,min}$ of 0 % means that $D_{aged,min}$ is equal to $D_{fresh}$, and a $\Delta D_{aged,min}$ of 100 % would mean that $D_{aged,min}$ is equal to $D_{aged}$. The values of $\Delta D_{aged,min}$ lie between 0 % and 19 %. There are several reasons why those values are below the values reached for an inverter chain (see Section 5.3.2):

• The linearization of the gate delay dependencies results in a smaller minimal aged delay (50 % compared to 85 % for the inverter chain).
• The result depends on the gate types along a given path. As shown in Section 5.3.4, only the constraints for inverters and NOR gates rule out the case that the gates along the path do not age at all. (If a given path consists only of NAND gates, then $D_{aged,min}$ is equal to $D_{fresh}$.)
• Furthermore, the difference between the fresh path delay for a rising input transition, $D(P_r)_{fresh}$, and for a falling input transition, $D(P_f)_{fresh}$, is relevant. The quantity that is minimized is the maximum of $D(P_r)_{aged}$ and $D(P_f)_{aged}$.
If the difference is too large, the maximum does not change, since the SPs are chosen in a way that only the smaller of both path delays is increased.

5.6.2. Node and edge reduction

For the aging-aware timing model, the achievable speed-up compared to an aging-aware analysis on gate level depends on how far the timing graph can be reduced. The reduction (ratio) for nodes (edges) is defined as follows:

\[ \text{reduction (ratio)} = \frac{\text{initial nodes (edges)} - \text{reduced nodes (edges)}}{\text{initial nodes (edges)}} \tag{5.33} \]

Circuit   Initial nodes   Initial edges   Node reduction [%]   Edge reduction [%]   Time [s]   Speed-up
c17       26              40              57.7                 70                   0.0        3.6 ×
c432      526             890             79.8                 86.3                 0.7        22.1 ×
c499      1152            2010            66                   77.4                 1.7        17.2 ×
c880      998             1750            86                   90.8                 0.9        28.7 ×
c1355     1262            2104            62.2                 73.7                 2.3        6.7 ×
c1908     930             1712            80                   87.4                 1.5        16.9 ×
c2670     1732            2994            95.4                 97                   2.9        55.0 ×
c3540     1912            3496            71.3                 80.7                 2.5        30.6 ×
c5315     3326            6200            93.8                 96.2                 6.5        43.4 ×
c6288     5268            9968            26.8                 43.7                 17.5       6.7 ×
c7552     4900            8348            90.3                 93.5                 9.0        96.0 ×
Ø                                         73.6                 81.5                 4.1        29.7 ×

Table 5.2.: Reduction ratios of nodes and edges

Table 5.2 shows the initial number of nodes and edges, the achieved reduction ratios, the time the characterization took and the speed-up compared to an aging-aware TA on gate level. On average, the number of nodes is reduced by 74 % and the number of edges by 82 %, which results in a speed-up of 30 ×. This is plausible because the runtime depends on the number of nodes as well as on the number of edges (see footnote 4).

5.6.3. Possible critical paths

In Table 5.3, results are given for the proposed approach and, as a comparison, for the approach described by Baba and Mitra [2009]. Only those benchmark circuits with more than 500 gates are shown. In the state-of-the-art approach, all reduction steps are performed except the path enumeration step, and the minimal aged delay is not computed for the slack and path delay reduction steps.
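The path enumeration step that distinguishes the proposed approach from the state-of-the-art approach can be illustrated with a small example. The following Python sketch is not the implementation used for the results in this thesis. It assumes a timing graph given as a dictionary of edge delays (a hypothetical data layout), integer delays in picoseconds, and a single source S and sink T. The function name and the example graph are made up for illustration. The sketch uses the delay to sink of each node as an exact bound, so that complete paths leave the priority queue in descending order of path delay and the enumeration can stop at the first path whose delay is less than the required time.

```python
import heapq
from collections import defaultdict


def enumerate_paths_descending(edges, source, sink, required_time):
    """Yield (delay, path) for source-to-sink paths of a DAG in descending
    order of delay, stopping at the first path below required_time.

    edges: dict mapping (from_node, to_node) -> maximal aged edge delay.
    """
    successors = defaultdict(list)
    for (u, v), delay in edges.items():
        successors[u].append((v, delay))

    # Longest remaining delay from each node to the sink (memoized).
    to_sink = {sink: 0}

    def delay_to_sink(node):
        if node not in to_sink:
            to_sink[node] = max(
                (d + delay_to_sink(v) for v, d in successors[node]),
                default=float("-inf"),  # node cannot reach the sink
            )
        return to_sink[node]

    # Heap entries: (-bound, delay so far, partial path). The bound is the
    # largest delay that any completion of the partial path can reach.
    heap = [(-delay_to_sink(source), 0, (source,))]
    while heap:
        neg_bound, so_far, path = heapq.heappop(heap)
        bound = -neg_bound
        if bound < required_time:
            break  # all remaining paths are faster than the required time
        node = path[-1]
        if node == sink:
            yield bound, list(path)
            continue
        for nxt, d in successors[node]:
            if delay_to_sink(nxt) != float("-inf"):
                total = so_far + d + delay_to_sink(nxt)
                heapq.heappush(heap, (-total, so_far + d, path + (nxt,)))


# Hypothetical reduced timing graph with maximal aged edge delays in ps.
edges = {
    ("S", "a"): 30, ("S", "b"): 20,
    ("a", "c"): 40, ("b", "c"): 40,
    ("a", "T"): 60, ("c", "T"): 25,
}
pcps = list(enumerate_paths_descending(edges, "S", "T", required_time=88))
print(pcps)
```

In this example, the paths S-a-c-T (95 ps) and S-a-T (90 ps) are reported, whereas the path S-b-c-T (85 ps) lies below the required time of 88 ps and terminates the enumeration. Because the bound of a partial path is exact, no path slower than the required time is ever expanded completely, which keeps the effort proportional to the number of reported paths rather than to the total number of paths.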
For the determination of PCPs, it is assumed that the specified clock period of the circuits is equal to the fresh circuit delay $D_{crit,fresh}$. This is a worst-case assumption, since it results in the largest possible number of PCPs. In a real product, there would be a safety margin between $D_{crit,fresh}$ and the specified clock period. Without this safety margin, it is almost inevitable that the system has to react because the circuit degrades too much.

Footnote 4: It is assumed that the runtime depends linearly on the number of edges and on the number of nodes of the timing graph. Roughly 1/5 of the nodes and 1/5 of the edges remain. Hence, the resulting runtime is 1/5 · 1/5 = 1/25 of the initial runtime, i.e., the speed-up is 25 ×.

As Baba and Mitra [2009] do not take process variations (PVs) into account, the results are first compared without considering PVs. The proposed approach reduces the number of PCPs compared to Baba and Mitra [2009] by a factor of 2.7 on average (column Impr. in Table 5.3). For all circuits except c6288 and b19, the number of PCPs is reasonably small, so that it is feasible to test them all. For circuits c6288 and b19, a traditional worst-case design must be used. For the other circuits, a better-than-worst-case design can be used by testing all identified PCPs periodically. It seems that the number of identified PCPs depends more on the TG structure than on the pure circuit size: circuit c6288 with just 2600 gates has over $10^{12}$ PCPs, whereas b18 with over 80 000 gates has just 236 PCPs. The runtime to determine all PCPs of a circuit when PVs are considered is 30 min on average on a workstation with a 2.4 GHz CPU and 8 GB RAM. Without circuit b19, which took about 7 h, the average runtime of the remaining circuits is 10 min.

Finally, the last two columns show the results when all reduction steps are performed and local PVs are considered.
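When PVs are considered, timing quantities are random variables, and the statement that one quantity is greater than another becomes a probabilistic statement that is controlled by a threshold δ. A minimal sketch of such a comparison is given below. It assumes, for illustration only, that both quantities are Gaussian and statistically independent, which is a simplification of the canonical form of Visweswariah et al. [2006]. The function names and the example values are hypothetical and not taken from the implementation used in this thesis.

```python
import math


def prob_greater(mean_a, sigma_a, mean_b, sigma_b):
    """Probability that A > B for independent Gaussian quantities A and B."""
    sigma_diff = math.hypot(sigma_a, sigma_b)  # std. deviation of A - B
    if sigma_diff == 0.0:
        return 1.0 if mean_a > mean_b else 0.0
    z = (mean_a - mean_b) / sigma_diff
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


def is_greater(mean_a, sigma_a, mean_b, sigma_b, delta=0.9):
    """A is considered greater than B if P(A > B) exceeds delta."""
    return prob_greater(mean_a, sigma_a, mean_b, sigma_b) > delta


# Hypothetical example: path delays in ns against a fixed required time.
required = (1.00, 0.0)
clearly_slower = is_greater(1.05, 0.03, *required)   # P(A > B) is about 0.95
undecided_a = is_greater(1.02, 0.03, *required)      # P(A > B) is about 0.75
undecided_b = is_greater(*required, 1.02, 0.03)      # P(B > A) is about 0.25
print(clearly_slower, undecided_a, undecided_b)
```

In the second case, neither quantity is considered greater than the other for δ = 0.9. Under these assumptions, such a path can neither be classified as clearly critical nor be discarded as clearly uncritical, which illustrates why the uncertainty introduced by PVs tends to increase the number of paths that have to be kept.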
The δ, which defines when a timing quantity is considered greater than another quantity, is set to 0.9. Hence, a timing quantity is considered greater than another one when the probability for this is greater than 90 %. For the moment, just one source of variation is considered, namely the purely random variation of the threshold voltage, $x_r$, which is set to 10 % of the nominal $V_{th}$. However, as described in Visweswariah et al. [2006], an arbitrary number of varying parameters can be considered. Due to the uncertainty of the gate delays introduced by PVs, the number of PCPs that have to be tested increases.

                                    Without PVs                                          With PVs
Circuit   # Gates   # Paths      Mitra # PCPs   Proposed # PCPs   Impr.   Runtime [s]   # PCPs       Runtime [s]
c499      534       452608       1487           375               4.0     24.79         696          65.25
c1355     589       522368       3376           2224              1.5     41.39         3074         117.91
c2670     708       31286        21             21                1.0     12.43         46           23.42
c3540     905       4248254      15276          1345              11.4    57.13         2608         173.93
c5315     1484      738816       1568           899               1.7     33.09         2012         94.4
c6288^a   2601      5.1·10^16    6.8·10^12      4.1·10^12         1.6     559.96        9.7·10^12    661.31
c7552     2242      448564       3173           522               6.1     31.02         942          79.07
b04       540       185324       120            120               1.0     17.08         378          39.64
b05       1156      189666       3295           981               3.4     20.57         1672         61.59
b07       615       4046         10             10                1.0     7.03          24           14.5
b11       1050      7156         21             16                1.3     10.32         25           21.77
b12       1497      20020        237            144               1.6     13.18         247          32.79
b14       5718      2.0·10^8     52170          10948             4.8     880.73        49004        1340.81
b15       10236     2.0·10^7     42089          6387              6.6     176.25        14116        649.8
b17       24840     6.1·10^7     275            173               1.6     305.71        238          782.09
b18^a     83679     2.6·10^26    236            236               1.0     1589.8        608          4642.54
b19^a     144747    4.7·10^22    1.0·10^19      1.0·10^19         1.0     3737.23       2.4·10^19    25182.16
b20^a     13097     6.7·10^12    6138           5283              1.2     452.09        9998         890.82
b21       13052     7.2·10^12    3796           3452              1.1     470.35        5755         771.66
b22       19731     6.7·10^12    2436           1489              1.6     397.52        2828         871.6
Average                                                           2.7     7.4 min                    30 min

Table 5.3.: Comparison of the proposed approach to the approach from Baba and Mitra [2009]. Shown are the initial number of gates and paths in the circuit, the number of PCPs, the improvement in the number of PCPs of our approach compared to Baba and Mitra [2009] and the runtimes with and without considering process variations.
a) The number of PCPs for those circuits is determined without checking static sensitizability because the BDDs for the circuits were too large and could not be set up. However, static sensitizability checking should be possible with a SAT-solver based approach (e.g. Drechsler et al. [2008]).

More detailed results for all benchmark circuits are given in Appendix B. There, it is shown how far the number of PCPs can be reduced by the individual reduction steps.

The test time can be further reduced by determining sets of PCPs for different time periods (see Table 5.4). The PCPs for circuit c3540 are calculated every 2 years until the specified lifetime is reached. Hence, for the first 2 years just 175 paths have to be checked. The number of paths to be checked increases with time, and in the last two years all 1318 paths have to be checked.

Time [y]   2     4     6     8      10
# PCPs     175   396   773   1114   1318

Table 5.4.: Number of PCPs over time of circuit c3540

5.7. Summary

Aging is one of the main factors limiting the reliability of nano-scale circuits. The degradation of a circuit strongly depends on operating conditions and workload. If those conditions are not (yet) known, it is hard to accurately predict the degraded timing behavior of a circuit, which is given by the delay of the critical path.

A method is proposed to identify all paths of a circuit that may become critical due to degradation, the so-called possible critical paths. First, this is done by introducing intervals for gate delays, since the exact delays are unknown. Later, it is shown that those intervals for the path delay are sometimes too pessimistic, and an efficient method to calculate a lower bound for the path delay, called minimal aged path delay, is presented. A way to incorporate process variation when the PCPs are determined is introduced as well.

Two applications for PCPs are given: an aging-aware timing model for modules and the usage of PCPs to monitor a circuit in the field. The results show that the timing model has a mean speed-up of 30 × compared to a timing analysis on gate level and that the number of paths that must be tested can be reduced by a factor of 2.7 compared to a state-of-the-art approach.

6. Conclusion

Aging leads to a time-dependent change of device parameters. Unlike other effects that cause variation in ICs, aging effects have not received much attention yet. However, due to the ongoing miniaturization, the degradation of the circuit performance caused by aging effects increases. Furthermore, the performance gain from moving from one technology to the next decreases. Hence, generous safety margins are no longer affordable, since they make the transition to the latest technology generations uneconomical. To enable continued scaling, new design techniques are required that allow the safety margins to be reduced. The contributions of this thesis are highly accurate analysis and monitoring methods to determine the timing degradation of aged circuits.

First, the two dominant drift-related aging effects were investigated. It was shown how the parameter drift can be modeled and which impact those drifts have on the gate performance. It turned out that the gate delay as well as the output slope is increased. However, the power dissipation of a gate is not affected or is even slightly reduced.

An aging analysis flow on gate level capable of determining the impact of the two dominant drift-related aging effects on circuit timing was developed and implemented. The centerpiece of the analysis flow is an aging-aware gate model called AgeGate.
AgeGate consists of a canonical gate model, technology-specific degradation equations, and information about the internal gate structure. In contrast to existing aging-aware gate models, AgeGate takes both NBTI and HCI into account, computes not only an aged gate delay but also an aged output slope and, last but not least, considers individual transistor drifts. The results show that both aging effects are relevant, that not calculating an aged output slope underestimates the performance degradation by 24 % on average, and that not computing individual transistor drifts overestimates the degradation by 20 % on average.

Continued scaling requires that design is done on higher and higher levels of abstraction. Based on AgeGate, an aging-aware timing model for modules was proposed. The basic idea of the timing model is to determine all paths of a module that might become critical due to aging, the possible critical paths. This is done by removing all elements of a timing graph that do not belong to a possible critical path. This way, the timing model is as accurate as an aging-aware timing analysis on gate level, but a mean speed-up of 30 × (maximum speed-up 96 ×) is achieved.

Aging is an increasing reliability concern in advanced technologies. The timing degradation of a circuit strongly depends on the workload and the operating conditions over its lifetime. However, these factors are often unknown during the design of a circuit. A method that monitors the circuit by testing the delay of all possible critical paths was introduced. This way, countermeasures have to be taken only if the circuit degrades too much. The circuit is more competitive, since it does not have to be designed for worst-case conditions. Compared to a state-of-the-art approach, the number of possible critical paths could be reduced by a factor of 2.7. Furthermore, process variation can be considered for the identification of possible critical paths.

A.
Constraints for NAND and NOR gates

First, the constraint for a NAND gate with two inputs is derived. The $SP_o$ at the output of the NAND gate is:

\[ SP_o = 1 - SP_i \cdot SP_j \tag{A.1} \]

Solving Equation A.1 for the side input $SP_j$:

\[ SP_j = \frac{1 - SP_o}{SP_i} \tag{A.2} \]

Taking into account that $SP_j$ is between 0 and 1 gives the following constraint:

\[ 0 \le SP_j \le 1 \tag{A.3} \]
\[ 0 \le \frac{1 - SP_o}{SP_i} \le 1 \tag{A.4} \]
\[ SP_o \ge 1 - SP_i \tag{A.5} \]

The left inequality of A.4 is always fulfilled because $SP_o \le 1$, so only the right inequality contributes to A.5.

Next, the constraint for a NAND gate with three inputs is derived:

\[ SP_o = 1 - SP_i \cdot SP_j \cdot SP_k \tag{A.6} \]
\[ SP_k = \frac{1 - SP_o}{SP_i \cdot SP_j} \tag{A.7} \]

$SP_k$ must also be between 0 and 1:

\[ 0 \le SP_k \le 1 \tag{A.8} \]
\[ 0 \le \frac{1 - SP_o}{SP_i \cdot SP_j} \le 1 \tag{A.9} \]
\[ SP_o \ge 1 - SP_i \cdot SP_j \tag{A.10} \]

Since $SP_j$ is between 0 and 1, the lower bound in A.10 is smallest for $SP_j = 1$. Hence, the lower bound of the inequality for $SP_o$ of a three-input NAND gate is equal to the constraint for a NAND gate with two inputs:

\[ SP_o \ge 1 - SP_i \tag{A.11} \]

The constraint for a NAND gate with n inputs is equivalent.

Finally, it is shown that the constraint for a NOR gate with 3 (or n) inputs is equivalent to the constraint for a two-input NOR gate.

\[ SP_o = (1 - SP_i) \cdot (1 - SP_j) \cdot (1 - SP_k) \tag{A.12} \]
\[ SP_k = 1 - \frac{SP_o}{(1 - SP_i) \cdot (1 - SP_j)} \tag{A.13} \]
\[ 0 \le SP_k \le 1 \tag{A.14} \]
\[ 0 \le 1 - \frac{SP_o}{(1 - SP_i) \cdot (1 - SP_j)} \le 1 \tag{A.15} \]
\[ 0 \le \frac{SP_o}{(1 - SP_i) \cdot (1 - SP_j)} \le 1 \tag{A.16} \]
\[ SP_o \le (1 - SP_i) \cdot (1 - SP_j) \tag{A.17} \]

Since $SP_j$ is between 0 and 1, the upper bound in A.17 is largest for $SP_j = 0$. Hence, the upper bound of the inequality for $SP_o$ of a three-input (or n-input) NOR gate is equal to the constraint for a two-input NOR gate:

\[ SP_o \le (1 - SP_i) \tag{A.18} \]

B. More detailed results for PCP identification

Table B.1 shows the number of PCPs and the corresponding runtimes for all ISCAS'85 and ITC'99 circuits. The reduction steps, as discussed in Section 5.2, are applied to the initial TG one after another. First, the slack reduction step is performed. Next, the path delay reduction step is applied to the already reduced TG. Then, the arrival time and the delay to sink reduction steps are performed.
The column “All reduction steps considering minimum aged circuit delay” shows the resulting number of PCPs when $D_{crit,fresh}$ is replaced by $D_{crit,aged,min}$ (which is relevant for the slack, the path delay and the path-based reduction steps) and all reduction steps are performed again. Finally, the last column shows the number of PCPs and the runtime when the same reduction steps as in the previous column are performed but process variations are considered as well.

Columns of Table B.1 (each reduction step column gives # PCPs with the runtime in [s] in parentheses):
Initial: # Gates, # Paths
(1) Slack reduction step
(2) Path delay reduction step
(3) Arrival time and delay to sink reduction steps
(4) All reduction steps considering minimum aged circuit delay
(5) All reduction steps considering minimum aged circuit delay and process variations

Circuit   # Gates   # Paths    (1)                 (2)               (3)               (4)                (5)
c17       7         18         6 (0.00)            6 (0.00)          3 (0.00)          3 (0.00)           5 (0.19)
c432      226       123652     22378 (0.17)        13126 (0.01)      157 (0.05)        157 (0.08)         167 (24.62)
c499      534       452608     69517 (0.99)        29518 (0.05)      1487 (1.24)       375 (2.53)         696 (65.25)
c880      438       16956      1650 (0.58)         1007 (0.01)       98 (0.05)         74 (0.08)          118 (18.92)
c1355     589       522368     72216 (0.74)        40432 (0.05)      3376 (2.55)       2224 (4.90)        3074 (117.91)
c1908     431       1.5e+06    138412 (0.86)       130736 (0.02)     4596 (16.49)      2091 (23.44)       5407 (394.14)
c2670     708       31286      2402 (1.98)         1068 (0.01)       21 (0.02)         21 (0.04)          46 (23.42)
c3540     905       4.2e+06    923326 (0.80)       357372 (0.06)     15276 (0.45)      1345 (0.45)        2608 (173.93)
c5315     1484      738816     82572 (4.44)        75368 (0.02)      1568 (0.06)       899 (0.11)         2012 (94.40)
c6288^a   2601      5.1e+16    3.5e+16 (0.92)      2.3e+16 (0.33)    6.8e+12 (6.99)    4.1e+12 (10.92)    9.7e+12 (661.31)
c7552     2242      448564     18310 (6.08)        9480 (0.03)       3173 (0.12)       522 (0.26)         942 (79.07)
b02       28        72         4 (0.02)            3 (0.00)          3 (0.00)          3 (0.00)           3 (0.55)
b03       233       1632       164 (0.20)          131 (0.01)        80 (0.03)         80 (0.17)          96 (9.97)
b04       540       185324     17172 (1.45)        15336 (0.01)      120 (0.03)        120 (0.05)         378 (39.64)
b05       1156      189666     19653 (1.48)        12392 (0.04)      3295 (0.17)       981 (0.53)         1672 (61.59)
b06       74        238        23 (0.03)           18 (0.00)         10 (0.01)         7 (0.02)           16 (2.08)
b07       615       4046       37 (1.10)           35 (0.01)         10 (0.01)         10 (0.03)          24 (14.50)
b08       155       2632       176 (0.13)          91 (0.00)         9 (0.01)          9 (0.02)           6 (11.08)
b09       167       1858       270 (0.18)          200 (0.00)        83 (0.04)         59 (0.08)          40 (7.78)
b10       213       1790       183 (0.17)          143 (0.00)        45 (0.02)         42 (0.07)          55 (9.33)
b11       1050      7156       106 (2.17)          87 (0.00)         21 (0.02)         16 (0.05)          25 (21.77)
b12       1497      20020      708 (2.66)          523 (0.02)        237 (0.09)        144 (0.35)         247 (32.79)
b13       307       1216       3 (0.45)            3 (0.00)          2 (0.00)          2 (0.01)           5 (7.89)
b14       5718      2e+08      6.1e+06 (24.44)     5.2e+06 (0.14)    52170 (32.75)     10948 (46.50)      49004 (1340.81)
b15       10236     2e+07      1.7e+06 (28.39)     894146 (0.38)     42089 (2.29)      6387 (10.95)       14116 (649.80)
b17       24840     6.1e+07    10898 (212.04)      7608 (0.04)       275 (0.21)        173 (0.63)         238 (782.09)
b18^a     83679     2.6e+26    2.6e+07 (1001.56)   2.4e+07 (0.06)    236 (0.14)        236 (0.23)         608 (4642.54)
b19^a     144747    4.7e+22    2.2e+22 (2133.45)   1.8e+22 (2.48)    1e+19 (67.52)     1e+19 (228.69)     2.4e+19 (25182.16)
b20^a     13097     6.7e+12    2.7e+06 (82.07)     2.5e+06 (0.13)    6138 (0.99)       5283 (1.09)        9998 (890.82)
b21       13052     7.2e+12    7.6e+11 (61.67)     7.3e+11 (0.20)    3796 (1.05)       3452 (1.85)        5755 (771.66)
b22       19731     6.7e+12    519916 (114.18)     385166 (0.12)     2436 (0.69)       1489 (1.03)        2828 (871.60)

Table B.1.: Detailed results for the proposed reduction steps
a) The number of PCPs for those circuits is determined without checking static sensitizability because the BDDs for the circuits were too large and could not be set up. However, static sensitizability checking should be possible with a SAT-solver based approach (e.g. Drechsler et al. [2008]).

Bibliography

M. Agarwal, Bipul C. Paul, Ming Zhang, and Subhasish Mitra. Circuit Failure Prediction and Its Application to Transistor Aging. In IEEE VLSI Test Symposium, pages 277–286, May 2007.
M. A. Alam, H. Kufluoglu, D. Varghese, and S. Mahapatra. A comprehensive model for PMOS NBTI degradation: Recent progress. Microelectronics Reliability, 47(6):853–862, June 2007.
Charles J. Alpert, Anirudh Devgan, and Stephen T. Quay.
Buffer Insertion for Noise and Delay Optimization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 18(11):1633–, November 1999.
Todd Austin, Valeria Bertacco, Scott Mahlke, and Yu Cao. Reliable Systems on Unreliable Fabrics. IEEE Design and Test, 2008.
A. H. Baba and Subhasish Mitra. Testing for transistor aging. In IEEE VLSI Test Symposium, pages 215–220, May 2009.
Thomas Baumann, Stefan Drapatz, Georg Georgakos, Karl Hofmann, and Christian Pacha. Accelerating and Masking Properties of Transistor Degradation of Selected Digital Circuit Topologies. Honey milestone report 3.1.2-q11, Infineon Technologies, August 2010.
Manuel J. Bellido, Jorge Juan, and Manuel Valencia. Logic-Timing Simulation and the Degradation Model. Imperial College Press, London, 2006.
D.R. Bild, G.E. Bok, and R. P. Dick. Minimization of NBTI performance degradation using internal node control. In Design, Automation and Test in Europe (DATE), pages 148–153, 2009.
David Blaauw, Kaviraj Chopra, Ashish Srivastava, and Louis Scheffer. Statistical Timing Analysis: From Basic Principles to State of the Art. IEEE Trans. on CAD of Integrated Circuits and Systems, 4:589–607, 2008.
David T. Blaauw, Chanhee Oh, Vladimir Zolotov, and Aurobindo Dasgupta. Static electromigration analysis for on-chip signal interconnects. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 22(1):39–48, January 2003.
Michael L. Bushnell and Vishwani D. Agrawal. Essentials of Electronic Testing. Kluwer Academic Publisher, 2000.
Cadence. Reliability simulation in integrated circuit design. Technical report, Cadence Design Systems, Inc., 2003.
Cadence. ECSM - Effective Current Source Model. http://www.cadence.com/Alliances/languages/Pages/ecsm.aspx, 2007.
Lakshmi N. B. Chakrapani, Bilge E. S. Akgul, Suresh Cheemalavagu, Pinar Korkmaz, Krishna V. Palem, and Balasubramanian Seshasayee.
Ultra-efficient (embedded) SOC architectures based on probabilistic CMOS (PCMOS) technology. In Design, Automation and Test in Europe (DATE), pages 1110–1115, 2006.
A. P. Chandrakasan and R. W. Brodersen. Minimizing power consumption in digital CMOS circuits. Proceedings of the IEEE, 83(4):498–523, April 1995.
Jifeng Chen, Shuo Wang, Nemat Bidokhti, and Mohammad Tehranipoor. A Framework for Fast and Accurate Critical-Reliability Paths Identification. In IEEE North Atlantic Test Workshop (NATW), May 2011.
Liang-Chi Chen, Sandeep K. Gupta, and Melvin A. Breuer. A new gate delay model for simultaneous switching and its applications. In ACM/IEEE Design Automation Conference (DAC), pages 289–294, New York, NY, USA, 2001. ACM.
Mihir Choudhury, Vikas Chandra, Kartik Mohanram, and Robert C. Aitken. Analytical model for TDDB-based performance degradation in combinational logic. In Design, Automation and Test in Europe (DATE), pages 423–428, 2010.
M. A. Cirit. Estimating dynamic power consumption of CMOS circuits. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 1987.
Philippe Coussy and Adam Morawiec. High-Level Synthesis from Algorithms to Digital Circuits. Springer, 2008.
John Croix and Martin Wong. Blade and razor: cell and interconnect delay analysis using current-based models. In ACM/IEEE Design Automation Conference (DAC), pages 386–389, June 2003.
S. Das, C. Tokunaga, S. Pant, W. H. Ma, S. Kalaiselvan, K. Lai, D.M. Bull, and David T. Blaauw. RazorII: In Situ Error Detection and Correction for PVT and SER Tolerance. IEEE Journal of Solid-State Circuits, 44(1):32–48, January 2009.
Rolf Drechsler, Stephan Eggersglüß, Görschwin Fey, Andreas Glowatz, Friedrich Hapke, Juergen Schloeffel, and Daniel Tille. On Acceleration of SAT-Based ATPG for Industrial Designs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27(7):1329–1333, July 2008.
Robert Entner.
Modeling and Simulation of Negative Bias Temperature Instability. PhD thesis, Technische Universität Wien, 2007.
Thomas Fischer, E. Amirante, Karl Hofmann, M. Ostermayr, Peter Huber, and Doris Schmitt-Landsiedel. A 65nm test structure for the analysis of NBTI induced statistical variation in SRAM transistors. In European Solid-State Device Research Conference (ESSDERC), pages 51–54, September 2008.
S. Garg and Diana Marculescu. System-Level Process Variation Driven Throughput Analysis for Single and Multiple Voltage-Frequency Island Designs. In Design, Automation and Test in Europe (DATE), pages 1–6, April 2007.
Tibor Grasser, B. Kaczer, W. Goes, T. Aichinger, P. Hehenberger, and M. Nelhiebel. A two-stage model for negative bias temperature instability. In IEEE International Reliability Physics Symposium (IRPS), pages 33–44, April 2009.
Stephan Henzler, Martin Wirnshofer, and Dominik Lorenz. Intrinsic time margin monitoring for assessment of process variation and aging, 2009.
S. Herbert and D. Marculescu. Mitigating the impact of variability on chip-multiprocessor power and performance. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 17(10):1520–1533, October 2009. ISSN 1063-8210. doi: 10.1109/TVLSI.2009.2020394.
Karl Hofmann, Hans Reisinger, K. Ermisch, C. Schlunder, Wolfgang Gustin, T. Pompl, Georg Georgakos, K.v. Arnim, J. Hatsch, T. Kodytek, Thomas Baumann, and Christian Pacha. Highly accurate product-level aging monitoring in 40nm CMOS. In Symposium on VLSI Technology (VLSIT), pages 27–28, June 2010.
Vincent Huard, CR Parthasarathy, Alain Bravaix, Chloe Guerin, and Emmanuel Pion. CMOS device design-in reliability approach in advanced nodes. In IEEE International Reliability Physics Symposium (IRPS), pages 624–633, 2009.
A. E. Islam, H. Kufluoglu, D. Varghese, S. Mahapatra, and M. A. Alam.
Recent Issues in Negative-Bias Temperature Instability: Initial Degradation, Field Dependence of Interface Trap Generation, Hole Trapping Effects, and Relaxation. IEEE Transactions on Electron Devices (TED), pages 2143–2154, September 2007.
ITRS. The International Technology Roadmap for Semiconductors: Process Integration, Devices & Structures (PIDS). http://www.itrs.net/Links/2001ITRS/PIDS.pdf, 2001.
ITRS. The International Technology Roadmap for Semiconductors: Process Integration, Devices & Structures (PIDS). http://www.itrs.net/Links/2009ITRS/2009Chapters_2009Tables/2009Tables_FOCUS_C_ITRS.xls, 2009.
Yun-Cheng Ju and Resve A. Saleh. Incremental techniques for the identification of statically sensitizable critical paths. In ACM/IEEE Design Automation Conference (DAC), pages 541–546, New York, NY, USA, 1991. ACM.
Kunhyuk Kang, Sang Phill Park, Kaushik Roy, and Muhammad A. Alam. Estimation of statistical variation in temporal NBTI degradation and its impact on lifetime circuit performance. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 730–734, Piscataway, NJ, USA, 2007. IEEE Press.
Margot Karam, W. Fikry, H. Haddara, and H. Ragai. Implementation of hot-carrier reliability simulation in ELDO. In IEEE International Symposium on Circuits and Systems (ISCAS), volume 5, pages 515–518, 2001.
Christoph Knoth, Irina Eichwald, Petra Nordholz, and Ulf Schlichtmann. White-Box Current Source Modeling Including Parameter Variation and Its Application in Timing Simulation. In International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), pages 200–210, September 2010.
Christoph Knoth, Carsten Uphoff, Sebastian Kiesel, and Ulf Schlichtmann. SWAT: Simulator for Waveform-Accurate Timing including Parameter Variations and Transistor Aging. In International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), September 2011. to appear.
Haldun Kufluoglu, V. Reddy, A. Marshall, J.
Krick, T. Ragheb, C. Cirba, A. Krishnan, and C. Chancellor. An Extensive and Improved Circuit Simulation Methodology For NBTI Recovery. In IEEE International Reliability Physics Symposium (IRPS), pages 670–675, 2010.
Sanjay V. Kumar, Chris H. Kim, and Sachin S. Sapatnekar. An Analytical Model for Negative Bias Temperature Instability. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 493–496, 2006.
Sanjay V. Kumar, Chris H. Kim, and Sachin S. Sapatnekar. NBTI-aware synthesis of digital circuits. In ACM/IEEE Design Automation Conference (DAC), pages 370–375, New York, NY, USA, 2007a. ACM.
Sanjay V. Kumar, Chris H. Kim, and Sachin S. Sapatnekar. NBTI-Aware Synthesis of Digital Circuits. In ACM/IEEE Design Automation Conference (DAC), pages 370–375, 2007b.
Yung-Huei Lee, Neal Mielke, Marty Agostinelli, Sukirti Gupta, Ryan Lu, and William McMahon. Prediction of Logic Product Failure Due To Thin-Gate Oxide Breakdown. In IEEE International Reliability Physics Symposium (IRPS), pages 18–28, 2006.
Bing Li, Ning Chen, Manuel Schmidt, Walter Schneider, and Ulf Schlichtmann. On Hierarchical Statistical Static Timing Analysis. In Design, Automation and Test in Europe (DATE), April 2009.
Wing Ning Li, Sudhakar M. Reddy, and Sartaj K. Sahni. On Path Selection in Combinational Logic Circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 26(1):56–63, January 1989.
Zhihong Liu, Bruce W. McGaughy, and James Z. Ma. Design tools for reliability analysis. In ACM/IEEE Design Automation Conference (DAC), pages 182–187, 2006.
Dominik Lorenz, Georg Georgakos, and Ulf Schlichtmann. Aging Analysis of Circuit Timing Considering NBTI and HCI. In IEEE International On-Line Testing Symposium (IOLTS), pages 3–8, June 2009a.
Dominik Lorenz, Georg Georgakos, and Ulf Schlichtmann. Alterungsanalyse digitaler Schaltungen auf Gatterebene. In GMM/GI/ITG-Fachtagung Zuverlässigkeit und Entwurf, pages 81–86.
VDE Verlag GMBH, September 2009b.
Dominik Lorenz, Martin Barke, Daniel Mueller-Gritschneder, Georg Georgakos, and Ulf Schlichtmann. Aging model for timing analysis at register-transfer-level. In ACM/IEEE International Workshop on Timing Issues in the Specification and Synthesis of Digital Systems, March 2010a.
Dominik Lorenz, Martin Barke, and Ulf Schlichtmann. Aging analysis at gate and macro cell level. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 77–84, November 2010b.
Dominik Lorenz, Martin Barke, and Ulf Schlichtmann. Timing-Modell für Makrozellen zur Alterungsanalyse. In GMM/GI/ITG-Fachtagung Zuverlässigkeit und Entwurf, pages 41–47, September 2010c.
Dominik Lorenz, Georg Georgakos, and Ulf Schlichtmann. Aging-aware Timing Analysis of Combinatorial Circuits on Gate Level. it - Information Technology, 4, August 2010d.
Dominik Lorenz, Martin Barke, and Ulf Schlichtmann. Efficiently analyzing the impact of aging effects on large integrated circuits. Microelectronics Reliability, (0):–, 2012. ISSN 0026-2714. doi: 10.1016/j.microrel.2011.12.029. URL http://www.sciencedirect.com/science/article/pii/S0026271411005622.
Xiang Lu, Zhuo Li, Wangqi Qiu, D. M. H. Walker, and Weiping Shi. Longest-path selection for delay test under process variation. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pages 1924–1929, December 2005.
Yinghai Lu, Li Shang, Hai Zhou, Hengliang Zhu, Fan Yang, and Xuan Zeng. Statistical reliability analysis under process variation and aging effects. In ACM/IEEE Design Automation Conference (DAC), pages 514–519, July 2009.
Hong Luo, Yu Wang, Ku He, Rong Luo, Huazhong Yang, and Yuan Xie. A Novel Gate-Level NBTI Delay Degradation Model with Stacking Effect. In Nadine Azemard and Lars Svensson, editors, Integrated Circuit and System Design. Power and Timing Modeling, Optimization and Simulation, volume 4644 of Lecture Notes in Computer Science, pages 160–170.
Springer Berlin / Heidelberg, 2007a.
Hong Luo, Yu Wang, Ku He, Rong Luo, Huazhong Yang, and Yuan Xie. Modeling of PMOS NBTI Effect Considering Temperature Variation. In IEEE International Symposium on Quality Electronic Design (ISQED), pages 139–144, Washington, DC, USA, 2007b. IEEE Computer Society.
R. E. Lyons and W. Vanderkulk. The Use of Triple-Modular Redundancy to Improve Computer Reliability. IBM Journal of Research and Development, 6(2):200–209, April 1962.
Elie Maricau and Georges Gielen. Efficient Variability-Aware NBTI and Hot Carrier Circuit Reliability Analysis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 29(12):1884–1893, December 2010.
J. Greg Massey. NBTI: What We Know and What We Need to Know. In IEEE International Integrated Reliability Workshop Final Report, pages 199–211, 2004.
Tobias Massier, Helmut Graeb, and Ulf Schlichtmann. The Sizing Rules Method for CMOS and Bipolar Analog Integrated Circuit Synthesis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27(12):2209–2222, December 2008.
Evelyn Mintarno, Joelle Skaf, Rui Zheng, Jyothi Velamala, Yu Cao, Stephen P. Boyd, Robert W. Dutton, and Subhasish Mitra. Optimized self-tuning for circuit aging. In Design, Automation and Test in Europe (DATE), pages 586–591, 2010.
Natasa Miskov-Zivanov and Diana Marculescu. Modeling and Optimization for Soft-Error Reliability of Sequential Circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27(5):803–816, May 2008.
Yoshio Miura and Yasuo Matukura. Investigation of Silicon-Silicon Dioxide Interface Using MOS Structure. Japanese Journal of Applied Physics, page 180, 1966.
Gordon E. Moore. Cramming More Components onto Integrated Circuits. International Journal of High Speed Electronics and Systems, 38(8), April 1965.
Farid N. Najm. Transition Density: A Stochastic Measure of Activity in Digital Circuits.
In ACM/IEEE Design Automation Conference (DAC), pages 644–649, June 1991.

Farid N. Najm. Transition Density: A New Measure of Activity in Digital Circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 12(2):310–323, February 1993.

Farid N. Najm. A survey of power estimation techniques in VLSI circuits. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2(4):446–455, December 1994.

Sani R. Nassif. Design for variability in DSM technologies. In IEEE International Symposium on Quality Electronic Design, March 2000.

Bipul C. Paul, Kunhyuk Kang, Haldun Kufluoglu, M. A. Alam, and K. Roy. Temporal Performance Degradation under NBTI: Estimation and Design for Improved Reliability of Nanoscale Circuits. In Design, Automation and Test in Europe (DATE), volume 1, pages 169–174, Los Alamitos, CA, USA, 2006. IEEE Computer Society.

Christian Piguet. Low-power electronics design. CRC Press, 2005.

Lawrence T. Pillage, Ronald A. Rohrer, and Chandramouli Visweswariah. Electronic Circuit and System Simulation Methods. McGraw-Hill, Inc., 1995.

Jessica Qian, Satyamurthy Pullela, and Lawrence T. Pillage. Modeling the “Effective capacitance” for the RC interconnect of CMOS gates. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 13(12), December 1994.

Stewart E. Rauch III. The statistics of NBTI-induced VT and beta mismatch shifts in pMOSFETs. IEEE Transactions on Device and Materials Reliability, pages 89–93, December 2002.

Hans Reisinger, O. Blank, Wolfgang Heinrigs, A. Muhlhoff, Wolfgang Gustin, and Christian Schlünder. Analysis of NBTI Degradation- and Recovery-Behavior Based on Ultra Fast VT-Measurements. In IEEE International Reliability Physics Symposium (IRPS), pages 448–453, March 2006.

Hans Reisinger, O. Blank, Wolfgang Heinrigs, Wolfgang Gustin, and Christian Schlünder.
A Comparison of Very Fast to Very Slow Components in Degradation and Recovery Due to NBTI and Bulk Hole Trapping to Existing Physical Models. IEEE Transactions on Device and Materials Reliability, 7(1):119–129, 2007.

Renesas. Semiconductor Reliability Handbook. Renesas Electronics, 2008.

T. Sakurai and A. R. Newton. Alpha-Power Law MOSFET Model and its Applications to CMOS Inverter Delay and Other Formulas. IEEE Journal of Solid-State Circuits SC, 25(2):584–594, April 1990.

Sachin S. Sapatnekar. Timing. Kluwer Academic Publishers, 2004.

Louis Scheffer, Luciano Lavagno, and Grant Martin, editors. EDA for IC implementation, circuit design, and process technology. Electronic Design Automation for Integrated Circuits Handbook. CRC Press, Boca Raton, 2006.

Christian Schlünder, J. M. Berthold, M. Hoffmann, J.-M. Weigmann, Wolfgang Gustin, and Hans Reisinger. A New Smart Device Array Structure for Statistical Investigations of BTI Degradation and Recovery. In IEEE International Reliability Physics Symposium (IRPS), May 2011.

Dieter K. Schroder and Jeff A. Babcock. Negative bias temperature instability: Road to cross in deep submicron silicon semiconductor manufacturing. Journal of Applied Physics, 94(1), 2003.

G. Semeraro, G. Magklis, R. Balasubramonian, D. H. Albonesi, Sandhya Dwarkadas, and Michael A. Scott. Energy-efficient processor design using multiple clock domains with dynamic voltage and frequency scaling. In Symposium on High-Performance Computer Architecture, pages 29–40, February 2002.

Ellen M. Sentovich, Kanwar Jit Singh, Luciano Lavagno, Cho Moon, Rajeev Murgai, Alexander Saldanha, Hamid Savoj, Paul R. Stephan, Robert K. Brayton, and Alberto L. Sangiovanni-Vincentelli. SIS: A System for Sequential Circuit Synthesis. Memorandum UCB/ERL M92/41, Electronics Research Laboratory, University of California, Berkeley, CA 94720, May 1992.

M. Sharma and J. H. Patel. Finding a small set of longest testable paths that cover every gate.
In IEEE International Test Conference (ITC), pages 974–982, December 2002.

Alexander Stempkovsky, Alexey Glebov, and Sergey Gavrilov. Calculation of Stress Probability for NBTI-Aware Timing Analysis. In IEEE International Symposium on Quality Electronic Design (ISQED), pages 714–718, March 2009.

Alvin W. Strong, Ernest Y. Wu, Rolf-Peter Vollertsen, Jordi Sune, Giuseppe La Rosa, Stewart E. Rauch III, and Timothy D. Sullivan. Reliability Wearout Mechanisms in Advanced CMOS Technologies. Series on Microelectronic Systems. IEEE Press, 2009.

Dennis Sylvester, David Blaauw, and Eric Karl. Elastic: An adaptive self-healing architecture for unpredictable silicon. IEEE Design & Test of Computers, 23(6):484–490, 2006.

Synopsys. Composite Current Source. http://www.synopsys.com/products/solutions/galaxy/ccs/cc_source.html, 2006.

Synopsys. HSPICE User Guide: Simulation and Analysis, September 2008.

E. Talpes and D. Marculescu. Toward a multiple clock/voltage island design style for power-aware processors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 13(5):591–603, May 2005. ISSN 1063-8210. doi: 10.1109/TVLSI.2005.844305.

James Tschanz, Keith A. Bowman, Steve Walstra, Marty Agostinelli, Tanay Karnik, and Vivek De. Tunable replica circuits and adaptive voltage-frequency techniques for dynamic voltage, temperature, and aging variation tolerance. In Symposium on VLSI Circuits, pages 112–113, June 2009.

Robert H. Tu, Elyse Rosenbaum, Wilson Y. Chan, Chester C. Li, Eric Minami, Khandker Quader, Ping Keung Ko, and Chenming Hu. Berkeley Reliability Tools - BERT. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 12:1524–1533, 1993.

John P. Uyemura. CMOS logic circuit design. Kluwer Academic Publishers, 2001.

Chandu Visweswariah, Kaushik Ravindran, Kerim Kalafala, Steven G. Walker, Sambasivan Narayan, Daniel K. Beece, Jeff Piaget, Natesan Venkateswaran, and Jeffrey G. Hemmet.
First-Order Incremental Block-Based Statistical Timing Analysis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 25(10), October 2006.

Wenping Wang, Zile Wei, Shengqi Yang, and Yu Cao. An efficient method to identify critical gates under circuit aging. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 735–740, Piscataway, NJ, USA, 2007a. IEEE Press.

Wenping Wang, Shengqi Yang, Sarvesh Bhardwaj, Rakesh Vattikonda, Sarma Vrudhula, Frank Liu, and Y. Cao. The impact of NBTI on the performance of combinational and sequential circuits. In ACM/IEEE Design Automation Conference (DAC), pages 364–369, New York, NY, USA, 2007b. ACM.

Wenping Wang, Shengqi Yang, and Yu Cao. Node Criticality Computation for Circuit Timing Analysis and Optimization under NBTI Effect. In IEEE International Symposium on Quality Electronic Design (ISQED), pages 763–768, March 2008.

Yu Wang, Xiaoming Chen, Wenping Wang, Varsha Balakrishnan, Yu Cao, Yuan Xie, and Huazhong Yang. On the efficacy of input Vector Control to mitigate NBTI effects and leakage power. In IEEE International Symposium on Quality Electronic Design (ISQED), pages 19–26, March 2009a.

Yu Wang, Xiaoming Chen, Wenping Wang, Yu Cao, Yuan Xie, and Huazhong Yang. Gate replacement techniques for simultaneous leakage and aging optimization. In Design, Automation and Test in Europe (DATE), pages 328–333, April 2009b.

Wikipedia. Altern - wikipedia, die freie enzyklopädie, 2011. URL http://de.wikipedia.org/w/index.php?title=Altern&oldid=92273683. [Online; version of 23 August 2011].

Kai-Chiang Wu and Diana Marculescu. Joint logic restructuring and pin reordering against NBTI-induced performance degradation. In Design, Automation and Test in Europe (DATE), pages 75–80, April 2009.

Lifeng Wu, Jingkun Fang, Hirokazu Yonezawa, Yoshiyuki Kawakami, Nobufusa Iwanishi, Heting Yan, Ping Chen, Alvin I-Hsien Chen, Norio Koike, Yoshifumi Okamoto, and Chune-Sin Ye.
GLACIER: a hot carrier gate level circuit characterization and simulation system for VLSI design. In IEEE International Symposium on Quality Electronic Design (ISQED), pages 73–79, 2000.

Michael G. Xakellis and Farid N. Najm. Statistical Estimation of the Switching Activity in Digital Circuits. In ACM/IEEE Design Automation Conference (DAC), pages 728–733, June 1994.

Gary Kok-Hoo Yeap. Practical Low Power Digital VLSI Design. Springer, 1998.

L. Zhang, W. Chen, Y. Hu, J. A. Gubner, and C. C.-P. Chen. Correlation-Preserved Non-Gaussian Statistical Timing Analysis with Quadratic Timing Model. In ACM/IEEE Design Automation Conference (DAC), 2005.

Vladimir Zolotov, Jinjun Xiong, Hanif Fatemi, and Chandu Visweswariah. Statistical Path Selection for At-Speed Test. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pages 749–759, May 2010.

List of Figures

1.1. IC design flow.
1.2. Aging-aware timing analysis of a circuit. Aging effects degrade transistor parameters, which results in increased gate delays over time. The critical path delay increases as well and the timing specification might be violated during the specified lifetime.
2.1. LUT-based gate model
2.2. Circuit and corresponding timing graph
2.3. Computation of the arrival time (AT).
2.4. Example of the incremental timing algorithm. Arrival time at red (dark grey) nodes is not valid. To update arrival time at node T, all invalid arrival times are recursively updated (dashed arrows).
2.5. Diagram of a sequential logic circuit. The timing constraints (setup and hold time) of a flip-flop are given as well.
2.6. An example for calculating the branch slacks.
2.7. TG with branch slacks (arc between two edges) and delays to sink (number next to the node)
2.8. Aged LUT-based gate model as proposed in [Chen et al., 2011].
2.9. Gate delay degradation as a linear function of ∆Vth
2.10. Transformation of arbitrary signals into periodic signals with same signal probability and transition density.
2.11. Drawing of an NBTI threshold voltage drift caused by consecutive stress and relaxation phases (thin black line) and the ∆Vth drift given by the long term prediction model (thick orange line).
3.1. 36 mV Vth drift due to NBTI at 1.2 V VDD (a). Sensitivity of the gate delay degradation to a threshold voltage drift (b). Hence, NBTI causes about 10 % degradation of the output delay for a rising input transition.
3.2. Cross section of a PMOS transistor.
3.3. Output characteristic of a PMOS transistor for altered values of ∆Vth.
3.4. Time dependence of Vth drift due to NBTI.
3.5. Temperature dependence of ∆Vth for altered values of Vgs.
3.6. Transistor width dependence. Marked is the minimal transistor width used in the standard cell libraries.
3.7. Drift over time for an AC stress.
3.8. Duty cycle dependence of NBTI.
3.9. Drain avalanche hot carrier.
3.10. Channel hot carrier.
3.11. Voltage, temperature and lifetime dependence of HCI.
3.12. (a) HCI equivalent circuit for a degraded transistor. VDeg and IDeg depend on ∆Ion. (b) Output characteristic of an NMOS transistor for altered values of ∆Ion.
3.13. Inverter gate and waveform.
3.14. NOR gate with two inputs.
3.15. Fan-out-3 structure: All gates in the test structure are identical to the DUT. The voltage source generates a step function. To have a realistic input signal at the DUT, the step function has to propagate through two gates before reaching the DUT. Those two gates and the DUT have to drive three gates.
3.16. Supply voltage dependence.
3.17. Temperature dependence.
3.18. Dependence on driving strength and gate type.
3.19. Dependence on transistor type and process corner.
3.20. Dependence on input load and output slope.
3.21. Dependence of output slope degradation on supply voltage and temperature.
3.22. Supply voltage and temperature dependence for HCI.
3.23. Schematic of master-slave flip-flop.
3.24. Plot of sensitivities for setup and hold time.
3.25. Sequential circuit with setup and hold time.
3.26. (a) Change of Pshort-circuit by altering Vth. Pshort-circuit decreases for a rising and a falling input transition. (b) Subthreshold current for a PMOS transistor (with Vgs = 0 V and Vds = 1.2 V) for altered ∆Vth values.
3.27. Vertical electrical field over technologies at nominal supply voltage.
3.28. Transistor drifts due to NBTI and for different technologies at nominal supply voltage (a) and at a supply voltage of 1.2 V (b).
3.29. HCI over technology nodes.
3.30. Sensitivity of the inverter delay for different technologies.
3.31. Degradation of inverter delay for different technologies and use profiles.
4.1. Aging analysis flow
4.2. An example of calculating signal probabilities
4.3. Degradation of inverter delay by ∆Ion and ∆Vth, respectively. Solid lines show dependencies calculated with sensitivities and dotted lines show dependencies simulated on transistor level. Analyzing conditions are 27 °C, 1.2 V and 15 pF capacitive load.
4.4. NOR gate with three inputs
4.5. Example explaining the signal dependence.
4.6. OR gate with three inputs and an internal signal int.
4.7. Complex gate implementing the logic function z = a · (b + c).
4.8. Ring oscillator waveforms of fresh (leading waveform in magenta) and aged (shifted waveforms in red and blue) simulations. The transistor drifts for the aged simulations were determined once by the fresh waveform and the aged waveform. Independent of which waveform was taken to determine the drifts, the aged waveforms are almost indistinguishable.
4.9. Frequency degradation of a 65 nm inverter ring oscillator stressed for 500 h at defined stress conditions.
4.10. The five slowest output arrival times over lifetime for ISCAS’85 circuit c880.
Individual workloads for the gates were obtained for SP = 0.2 and TD = 0.2 at primary inputs. Signals 866 and 874 change order with time.
4.11. Comparison of analysis with and without individual transistor drifts.
5.1. TG annotated with arrival time and delay to sink at every node.
5.2. Illustration of path delay reduction step. Edge (b, d) can be removed because the delay of path P is less than the delay of path Pother.
5.3. Illustration of arrival time reduction step. Edge (d, e) can be removed because arrival time interval along edge (d, e) is less than the arrival time at e after the max-operation.
5.4. Example for the common edge reduction step.
5.5. Example that shows difference between proposed and exact method for common edges.
5.6. Graphical representation of the common edge reduction step cases. Edge (u, v) can be removed if aged delay of path U is smaller than fresh delay of path V.
5.7. Path delay of an inverter chain (10 inverters) with respect to SP at the input.
5.8. A general path
5.9. Graphical representation of the constraints for the gate types.
5.10. Basic idea for combining aging effects and process variations.
5.11. The dotted circles indicate the aged performances. The circuit fails because the second adder needs the result before the first adder has finished its calculation.
5.12. The timing graph of the ISCAS’85 circuit c17 is shown in (a). This is a simplified TG because each net is just represented by one node. An example of a reduced TG is shown in (b).
5.13. Distribution of delay degradation for 1000 workload samples. The dotted line is the worst-case degradation when it is assumed that all PMOS transistors degrade maximally.
5.14. Enhanced-scan design. The standard scan design is extended by hold latches. Thereby, the first delay test vector V1 is latched by the hold latches while the second delay test vector V2 is read into the scan chain.
5.15. Path-based reduction step

List of Tables

2.1. Execution trace of the k most critical paths algorithm for the five slowest paths.
2.2. Comparison of state-of-the-art gate models with the proposed aging-aware gate model AgeGate.
4.1. An example for a temperature profile. The lifetime is 10 y and Veff is Vnom.
4.2. Degradation of critical path delays for different analyzer settings.
5.1. Minimal aged circuit delay
5.2. Reduction ratios of nodes and edges
5.3. Comparison of the proposed approach to the approach from Baba and Mitra [2009]. Shown are the initial number of gates and paths in the circuit, the number of PCPs, the improvement in the number of PCPs of our approach compared to Baba and Mitra [2009] and the runtimes with and without considering process variations.
5.4. Number of PCPs over time of circuit c3540
B.1. Detailed results for the proposed reduction steps

List of Algorithms

-. Function reset_node(node)
-. Function reset_edge(u,v)
1. Circuit delay computation
-. Function update_node(node)
-. Function update_edge(u,v)
2. k most critical paths
-. Function update_edge_aged(u,v)
3. Slack reduction step
4. Path delay reduction step
5. Arrival time reduction step
6. Delay to sink reduction step
7. Common edge reduction step

Acronyms

ASTA   aging-aware static timing analysis
BDD    binary decision diagram
BERT   Berkeley reliability tools
CCSM   composite current source model
CHC    channel hot carrier
CPU    central processing unit
CSM    current source model
DAG    directed acyclic graph
DAHC   drain avalanche hot carrier
DUT    device under test
DVFS   dynamic voltage frequency scaling
ECSM   effective current source model
EDA    electronic design automation
EM     electromigration
FF     flip-flop
GDSII  Graphic Database System II
HBD    hard break down
HCI    hot carrier injection
HDL    hardware description language
HLS    high-level synthesis
IC     integrated circuit
LUT    look-up table
MSFF   master-slave flip-flop
NBTI   negative bias temperature instability
PBTI   positive bias temperature instability
PCP    possible critical path
PI     primary input
PO     primary output
RD     reaction diffusion
RTL    register transfer level
SBD    soft break down
SGHC   secondary generated hot carrier
SHC    substrate hot carrier
SPICE  Simulation Program With Integrated Circuit Emphasis
SSTA   statistical static timing analysis
STA    static timing analysis
TA     timing analysis
TDDB   time-dependent dielectric breakdown
TG     timing graph
VHDL   very-high-speed integrated circuits hardware description language

List of Symbols

activation energy                        Ea
aged gate delay                          daged
aged path delay                          Daged
aged timing quantity                     qaged
arrival time                             AT
branch slack                             BS
clock frequency                          fCLK
clock period                             tCLK
clock-to-Q delay                         dCLK-to-Q
critical path                            Pcrit
critical path delay                      Dcrit
current supply voltage                   Vcurr
current temperature                      Tcurr
degradation of drain saturation current  ∆Ion
degradation of gate delay                ∆d
degradation of timing quantity           ∆q
delay to sink                            D2S
drain current                            Id
drain saturation current                 Ion
drain source voltage                     Vds
duty factor                              DF
effective supply voltage                 Veff
effective temperature                    Teff
fresh critical path delay                Dcrit,fresh
fresh gate delay                         dfresh
fresh path delay                         Dfresh
fresh timing quantity                    qfresh
gate current                             Ig
gate delay                               d
gate source voltage                      Vgs
hold time                                tHLD
input slope                              sIN
join slack                               JS
leakage current                          Ileakage
leakage power                            Pleakage
lifetime                                 tlife
maximal circuit delay                    Dmax
minimal aged path delay                  Daged,min
minimal circuit delay                    Dmin
minimum transistor length                Lmin
nominal supply voltage                   Vnom
output load                              CL
output slope                             sOUT
oxide thickness                          tox
parameter drift                          ∆p
path                                     P
path delay                               D
probability that transistor is “on”      Pon
required time                            REQT
setup time                               tSUP
short-circuit power                      Pshort-circuit
signal probability                       SP
silicon dioxide                          SiO2
sink node                                T
slack                                    SLACK
slope                                    s
source node                              S
stress probability                       Pstress
stress probability HCI                   Pstress,HCI
stress probability NBTI                  Pstress,NBTI
stress time                              tstress
substrate current                        Isub
supply voltage                           VDD
switching power                          Pswitching
temperature                              T
threshold voltage                        Vth
threshold voltage drift                  ∆Vth
timing quantity                          q
transistor length                        L
transistor width                         W
transition density                       TD

Index

k most critical paths problem, 24
AgeGate, 69
aging analysis, 27
aging effect, 35
aging-aware STA, 63
arrival time, 18
block-based, 18
branch slack, 24
canonical gate model, 69
circuit delay, 18
circuit level, 15, 27
clock gating, 31
combinational circuit, 17
controlling node, 19, 24
corner case, 9, 103
critical path, 18, 85
current source model, 17
degradation equation, 39, 45, 70
delay to sink, 21, 90
drift-related aging effect, 36
early mode, 15
effective capacitance, 17
effective supply voltage, 40
effective temperature, 40
electromigration, 27, 35
enhanced-scan design, 109
false path, 19
fan-out, 17
flip-flop, 53
gate level, 63
gate level, 15, 28
gate model, 12, 15, 69
gray-box model, 107
high-level synthesis, 106
hold time, 15
hot carrier injection, 44
incremental timing analysis, 18
integrated circuits, 9
interconnect network, 17
late mode, 15
layout, 15
logic synthesis, 11
module, 106
multi-stage gate, 75
negative bias temperature instability, 37
operating conditions, 9
output load, 16
output slope, 63
path, 18
path delay, 18
path enumeration, 23
path-based, 18
pipelining, 22
place & route, 11
positive bias temperature instability, 44
possible critical path, 85
process variations, 9, 102
radiation-induced soft errors, 27
reaction diffusion model, 37
required time, 21
scan design, 109
semi-custom design flow, 11
sensitizable, 19
sequential circuit, 22, 53
setup time, 15
sign-off, 11
signal probability, 66
single input switching assumption, 16
single-stage gate, 75
sink, 18
slack, 21
slope, 16
soft errors, 27
source, 18
spatial dependence, 67
SPICE, 15
standard cell library, 16, 77
static sensitization, 98
static timing analysis, 15
statistical static timing analysis, 103
storage elements, 22, 53
stress probability, 65, 71
stress time, 70
synthesis, 15
technology mapping, 11
temporal dependence, 67
time dependent dielectric breakdown, 35
time-complexity, 18
timing analysis, 12, 15
timing arc, 15
timing graph, 17
timing quantity, 19
timing sign-off, 15
transition density, 66
use profile, 59
variability, 9
workload, 64, 66