High-Speed Digital CMOS Circuits
73255, Summer Term 2015
Monday 8:00 – 9:30, N5320
Stephan Henzler, Technische Universität München

Lecturer CV
henzler@tum.de

Stephan Henzler received the Dipl.-Ing. degree in electrical engineering in 2002, the Dr.-Ing. degree in 2006, and the habilitation degree in 2010 from the Technische Universität München (TUM), Germany. From 2002 to 2005, he was with the Institute for Technical Electronics, Technische Universität München, where he worked on low-power digital integrated circuit design and leakage reduction techniques. For his dissertation on power management and leakage reduction techniques he received the Rohde & Schwarz outstanding thesis award 2007. In 2005, he joined the Advanced Systems and Circuits Department of Infineon Technologies AG, Munich, where he worked on high-speed/high-performance digital integrated circuits, variability in deep-submicron CMOS technologies, and mixed-signal circuit design in nanometer CMOS technologies, especially time-to-digital converters. In 2010 he joined the wireless mixed-signal department of Infineon, where he worked on mixed-signal system and circuit design. Since February 2011 he has carried on the same responsibilities within Intel.

Administrative Matters
– Lecture: Stephan Henzler, henzler@tum.de, office hours by arrangement
– Teaching assistant: Andrew Giebfried, andrew.giebfried@tum.de
– Tutorials: embedded in lecture
– Exam: homework / self-learning module in written form, 60 minutes, after lecture cycle
– Language: English

Course Overview
– Logic families for high-speed and high-performance circuits
– Register (flip-flop) design
– Clock generation and distribution
  – Phase/Delay Locked Loop
  – Frequency dividers
– Time-to-digital converters
– Arithmetic algorithms and macros for fast adders, multipliers, etc.
– Memory design (self-learning module)

Outline
– Literature
– CMOS delay models
  – Elmore delay
  – Delay minimization in buffer chain
  – Delay minimization of combinatorial logic
– Logical Effort methodology
– Static CMOS logic
  – Design considerations
– Dynamic Logic

Recommended Literature I
Course books:
  – Rabaey, Chandrakasan, Nikolic. Digital Integrated Circuits, A Design Perspective
  – Weste, Harris. CMOS VLSI Design, A Circuits and Systems Perspective
  – Kaeslin. Digital Integrated Circuit Design
  – Martin. Digital Integrated Circuit Design
  – Bernstein, Carrig, Durham, Hansen, Hogenmiller, Nowak, Rohrer. High Speed CMOS Design Styles

Recommended Literature II
Phase-locked loops:
  – Razavi. RF Microelectronics
Time-to-digital converters:
  – Henzler. Time-to-Digital Converters
Arithmetic circuits:
  – Ercegovac, Lang. Digital Arithmetic
  – Parhami.
    Computer Arithmetic, Algorithms and Hardware Design

Recommended Literature III
Memory circuits:
  – Haraszti. CMOS Memory Circuits
Low-power:
  – Henzler. Power Management of Digital Circuits in Deep Sub-Micron CMOS Technologies
Latest material for all chapters:
  – IEEE Xplore with TUM full library access

High-Speed Circuits
– Very high frequency, i.e. several GHz
– A considerable part of the clock period is consumed for synchronization, e.g. flip-flop delay tcpq, setup time tsetup, and clock skew plus jitter tskew
– Limited time for logic → only simple operations per cycle or pipeline stage, respectively
– Be aware of hold time violations!

High-Performance Circuits
– Moderate frequency, i.e. 300 MHz – 2 GHz
– The predominant part of the clock period is consumed for logic operations; the synchronization overhead is small
– Powerful operations are possible within a single cycle/stage
– Despite the long cycle time, the timing is critical due to the long combinatorial paths between two flip-flop stages
– Be aware of setup time violations!
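
The difference between the two regimes can be made concrete with the usual setup constraint T_clk ≥ tcpq + t_logic + tsetup + tskew. The following is a minimal sketch; the function name and all numeric values are illustrative assumptions, not figures from the lecture.

```python
def logic_time_budget(f_clk_hz, t_cpq, t_setup, t_skew):
    """Return (time left for logic, fraction of the period spent on synchronization).

    Based on the setup constraint T_clk >= t_cpq + t_logic + t_setup + t_skew.
    """
    t_clk = 1.0 / f_clk_hz
    overhead = t_cpq + t_setup + t_skew
    return t_clk - overhead, overhead / t_clk


# Hypothetical flip-flop and clock-tree numbers (assumptions for illustration only)
T_CPQ, T_SETUP, T_SKEW = 60e-12, 30e-12, 40e-12

# "High-speed" case: 4 GHz clock, about half of the period is overhead
t_logic_hs, frac_hs = logic_time_budget(4e9, T_CPQ, T_SETUP, T_SKEW)

# "High-performance" case: 500 MHz clock, overhead is a small fraction
t_logic_hp, frac_hp = logic_time_budget(500e6, T_CPQ, T_SETUP, T_SKEW)

print(f"4 GHz:   {t_logic_hs * 1e12:.0f} ps for logic, {frac_hs:.1%} overhead")
print(f"500 MHz: {t_logic_hp * 1e12:.0f} ps for logic, {frac_hp:.1%} overhead")
```

With these assumed numbers the same 130 ps of synchronization overhead takes roughly half of the period at 4 GHz but well under a tenth at 500 MHz, which is why the first regime allows only simple operations per stage.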

Logic Design for High-Performance

Static CMOS Logic
– Complementary pull-up and pull-down network: a series connection of NMOS devices corresponds to a parallel connection of PMOS devices, and vice versa
– Always a low-resistive connection to the power supply (VDD or VSS)
  – full swing signals
  – noise and leakage tolerant
  – strong supply dependence of delay
– Inputs connected to n PMOS and n NMOS devices
  – input load 2n (large)
  – large internal load connected to output

Static CMOS Logic 2
– Handover between pull-up and pull-down during switching
  – cross current
  – medium speed
  – not ratioed
– Activity-dependent power consumption
– Excellent modeling and EDA integration available

Elmore Delay
Prerequisites:
  – one input only
  – caps between network node and ground
  – no resistive loops
W. C. Elmore, The Transient Response of Damped Linear Networks with Particular Regard to Wideband Amplifiers, Journal of Applied Physics, 1948.

Elmore Delay (cont)
– There is exactly one path from a network node i to the input s.
– The sum of all resistances along this path is the path resistance R_ii, e.g. R_44 = R_4 + R_3 + R_1.
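
As a sketch of where the path resistance leads: in its usual form, the Elmore delay at node i is the sum over all nodes k of C_k times the resistance shared by the paths s → i and s → k (which equals R_ii for k = i). The code below computes this for an RC tree. The tree topology, the component values, and all function names are assumptions chosen for illustration; only the relation R_44 = R_4 + R_3 + R_1 is taken from the slide.

```python
def path_to_input(node, parent):
    """Nodes on the unique path from `node` back to the input 's'."""
    path = []
    while node != "s":
        path.append(node)
        node = parent[node]
    return path


def shared_resistance(i, k, parent, R):
    """Sum of the resistors common to the paths s -> i and s -> k.

    R[n] is the resistor between node n and its parent, so the common
    resistors are exactly those belonging to the nodes both paths visit.
    """
    common = set(path_to_input(i, parent)) & set(path_to_input(k, parent))
    return sum(R[n] for n in common)


def elmore_delay(i, parent, R, C):
    """First-order delay estimate at node i: sum_k C[k] * R_ik."""
    return sum(C[k] * shared_resistance(i, k, parent, R) for k in C)


# Hypothetical RC tree (assumption): s - 1, with branches 1 - 2 and 1 - 3,
# and node 3 feeding nodes 4 and 5.
parent = {1: "s", 2: 1, 3: 1, 4: 3, 5: 3}
R = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 5: 5.0}   # arbitrary units
C = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0}   # arbitrary units

r44 = shared_resistance(4, 4, parent, R)   # path resistance R_4 + R_3 + R_1
r54 = shared_resistance(5, 4, parent, R)   # shared part R_1 + R_3
t4 = elmore_delay(4, parent, R, C)
print(r44, r54, t4)
```

The prerequisites listed above (single input, grounded capacitors, no resistive loops) are what guarantee that `path_to_input` is well defined for every node.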

Elmore Delay (cont)
– The shared path resistance R_ik is the sum of all resistances along the joint sub-path of the two paths s → i and s → k.
– Example: R_i4 = R_1 + R_3

Elmore Delay (cont)
– Elmore delay: first-order approximation of the delay after which a voltage step at the input s can be observed at the output i.

Elmore Delay (cont)
Elmore delay is quite useful for
  – wire delay estimation
  – first-order delay model of static and dynamic CMOS gates (RC model)
    (a transistor is not actually a resistor, but the model is excellent for qualitative understanding)

Load Dependence of Inverter
– Electrical effort (also called gain, fanout, or fan-out)
– The linear load-delay dependence holds fairly well, even in deep sub-micron technologies.

Sizing of Super Buffer
– First inverter minimum sized, N − 1 unknown sizings
– Find inverter dimensions for minimum propagation delay.
– C1 and CL given → N − 1 variables
– Path electrical effort: H = CL / C1

Sizing of Super Buffer 2
– Minimize delay (i.e. search for optimum fanout h_i): for optimum delay all fanouts need to be the same, i.e. h_1 = h_2 = h_3 = … = h_N

Sizing of Super Buffer 3
– The product of all fanouts is constant and given by the constraints, i.e.
  C1 and CL.
– This determines the minimum delay of an N-stage inverter chain (superbuffer).
– However, what is the optimum number of stages?

Sizing of Super Buffer 4
– Find the optimum number of stages (implicit equation for h_i)
[Figure: normalized delay versus number of stages for H = 50, H = 100, and H = 200]
* Sometimes an optimal fan-out of e is reported. This follows from a similar derivation if the parasitic delay of the gate is neglected.

Sizing of Combinatorial Logic
– The buffer chain is mainly an academic exercise.
– How can we size combinatorial logic for minimum delay?
– How many stages shall we use to realize a certain function?
– Logical Effort Methodology (a generalization of the preceding investigation)

Delay of Combinatorial Gate
$d = g \cdot h + p$, where $g \cdot h$ is the effort delay
– p: parasitic delay, depends on logic, not on sizing or load
– h: electrical effort, depends on sizing and load, not on the logic function
– g: logical effort, depends on logic, not on sizing
Two equivalent definitions of logical effort g:
  1. g is the ratio of the gate's input capacitance to the input capacitance of the reference inverter when the gate is sized to deliver the same current as the reference inverter.
  2. g describes how much worse the gate can deliver current to the load compared to an inverter when the gate is sized to provide the same input capacitance as the inverter.
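
With the gate delay model d = g · h + p and g = 1 for an inverter, the super-buffer trade-off from the preceding slides can be explored numerically. The sketch below assumes equal stage fanouts h = H^(1/N), delays normalized to the technology time constant, and an inverter parasitic delay of p = 1 by default; the function names and the default value are assumptions, not taken from the slides.

```python
def chain_delay(H, n_stages, p_inv=1.0):
    """Normalized delay of an N-stage inverter chain with equal fanouts.

    Each stage has fanout h = H**(1/N) and delay h + p_inv (g = 1 for an inverter).
    """
    h = H ** (1.0 / n_stages)
    return n_stages * (h + p_inv)


def best_num_stages(H, p_inv=1.0, max_stages=12):
    """Number of stages that minimizes the normalized chain delay."""
    return min(range(1, max_stages + 1), key=lambda n: chain_delay(H, n, p_inv))


def stage_input_caps(c1, cl, n_stages):
    """Input capacitance of each inverter when all fanouts are equal."""
    h = (cl / c1) ** (1.0 / n_stages)
    return [c1 * h ** i for i in range(n_stages)]


for H in (50, 100, 200):
    n = best_num_stages(H)
    print(f"H = {H}: best N = {n}, fanout = {H ** (1.0 / n):.2f}, "
          f"delay = {chain_delay(H, n):.2f}")

# Neglecting the parasitic delay pushes the optimum fanout towards e
n0 = best_num_stages(100, p_inv=0.0)
print(f"H = 100, p = 0: best N = {n0}, fanout = {100 ** (1.0 / n0):.2f}")
```

Because N must be an integer, the fanout found this way only approximates the continuous optimum. With the parasitic delay included, the optimum fanout is larger than e, which matches the footnote above.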

Calculation of Logical Effort
p = 2, g = 4/3

Calculation of Logical Effort
p = 7/3, gA = 2, gB = 2, gC = 5/3

Delay of High Fanin Gates
Parasitic delay and logical effort of a NAND gate (according to the basic estimation of the previous slides):

  inputs   parasitic delay   logical effort
  2        2                 4/3
  3        3                 5/3
  4        4                 2
  5        5                 7/3
  6        6                 8/3
  8        8                 10/3
  n        n                 (n+2)/3

– In reality the parasitic delay increases nearly quadratically due to intermediate capacitances: $p_N = \frac{1}{3}\left(N^2 + N\right)$
– Use Elmore delay or simulation for accurate parameter extraction.
– The linear delay model is still quite good.

Delay of Combinatorial Paths
– The branching in combinatorial blocks increases the electrical effort by the branching effort b.

Delay of Combinatorial Path 2
– Determine the optimum sizing in the same way as for buffers.
– Define the path effort; the minimum delay can then be estimated before the sizing process is started!

Unequal Rising and Falling Delay
– Equal rise and fall delay is often disadvantageous for average delay (path delay is relevant for applications) and area consumption.
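
The path procedure from the "Delay of Combinatorial Paths" slides can be sketched numerically. The slide equations did not survive, so the code assumes the standard logical-effort formulation: path effort F = G · B · H, optimum stage effort F^(1/N), and minimum delay N · F^(1/N) + P. The NAND values come from the table above; the function names, the example path, and the capacitance values are hypothetical.

```python
from math import prod


def nand(n):
    """(logical effort, parasitic delay) of an n-input NAND per the table above."""
    return ((n + 2) / 3, float(n))


INV = (1.0, 1.0)  # reference inverter: g = 1, p = 1


def min_path_delay(gates, branching, c_in, c_load):
    """Return (minimum normalized path delay, optimum stage effort).

    gates:     list of (g, p) tuples along the path
    branching: branching effort b at the output of each stage (1.0 = no branching)
    """
    G = prod(g for g, _ in gates)          # path logical effort
    B = prod(branching)                    # path branching effort
    H = c_load / c_in                      # path electrical effort
    n = len(gates)
    f_opt = (G * B * H) ** (1.0 / n)       # equal effort in every stage
    P = sum(p for _, p in gates)           # path parasitic delay
    return n * f_opt + P, f_opt


def size_path(gates, branching, c_load, f_opt):
    """Work backwards from the load: C_in = g * b * C_out / f_opt per stage."""
    caps, c_out = [], c_load
    for (g, _), b in zip(reversed(gates), reversed(branching)):
        c_out = g * b * c_out / f_opt
        caps.append(c_out)
    return caps[::-1]


# Hypothetical path: NAND2 -> inverter -> NAND2, no branching
path = [nand(2), INV, nand(2)]
branches = [1.0, 1.0, 1.0]
delay, f_opt = min_path_delay(path, branches, c_in=2.0, c_load=9.0)
caps = size_path(path, branches, c_load=9.0, f_opt=f_opt)
print(f"minimum delay = {delay:.2f}, stage effort = {f_opt:.2f}, input caps = {caps}")
```

The delay estimate depends only on G, B, H, and the parasitics, which is why it is available before any transistor is sized. The backward pass then recovers the input capacitances, and the first one reproduces the given c_in as a consistency check.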

Limitations of Logical Effort
Logical effort methodology …
  – is a valuable rule of thumb for sizing of high-performance paths
  – predicts the optimum path delay without knowledge of sizing
  – indicates how to distribute the gain along a critical path
Logical effort methodology does not …
  – take the slope dependence of gate delays into account (however, along the critical path slopes are very similar)
  – consider simultaneous switching
  – consider power, i.e. it gives no sizing rule for sub-critical paths
  – indicate how to size a path for small power and/or area
  – consider interconnect delay
Branching is difficult to estimate, especially for parallel critical paths.

Non-Linear Delay Model
– The linear delay model is very well suited for hand calculation and for an intuitive understanding of how to size gates.
– The linear delay model is not suited for high numerical accuracy.
– Non-linear delay model for computer calculation:
  – Define a set of relevant input slopes (transition times)
  – Define a set of relevant load capacitances
  – Perform a SPICE simulation for each (load, slope) tuple
  – Measure propagation delay and output slope (transition time) and store the results in a 2-dimensional lookup table
  – Usually stored in the so-called Liberty file
– Numerically accurate
– Not useful to understand trade-offs or derive design strategies

Example for Timing Description in LIB File

  pin(Z) {
    …
    timing() {
      related_pin : "X1";
      cell_fall(slp_load) {
        index_1 ("0.010, 0.050");  /* slope */
        index_2 ("0, 10, 50");     /* load */
        values ("50, 150, 550", \
                "60, 170, 610");
        …

Lookup tables like this are provided for every timing figure, i.e.
  – delay from any input to the output, in both switching directions
  – slope at the output in response to a switching event at any input
  – setup & hold times
  – …

Input Dependence of Gate Delay
– Simultaneous switching of inputs is the worst case.

Equalization of Gate Delay
– Layout is more complex, i.e. cell area is larger
  – only makes sense if the functionality requires equal propagation delay

Asymmetric Gates
– Speed requirements of one input relaxed
  – downsizing of slow branch
  – upsizing of low active series devices
pAd = 13/9, pAu = 13/9
pBd = 17/9, pBu = 26/9
gAd = 10/9, gAu = 10/9
gBd = 5/3, gBu = 10/3

Skewed Gates
– If one transition is much more critical than the other, the critical transition can be accelerated at the cost of the other one.
unskewed: p = 1, g = 1
skewed: pu = 5/6, pd = 5/3, gu = 5/6, gd = 5/3

Dynamic Logic (Precharge Logic)

Dynamic Logic
Advantages:
– Low input & parasitic caps
– No contention
– No static power consumption
– Extremely fast
– Wide NOR structures, e.g.
  for decoders
Disadvantages:
– Sensitive to noise and leakage
– High dynamic power consumption
– Clocking required
– Monotonicity requirement

High Fanin Dynamic Gates: Wide NOR Structures
– NOR operation is for free in a single-ended domino gate.
– Wide OR structures cause significant leakage currents degrading the charge on the dynamic node → keeper

Alternative for Wide NOR: Pseudo NMOS
– Cross current, acceptable e.g. if pull-down is the exception or for high-speed applications
– Reduced swing (tradeoff between pull-up speed and level reduction)
– Ratioed

High Fanin Dynamic Gates: Long NAND Structures
– No contention
– Low load
– Long NMOS pull-down chain possible, but charge sharing is critical

Leakage Currents in Deep Sub-Micron MOSFETs: Classic Leakage Currents
[Figure: MOSFET with gate, source, drain, and bulk terminals; leakage paths 1 and 2]
  1. Subthreshold current
  2. Junction leakage

Subthreshold Leakage

Leakage Currents in Deep Sub-Micron MOSFETs: Tunneling Currents
[Figure: MOSFET with gate, source, drain, and bulk terminals; leakage paths 1 and 2]
  1. Gate tunneling current
  2. Gate induced drain leakage

Noise and Leakage Sensitivity
– Leakage currents discharge the dynamic node → limited retention time → minimum operation frequency (very disadvantageous for production test or low-speed operation modes)
– Noise on power and signal wires weakly opens pull-down paths → erroneous discharge of the dynamic node
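
The chain "leakage → limited retention time → minimum operation frequency" can be estimated to first order from the charge balance t_ret ≈ C · ΔV / I_leak. This is a rough sketch under simplifying assumptions: constant leakage current, no keeper, and a tolerable voltage droop ΔV chosen by the designer. All names and numbers below are hypothetical.

```python
def retention_time(c_dyn, dv_max, i_leak):
    """Time until a constant leakage current drops the dynamic node by dv_max."""
    return c_dyn * dv_max / i_leak


def min_clock_frequency(t_ret, eval_duty=0.5):
    """Lowest clock frequency whose evaluation phase still fits into t_ret.

    The evaluation phase lasts eval_duty / f, so f must be at least eval_duty / t_ret.
    """
    return eval_duty / t_ret


# Hypothetical values (assumptions): 10 fF dynamic node, 100 mV tolerable droop,
# 100 nA total leakage through the pull-down network
t_ret = retention_time(c_dyn=10e-15, dv_max=0.1, i_leak=100e-9)
f_min = min_clock_frequency(t_ret)
print(f"retention time = {t_ret * 1e9:.1f} ns, minimum clock = {f_min / 1e6:.0f} MHz")
```

Even with these modest assumed numbers the gate cannot be clocked arbitrarily slowly, which illustrates why low-speed test and operation modes are a problem for dynamic logic without a keeper.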

Reduction of Noise & Leakage Sensitivity
– A weak keeper device compensates for leakage and noise-induced discharge currents.
– Size the keeper for approximately 10% of the discharge current → 5–10% speed degradation
– No inversion

Designing Weak Keepers
– Weak keeper → small W/L ratio: small W, large L?
Device options:
  – good device properties but large current
  – small current but strongly sensitive to variations
  – small current but modeling of length dependence is difficult
  – good keeper device

Design of Weak Keepers 2
  – good keeper device but large output loading
  – good keeper with reduced output loading

Adaptive Keepers
– With increasing process variations keeper design becomes difficult.
– Slow NMOS & fast PMOS:
  – keeper too strong
  – significant speed degradation
– Fast NMOS & slow PMOS:
  – high leakage in pull-down path but small compensation current → keeper too weak
  – erroneous discharge
Steven Hsu, Intel, ISSCC 2006

Delayed Keeper
– Increasing leakage calls for stronger keepers → delay penalty, advantage of dynamic circuits vanishes
– Concept:
  – Use a small keeper which cannot compensate leakage completely
  – Enable a strong keeper after evaluation/discharge is completed
– Challenge: size the permanent and the delayed keeper such that leakage currents do not compromise the logical decision

Charge Sharing in Dynamic Gates
– Charge sharing between the cap of the pre-charged node and intrinsic caps
– Possibly undefined levels and disturbance of subsequent stages
– Might be recovered by the keeper
– Can be easily overlooked in
  simulation → think about the worst-case situation
– Remedy: precharge internal nodes with weak transistors

Multi-Output Dynamic Logic
– Domino gates can produce multiple logic functions (with common subterms) simultaneously.

Compound Domino Logic
– Coupling inverters can be substituted by any static gate to reduce the number of logic stages.

Conditional Keeper
– Strongly low-skewed CMOS gates with precharge
– Reduced contention
– No latching
– Also known as skewed CMOS

Clocking of Single-Rail Domino Circuits
– Sequential activation of logic stages
– High noise sensitivity
– High speed, optimized clock skews → variation sensitive
– Circuit becomes somewhat "analog"

Clocking of Single-Rail Domino Circuits 2

Clocking of Single-Rail Domino Circuits 3
– Self-timed evaluation (domino principle)
– Simultaneous pre-charge / evaluation
– Bypassing of stages possible

Clocking of Single-Rail Domino Circuits 4

Clocking of Footer-Less Domino Circuits
– No footer → speed & power improvement
– Self-timed
– Sequential pre-charge to avoid cross currents

Clocking of Footer-Less Domino Circuits 2

NORAce Domino Logic
– Alternating NMOS / PMOS domino gates
– Pre-charged state disables all evaluation paths
– Self-timed
– Noise sensitive
– No direct bypassing

NORAce Domino Logic 2

Cross Coupled Domino – A Dynamic Dual Rail Family –
– No contention
– Improved robustness
– Implicit inversion
– Higher clock load
– Complex wiring
– No wide NOR structures

Is it a good idea to use dynamic logic?
– Well, it's fancy
– Performance advantage vanishes in DSM technologies
– Many design pitfalls, e.g. charge sharing, leakage, noise
– Very susceptible to parasitics, PVT, etc.
– Weak EDA support, e.g. for timing verification; poor verification
Conclusion
  – Dynamic logic is a risk – Say No-No!
  – Dynamic logic is often the reason for redesigns
  – Avoid whenever possible
  – If you think it's required, first look for architectural loopholes, e.g. logic optimization, pipelining, parallelization …
  – If you find no other way, do it, but very carefully!