INVENTIVE CONFIDENTIAL

Digital Circuit Synthesis Methodology
Design with RTL Compiler for smaller, faster, cooler chips
Zinger Liu
Tuesday, June 19, 2007
Cadence Confidential: Cadence Internal Use Only

Customer Support
• SourceLink Online Customer Support (sourcelink.cadence.com)
  – Search the solution database and the entire site.
  – Access all documentation.
  – Find answers 24x7.
  – If you have a Cadence® software support service agreement, you can get help from SourceLink® online customer support.
  – The web site gives you access to application notes, frequently asked questions (FAQ), installation information, known problems and solutions (KPNS), product manuals, product notes, software rollup information, and solutions information.
• Customer Support Online Form
  – If you don't find a solution on the SourceLink site, submit a service request online.
  – From the SourceLink web site, fill out the Service Request Creation form.
• Service Request
  – If your problem requires more than customer support, it is escalated to R&D for a solution.

Cadence Users Group
• http://www.cdnusers.org/
• Select Digital IC to view solutions and recommendations from other Cadence users on tools and methodologies.

Agenda
• Logic Synthesis Introduction
• Basics of Static Timing Analysis
• Low Power Technology in Synthesis
• Design For Test (DFT)
• Verification (Formal equivalence checking using Conformal)

Logic Synthesis Introduction

Top-Down / Bottom-Up Design Flow
• Algorithm, System and High-Level Synthesis
• Logic Synthesis
• Transistor-Level Synthesis
• Physical Design
Source: http://ens.ewi.tudelft.nl/Education/courses/et4255/slides/01_introduction.pdf

Synthesis Flow
• High-Level Synthesis
• Logic Synthesis
• Physical Design
• Fabrication and Packaging
Figures adopted with permission from Prof. Ciesielski, UMASS

High-Level & Logic Synthesis: a multi-stage process
• Specification
• Logic Extraction
• Technology-Independent Optimization
• Technology-Dependent Mapping
[Figure: the example module below shown as extracted logic, as an optimized network, and as mapped gates]

    module example(clk, a, b, c, d, e, f, g, h);
      input clk, a, b, c, d, e, f;
      output g, h;
      reg g, h;
      always @(posedge clk) begin
        g = a | b;
        if (d) begin
          if (c) h = a & ~h;
          else h = b;
          if (f) g = c;
          else g = a ^ b;   // assignment target lost in the slide text; g assumed
        end else
          if (c) h = 1;
          else h = h ^ b;   // assignment target lost in the slide text; h assumed
      end
    endmodule

Slides adopted with permission from Prof. Ciesielski, UMASS

Physical Design (Synthesis)
• Circuit Design
• Partitioning
• Floorplanning & Placement
• Routing
• Compaction
• Fabrication
Slides adopted with permission from Prof.
Pan, UT-Austin

Design Styles
• Full-Custom
  – Can control the shape of all mask patterns
  – Can specify design as low as transistor level
• Semi-Custom
  – Standard-Cell
  – Gate Arrays
  – FPGA

Design Styles – Full Custom
[Figure: mask layout of the Intel 486 microprocessor chip]
Source: http://lsiwww.epfl.ch/LSI2001/teaching/webcourse/ch01/ch01.html

Design Styles – Standard Cell
[Figure: standard-cell layout]
Source: http://lsiwww.epfl.ch/LSI2001/teaching/webcours

Design Styles – Gate Arrays
[Figure: gate-array layout]
Source: http://lsiwww.epfl.ch/LSI2001/teaching/webcourse/ch01/ch01.html

Design Styles – FPGAs
[Figure: FPGA architecture]
Source: http://lsiwww.epfl.ch/LSI2001/teaching/webcourse/ch01/ch01.html

Design Styles Trade-offs
[Figure: comparison of the design styles]
Source: http://lsiwww.epfl.ch/LSI2001/teaching/webcourse/ch01/ch01.html

Today's Design Trends
[Figures: transistors per chip (K transistors) by year, 1960 to 2010, for process nodes from 3um down to 90nm; fabrication cost ($Million); time-to-market, profit (%) versus months late]
Sources: Prof. Rabaey, UC Berkeley; MIPS Technologies; www.icknowledge.com

Design of Integrated Systems
[Figure: abstraction levels linked by design (downward) and verification (upward) steps]
• System Level
• Register Transfer Level
• Gate Level
• Transistor Level
• Layout Level
• Mask Level

System Level
• Abstract algorithmic description of high-level behavior
  – e.g. C programming language

    Port* compute_optimal_route_for_packet(Packet_t *packet, Channel_t *channel)
    {
        static Queue_t *packet_queue;
        packet_queue = add_packet(packet_queue, packet);
        ...
    }

  – abstract because it does not contain any implementation details for timing or data
  – efficient to get a compact execution model as first design draft
  – difficult to maintain throughout project because no link to implementation

RTL Level
• Cycle accurate model "close" to the hardware implementation
  – bit-vector data types and operations as abstraction from bit-level implementation
  – sequential constructs (e.g. if - then - else, while loops) to support modeling of complex control flow

    module mark1;
      reg [31:0] m[0:8192];
      reg [12:0] pc;
      reg [31:0] acc;
      reg [15:0] ir;
      always begin
        ir = m[pc];
        if (ir[15:13] == 3'b000)
          pc = m[ir[12:0]];
        else if (ir[15:13] == 3'b010)
          acc = -m[ir[12:0]];
        ...
      end
    endmodule

Gate Level
• Model on finite-state machine level
  – models function in Boolean logic using registers and gates
  – various delay models for gates and wires
  – in this lecture we will mostly deal with gate level
[Figure: gate network annotated with delays of 1ns, 4ns, 3ns and 5ns]

Transistor Level
• Model on CMOS transistor level
  – depending on application, function modeled as resistive switches
    • used in functional equivalence checking
  – or full differential equations for circuit simulation
    • used in detailed timing analysis

Layout Level
• Transistors and wires are laid out as polygons in different technology layers such as diffusion, poly-silicon, metal, etc.

Design of Integrated Systems
• Design phases overlap to large degrees
• Parallel changes on multiple levels, multiple teams
• Tight scheduling constraints for product
[Figure: relative effort versus project time for the System, RTL, Logic and Transistor phases]

Design Challenges
• Systems are becoming huge, design schedules are getting tighter
  – > 100 million gates becoming common for ASICs
  – > 0.4 million lines of C code to describe system behavior
  – > 5 million lines of RTL code
• Design teams are getting very large for big projects
  – several hundred people
  – differences in skills
  – concurrent work on multiple levels
  – management of design complexity and communication very difficult
• Design tools are becoming more complex but still inadequate
  – typical designer has to run ~50 tools on each component
  – tools have lots of bugs, interfaces do not line up, etc.

Design Challenges
• Decision about design point very difficult
  – compromise between performance / costs / time-to-market
  – decision has to be made 2-3 years before design finished
  – design points are difficult to predict without actually doing the design
  – scheduling of product cycles
• Functional verification
  – simulation still main vehicle for functional verification but inadequate because of size of design space
  – results in bugs in released hardware that are very expensive to recover from (different in software ;-)

Design Challenges
• Fundamental tradeoffs between different modeling levels:
  – modeling detail and team size to maintain model
    • high-level models can be maintained by one or two people
    • detailed models need to be partitioned, which results in a significant communication overhead
  – modeling accuracy versus modeling compactness
    • compact models omit details and give only crude
      estimations for implementation
    • detailed models are lengthy and difficult to adapt for major changes in design points
  – simulation speed versus hardware performance
    • high-level models can be simulated fast but cannot be implemented efficiently with automatic means
    • low-level models can be made to have a fast implementation but cannot be simulated very fast

General Design Approach
• How do engineers build a bridge?
• Divide and conquer!
  – partition design problem into many sub-problems which are manageable
  – define mathematical model for sub-problem and find an algorithmic solution
    • beware of model limitations and check them!
  – implement algorithm in individual design tools, define and implement general interfaces between the tools
  – implement checking tools for boundary conditions
  – concatenate design tools to general design flows which can be managed
  – see what doesn't work and start over

Design Automation
• Design Automation is one of the most advanced areas in practical computer science
  – many problems require sophisticated mathematical modeling
  – many algorithms are computationally hard and require advanced and fine-tuned heuristics to work on realistic problem sizes
  – boundary conditions need to be well declared and synchronized between different tools (patchwork to cover all holes)
• Two common pitfalls in CAD research
  – problem is looking for a solution:
    • problem scope is too big, makes modeling difficult or algorithms don't scale
    • problem scope is too small, solutions are not good enough
  – solution is looking for a problem:
    • model was oversimplified because real problem was too complex with too many boundary conditions

Key to Success
• Fine-tuned combination of Design Methodology and Tools
  – addresses algorithmic complexity by requiring
    • manual partitioning of the problem
    • manual input of hints/suggestions
    • manual iterations to drive tool application to best solution
  – makes CAD systems and design flows very complex and difficult to manage
[Figure: the problem space and the region where tools are applicable; the practical combination is reached through design methodology]

Examples of Divide and Conquer
• RTL cycle simulation evaluates only the next-state logic of the circuit; timing is assumed to be correct
  – combination of static timing analysis, formal equivalence checking, and cycle simulation allows separation of issues
  – cycle simulation avoids expensive event scheduling and processing and performs significantly faster
• However:
  – timing analysis is conservative with respect to the achievable clock cycle time

Examples of Divide and Conquer
• Static timing analysis assumes simple gate delay models
  – complexity of static timing analysis becomes linear (simple longest and shortest paths analysis in circuit implementation)
  – very efficient implementation of incremental static timing analysis, which is needed in the inner loop of the technology dependent part of logic synthesis
• However:
  – actual gate delay varies a lot in reality
    • models often assume average fan-out rather than actual gate load
  – delay model assumes ideal signals
    • slew dependency ignored

Examples of Divide and Conquer
• Logic synthesis assumes ideal gates which are independent of physical environment
  – standard cell place and route technology has made logic synthesis possible
    • gates are heavily over-designed to be functional in a wide variety of combinations (e.g.
      range of fan-out gates possible, different wire loads)
    • layout placement and route done in standard rows that minimize latch-up effects and optimize power and clock wiring
• However:
  – layout implementation remains sub-optimal because cells are designed for worst case application and with large safety margins with respect to environment

Examples of Divide and Conquer
• Logic synthesis uses crude model to estimate circuit area
  – literal count or simple table-lookup for gate sizes allows fast comparison of different implementation choices
• However:
  – actual gate size can vary to a very large degree depending on load and timing requirement
  – area for wiring completely ignored

Examples of Divide and Conquer
• Formal equivalence checking assumes identical state encoding of the two designs to be compared
  – reduces the general equivalence checking problem to combinational equivalence checking, which is computationally less complex
  – exploitation of structural similarities between designs to be compared makes tools applicable for huge (multi-million gate) designs
  – automatic algorithms for identifying register correspondence compensate to some extent for limited model
• However:
  – combinational verification model cannot handle sequential verification problems

Full Custom Design Flow
• Application: ultra-high performance designs
  – general-purpose processors, DSPs, graphic chips, internet routers, games processors etc.
• Target: very large markets with high profit margins
  – e.g.
    PC business
• Complexity: very complex and labor intensive
  – involving large teams
  – high up-front investments and relatively high risks
• Role of Logic Synthesis:
  – limited to components that are not performance critical or that might change late in design cycle (due to design bugs found late)
    • control logic
    • non-critical data path logic
  – bulk of data-path components and fast control logic are manually crafted for optimal performance

Full Custom Design Flow
• Incomplete picture:
[Figure: ISA Specification, RTL Spec, Gate Level Netlist, Transistor Level Circuit and Layout, produced by logic synthesis and by manual or semi-automatic design; checked by simulation, formal equivalence checking, circuit simulation, extract & compare, and design rule checker]

ASIC Design Flow
• Application: general IC market
  – peripheral chips in PCs, toys, handheld devices etc.
• Target: small to medium markets, tight design schedules
  – e.g. consumer electronics
• Complexity of design: standard design style, quite predictable
  – standard flows, standard off-the-shelf tools
• Role of Logic Synthesis:
  – used on large fraction of design except for special blocks such as RAMs, ROMs, analog components

ASIC Design Flow
• Incomplete picture:
[Figure: Informal Specification, RTL Spec, Gate Level Netlist, Modified Gate Level Netlist and ASIC Foundry, linked by logic synthesis, manual changes to fix timing, and test logic insertion; checked by simulation, formal equivalence checking, and static timing analysis]

What is Logic Synthesis?
Given: Finite-State Machine F(X, Y, Z, δ, λ) where:
  X: Input alphabet
  Y: Output alphabet
  Z: Set of internal states
  δ: X × Z → Z (next state function)
  λ: X × Z → Y (output function)
Target: Circuit C(G, W) where:
  G: set of circuit components g ∈ {Boolean gates, flip-flops, etc.}
  W: set of wires connecting G

Typical Synthesis Scenario
• RTL to Network Transformation
  – Read HDL
  – Control/data flow analysis
• Technology Independent Optimizations
  – Basic logic restructuring
  – Crude measures for goals
• Technology Mapping
  – Use logic gates from target cell library
• Technology Dependent Optimizations
  – Timing optimization
  – Physically driven optimizations
• Test Preparation
  – Improve testability
  – Test logic insertion

Objective Function for Synthesis
• Minimize area
  – in terms of literal count, cell count, register count, etc.
• Minimize power
  – in terms of switching activity in individual gates, deactivated circuit blocks, etc.
• Maximize performance
  – in terms of maximal clock frequency of synchronous systems, throughput for asynchronous systems
• Any combination of the above
  – combined with different weights
  – formulated as a constraint problem
    • "minimize area for a clock speed > 300MHz"
• More global objectives
  – feedback from layout
    • actual physical sizes, delays, placement and routing

Constraints on Synthesis
• Given implementation style:
  – two-level implementation (PLA, CAMs)
  – multi-level logic
  – FPGAs
• Given performance requirements
  – minimal clock speed requirement
  – minimal latency, throughput
• Given cell library
  – set of cells in standard cell library
  – fan-out constraints (maximum number of gates connected to another gate)
  – cell generators

Instability of Logic Synthesis
• Experiment: write out the netlist in the middle of a synthesis run and read it back in without change
[Figure: scatter plot of the resulting change in area (%) and performance, 15 testcases and 20 libraries]

Brief History of Logic Synthesis
• 1960s: first work on automatic test pattern generation used for Boolean reasoning
  – D-Algorithm
• 1978: Formal equivalence checking introduced at IBM in production for designing mainframe computers
  – SAS tool based on the DBA algorithm
• 1979: IBM introduced logic synthesis for gate array based mainframe design
  – LSS, next generation is BooleDozer
• End 1986: Synopsys founded
  – first product "remapper" between standard cell libraries
  – later extended to full blown RTL synthesis
• 1990s: other synthesis companies enter the market
  – Ambit, Compass, Synplicity, Magma, Monterey, ...
• 2000s: Global Synthesis, Get2chip (part of Cadence now)

Why Learn about Logic Synthesis?
• Logic synthesis is the core of today's CAD flows for IC and system design
  – course covers many algorithms that are used in a broad range of CAD tools
  – basis for other optimization techniques, e.g. embedded software
  – basis for functional verification techniques
• Most algorithms are computationally hard
  – covered algorithms and flows are good examples for approaching hard algorithmic problems
  – course covers theory as well as implementation details
  – demonstrates an engineering approach based on theoretically solid but also practical solutions
    • very few research areas can offer this combination

Basics of Static Timing Analysis

The Timing Path
• All synthesized designs are broken down into timing paths
• Each timing path has a startpoint and an endpoint.
• There are only two types of startpoints in a design:
  – the input port of a design or the clock pin of a sequential cell
• There are only two types of endpoints:
  – the data input pin of a sequential cell or the output port of a design
[Figure: Input to Clk (I2C) paths, Clk to Clk (C2C) paths, and Clk to Output (C2O) path]

How Many Timing Paths are There?
[Figure: example circuit]
• Twenty-one

How Do You Time Timing Paths?
• Simple, just add up the timing arcs in the path
• Two types of timing arcs:
  – cell delay timing arcs
  – net delay timing arcs
• Cell delay timing arcs are defined by the type of cell delay model used in the technology library
• Net delay timing arcs are defined by the operating conditions, which specify the type of resistance-capacitance (RC) tree model to use
[Figure: example path annotated with its cell and net arc delays]
time = 2 + 1 + 1 + 3 + 0 + 4 + 1 + 4 + 0 = 16 time units

Understanding Cell Delay
• Delay through a cell is often determined by the cell's intrinsic delay (internal delay), load that it is driving, and input transition (slew)
• Transition (slew) is the time it takes for the pin to change state
[Figure: input (A) and output (Y) voltage waveforms between Vmin and Vmax for inverting and non-inverting propagation delay; cell delay is marked at the 50% points and slew time between the 10% and 90% points]

Cell Delay Timing Models
• Most standard cell libraries use a Table Lookup (TLU) timing model
• Table lookup models have the accuracy of SPICE simulations, require only a few input variables, and are very fast to calculate
• TLUs work with 2 inputs: the input transition rate of the signal at the input pin and the total capacitance (pin + net) that the cell has to drive
• Each cell has two TLU models: one generates the cell delay and the other generates the cell's output transition rate (which is used as the next cell's input transition rate)
[Figure: two lookup tables indexed by input transition and Ctotal, one for cell delay and one for output transition]

Understanding Input Slew
• Measure of transition rate
  – Time to go from one threshold point to the other
  – Thresholds are usually defined as a certain percentage of the voltage swing: 10-90%, or 20-80%
• The waveform in Fig.
  1 can be described as having a rise time of xxx measured from 10-90% of Voltage, or yyy as measured from 20-80% of Voltage
[Fig. 1: rising and falling signals between VL and VH, with slew measured between thresholds VTh1 and VTh2]

Understanding Net Delay
• Interconnect causes the timing arc to be from pin to pin
• These net delays are computed using wireload model estimation, or are computed using back-annotated delay information if available
[Figure: net delay from driver pin Y to load pin A, measured between the 50% points, for rising (^ -> ^) and falling (v -> v) transitions]

Net Delay Timing Models
• Net delay models use resistance*capacitance (RC) models defined by the operating conditions (chosen by the user)
• Three most common models: best case, balanced tree and worst case
• Best Case: R*C is calculated at the output of the cell, so there is no net resistance value, thus Best case net delay = 0 (Rnet*Ctotal = 0)
• Balanced Tree: RC delay is evenly divided by fanout, thus Balanced Tree net delay = (Rnet*Ctotal)/netFanout
• Worst case: each branch of a net sees the full R & C values, thus Worst case net delay = Rnet * Ctotal
[Figure: net modeled as Rnet driving Ctotal]

Ctotal and Rnet
• Ctotal comprises the net load and the total pin load.
• The net load comes from the wire load model
• The pin load is the sum of the capacitance of every pin of every cell that the net is connected to (its data comes from the .lib technology library file)
• Rnet comes from the wire load model
Ctotal = Cnet + Sum of (Cpins)
[Figure: net with Rnet, Cnet and Cpin]

Wire Load Models (WLMs)
• Wire load models provide synthesis tools with an early estimation of a net's capacitance and resistance.
• Wire load models are a statistical average of a net's R & C value based on length and fanout.
• Because different sized designs can have the same fanout, but dramatically different lengths for a given fanout, there are usually lots of wire load models for any given technology
• Each model represents a different size of integration (i.e. for each design size, there is a different wire load model)
• WLMs are usually statistically "weighted" such that the value reported by the WLM for capacitance and resistance is pessimistic
[Figure: distribution of number of nets vs. length of net for a fixed size design and fanout = 1; the length given by the WLM is marked at 90%]

Discussion
• Are wire load models GOOD or BAD? Why?

Add in Synchronous Effects
• Timing paths are synchronous, therefore data always arrives at the startpoint relative to a clock and is captured at the endpoint relative to a clock
• Timing paths start with either a synchronous input delay or a clk-to-Q timing arc.
• Timing paths end with either a setup requirement relative to a clock or a synchronous output delay (which acts just like a setup requirement but the sequential cell is out of the picture)
[Figure: design being constrained between two external designs, showing input delay, clk-to-Q, setup and output delay on the i2c, c2c and c2o paths]

Quiz
• What signal triggers the input delay at Port A?
• Which one signal defines the overall amount of time data has to go from RegA to RegB?
• Why is the output delay considered an external setup requirement?
[Figure: design being constrained with RegA and RegB between ports A and B, external registers Ext_RegC and Ext_RegD, a common clk, input delay and output delay]

Understanding Setup & Hold Times
• Setup and hold checks are the most common types of timing checks used in timing verification
  – Synchronous inputs (e.g.
    D) have Setup, Hold time specification with respect to the CLOCK input
  – These checks specify that the data input must remain stable for a specified interval before and after the clock input changes
• Setup Time: the amount of time the synchronous input (D) must be stable before the active edge of clock
• Hold Time: the amount of time the synchronous input (D) must be stable after the active edge of clock.
[Figure: setup time before and hold time after the active clock edge]

Understanding Setup Times
• Setup Time: the amount of time (specified in the library) by which the synchronous input (D) must arrive and be stable before the capturing edge of the clock, so that the data can be stored successfully in the storage device
• Setup violations can be remedied by either slowing down the clock (increasing the period) or by decreasing the delay of the data path
[Figure: source FF1 (Q1) driving target FF2 (D2) through slow logic over clock cycles 1 and 2; the maximum delay of the slow logic makes the new data arrive late at D2, inside the setup window: setup violation]

Setup Time Violations – Slow Data
[Figure: Clock, Q1 and D2 waveforms with T-setup and T-hold windows for FF1 (source), slow logic and FF2 (target); the new data is lost]
• What happens when Q1 data is slow to arrive?
• Setup Time is Violated
• New Data is Lost

Setup Time Violations – Fast Clock
[Figure: Clock, Q1 and D2 waveforms with T-setup and T-hold windows for FF1 (source) and FF2 (target) under a fast clock; the new data is lost]
• What happens when we speed up the clock?
• Setup Time is Violated
• New Data is Lost

Understanding Hold Times
• Hold Time: the amount of time (specified in the library) for which the synchronous input (D) must stay stable after the capturing edge of the clock, so that the data can be stored successfully in the storage device.
• Hold violations can be remedied by increasing the delay of the data path or by decreasing the clock uncertainty if specified in the design.
[Figure: source FF1 (Q1) driving target FF2 (D2) through fast logic; the new data arrives early, inside the hold window: hold violation]

Hold Time Violations – Fast Data Change
[Figure: Clock, Q1 and D2 waveforms with T-setup and T-hold windows for FF1 (source), fast logic and FF2 (target); the input changes and the new data is lost]
• What happens when Q1 data starts changing immediately?
• Hold Time is Violated
• New Data is Lost even though it is caught within the clock edge

Hold Time Violations – Clock Skewing
[Figure: Clock_1, Clock_2 (skewed), Q1 and D2 waveforms with T-setup and T-hold windows for FF1 (source, Clock_1) and FF2 (target, Clock_2); the new data is lost]
• What happens when Clock_2 comes in skewed?
• Hold Time is Violated
• New Data is Lost

Review Timing Path Types
[Figure: block with inputs IN_1 and IN_2, outputs OUT_1 and OUT_2, and registers FF1 and FF2 clocked by IdlClk, showing Input to Reg, Reg to Reg, Reg to Output and Input to Output paths]

Understanding Different Timing Paths
• Register to Register
  – Requires user to define the clocks
• Input port to Register
  – Requires user to set the data arrival time at the port
• Register to Output port
  – Requires user to set the port external delay
• Input port to Output port
  – Requires user to properly budget timing using synchronous input & output delays
Note: Examples and equations in this section assume same clock for start/end points with no insertion delay, etc.

Understanding Timing Paths: Reg to Reg
• To meet setup checks for single-cycle:
  – clk --> q + comb delay =< clock period - reg setup
• To meet hold checks for single-cycle:
  – clk --> q + comb delay >= hold check(0) + reg hold
[Figure: block being constrained, with clock-to-Q, combo logic (gate + wire delay) and setup time between two registers within one clock period]

Understanding Timing Paths: Input to Reg
• Input Delay = clk --> q + comb1
• Setup Requirement: Input_delay + comb2 =< clock period - reg setup
• Hold Requirement: Input_delay + comb2 >= hold check(0) + reg hold
[Figure: outside world register (clock-to-Q, Combo 1) forming the input delay, followed by Combo 2 and setup time inside the block being constrained, within one clock period]

Understanding Timing Paths: Reg to Output
• External_delay = comb delay + setup (for the setup check)
• External_delay = comb delay - hold time (for the hold check)
• Setup Requirement: clk --> q + comb1 =< clock period - external_delay
• Hold Requirement: clk --> q + comb1 >= hold check(0) - external_delay
[Figure: register in the block being constrained (clock-to-Q, Comb1) driving outside world logic and an external register (gate + wire delay plus setup time form the external delay), within one clock period]
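
The setup and hold inequalities for the three path types above can be tried out numerically. The following is a minimal sketch in Python, assuming the single-clock, single-cycle conditions stated in the note (hold check edge at time 0, no insertion delay). The names `Margins`, `check`, `reg_to_reg`, `input_to_reg` and `reg_to_output` are hypothetical helpers introduced only for illustration; they are not RTL Compiler commands.

```python
from dataclasses import dataclass


@dataclass
class Margins:
    """Margin of each check; a negative value means the check is violated."""
    setup: float
    hold: float


def check(arrival, setup_limit, hold_limit):
    """Compare a data arrival time with the latest and earliest allowed times."""
    return Margins(setup=setup_limit - arrival, hold=arrival - hold_limit)


def reg_to_reg(clk_to_q, comb, period, reg_setup, reg_hold):
    # setup: clk->q + comb <= clock period - reg setup
    # hold:  clk->q + comb >= hold check(0) + reg hold
    return check(clk_to_q + comb, period - reg_setup, 0.0 + reg_hold)


def input_to_reg(input_delay, comb2, period, reg_setup, reg_hold):
    # setup: input_delay + comb2 <= clock period - reg setup
    # hold:  input_delay + comb2 >= hold check(0) + reg hold
    return check(input_delay + comb2, period - reg_setup, 0.0 + reg_hold)


def reg_to_output(clk_to_q, comb1, period, ext_delay_setup, ext_delay_hold):
    # setup: clk->q + comb1 <= clock period - external_delay (setup value)
    # hold:  clk->q + comb1 >= hold check(0) - external_delay (hold value)
    return check(clk_to_q + comb1, period - ext_delay_setup, 0.0 - ext_delay_hold)


if __name__ == "__main__":
    # Illustrative numbers only (arbitrary time units).
    print(reg_to_reg(clk_to_q=0.5, comb=7.0, period=10.0, reg_setup=0.25, reg_hold=0.125))
    print(reg_to_reg(clk_to_q=0.5, comb=9.5, period=10.0, reg_setup=0.25, reg_hold=0.125))
```

In the first call both margins are positive, so the path meets setup and hold. In the second call the longer combinational delay pushes the arrival past the setup limit and the setup margin goes negative, which is the slow-data violation described earlier; lengthening the period or shortening the data path restores it.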

Understanding Timing Paths: Input Port to Output Port
• Input delay and output delay are set with respect to a clock
  – Default single-cycle
  – Setup requirement:
    • Comb delay < clock period - input delay - external delay
  – Hold requirement:
    • Comb delay > hold check(0) - input delay - external delay
• Combinational paths have no clocks defined for the module
  – Setup requirement:
    • Comb delay =< (the delay set with set_path_delay_constraint -late) - input_delay - external_delay
  – Hold requirement:
    • Comb delay >= (the delay set with set_path_delay_constraint -early) - input_delay - external_delay

Understanding Slack
• Slack is generally defined as the difference between the required time and the arrival time at an endpoint.
• RC optimizes for setup slack (hold time optimization is not done)
• The equation is different for the different types of paths, with time period = 2nd active clock edge – 1st active clock edge:
  – C2C: slack = time period – combo_logic_delay – clk_to_q – setup
  – I2C: slack = time period – combo_logic_delay – input_delay – setup
  – C2O: slack = time period – clk_to_q – combo_logic_delay – output_delay

Reading a Timing Report
• The timing report is broken down into 3 sections: section 1 lists the arrival time of a path, section 2 lists the capture time (required time) and section 3 lists the summary (slack)
• The arrival time includes any input delay, the startpoint clock's active edge, the combo logic delay and the setup time or output_delay for the endpoint
• The capture time section lists the arrival time of the capture clock (the endpoint clock), any uncertainty, any latency, and any additional margin that has been added to or subtracted from the path
• The slack
section lists the startpoint and endpoints of the path as well as the slack for the path. 2007年6月19日星期二 Cadence Confidential: Cadence Internal Use Only 37 Timing Report Example 75 2007年6月19日星期二 Cadence Confidential: Cadence Internal Use Only I NV E NT IV E CONFIDENTIAL Basic SDC 38 What is SDC? • Synopsys Design Constraints • Describes “Design Intent” – Sorta like a standard for assertions on timing • Used for over 10+ years since the popularity of DC • Every tool dealing with timing and timing optimizations reads or generates SDC files – As ubiquitous as a verilog netlist format – Can also be generated by different intermediate tools • Primetime, RC, DC, FE, Magma BlastChip – Conforms to tcl syntax - 2007年6月19日星期二 77 Cadence Confidential: Cadence Internal Use Only Synopsys Design Constraints • Operating conditions – • • set_drive set_driving_cell set_fanout_load set_input_transition set_load set_port_fanout_number • set_max_capacitance set_max_fanout set_max_transition 2007年6月19日星期二 • create_clock create_generated_clock set_clock_latency set_clock_transition set_clock_uncertainty set_disable_timing set_input_delay set_max_time_borrow set_output_delay set_propagated_clock Timing exceptions – – – Design rule constraints – – – 78 set_wire_load_mode set_wire_load_model set_wire_load_selection_group Environmental constraints – – – – – – Timing constraints – – – – – – – – – – Wire load models – – – • • set_operating_conditions set_false_path set_max_delay set_multicycle_path Power constraints – – set_max_dynamic_power set_max_leakage_power Cadence Confidential: Cadence Internal Use Only 39 8 Basic Design Constraints • There are eight basic (required) design constraints. 
• They are (in SDC format):
  – create_clock
  – set_clock_uncertainty
  – set_input_delay
  – set_output_delay
  – set_load
  – set_driving_cell
  – set_operating_conditions
  – set_wire_load_model

Design Objects
• Accessing different parts of the design

  Object   | Access command | Description
  Design   | current_design | A container for cells; a block
  Clock    | get_clocks     | A clock in a design
           | all_clocks     | All clocks in a design
  Port     | get_ports      | An entry point to or exit point from a design
           | all_inputs     | All entry points to a design
           | all_outputs    | All exit points from a design
  Cell     | get_cells      | An instance of a design or library cell
  Pin      | get_pins       | An instance of a design port or library cell pin
  Net      | get_nets       | A connection between cell pins and design ports
  Lib_cell | get_lib_cells  | A primitive logic element

Understanding Clock Period
• Periodic waveform
• Clock period (a.k.a. cycle time)
• Edges
[Figure: clock waveform with edges at 0, 2 and 4. One leading edge to the next is the clock period (cycle time); the trailing edge divides it into pulse-width high and pulse-width low.]

Understanding Duty Cycle
• Ratio of pulse-width high to the clock period
[Figure: two waveforms with different pulse-width high and pulse-width low, labeled 50% and 33.3% duty cycle.]

Understanding Rising / Falling Edge Triggered Sequential Cells
• When a clock waveform is associated with a clock port
[Figure: two waveforms over 0, 2, 4. In one, the rising edge is the leading edge and the falling edge is the trailing edge; in the other, the falling edge leads and the rising edge trails.]

create_clock
• Creates a clock
  create_clock -period period_value [-name clock_name] [-waveform edge_list] [-add] [source_objects]

  create_clock -name "PHI1" -period 10 -waveform {0.0 5.0}
  [Figure: PHI1 rises at 0, 10, 20 and falls at 5, 15, 25]

  create_clock -name "PHI1" -period 10 -waveform {0.0 9.0}
  [Figure: PHI1 rises at 0, 10 and falls at 9, 19]

  create_clock -name "clk2" -period 10 -waveform {2.0 4.0} {clkgen1/Z clkgen2/Z clkgen3/Z}
  [Figure: clk2 applied to pins clkgen1/Z, clkgen2/Z and clkgen3/Z]

Understanding Clock Insertion Delay
• Source insertion delay = set_clock_latency -source
  – Delay from the clock source to the beginning of the clock tree
• Network insertion delay (clock tree delay) = set_clock_latency
[Figure: ideal CLK shifted first by the source insertion delay, then by the network insertion delay]

set_clock_latency
• Also called insertion delay; defines the time it takes a clock signal to propagate from the clock definition point to a register clock pin.
• For generated clocks, the command can be used to model the delay from the master clock to the generated clock definition point.
• Most tools assume ideal clocking, which means clocks have a specified network latency of zero by default.
  set_clock_latency [-rise] [-fall] [-min] [-max] [-source] [-late] [-early] delay object_list

  set_clock_latency 1.2 -rise [get_clocks CLK1]
  set_clock_latency 0.9 -fall [get_clocks CLK1]
  [Figure: CLK1 rising edge delayed 1.2 ns and falling edge delayed 0.9 ns relative to ideal CLK1]

  set_clock_latency 0.8 -source -early [get_clocks CLK1]
  set_clock_latency 0.9 -source -late [get_clocks CLK1]
  [Figure: CLK1 source latency of 0.8 ns (early) and 0.9 ns (late)]

Understanding Clock Uncertainty
• From cycle to cycle, the period and duty cycle can change slightly due to the clock generation circuitry.
• This clock jitter (also known as interclock jitter) can be modeled by adding uncertainty regions around the rising and falling edges of the clock waveform.
• Clock uncertainty is the time difference between the arrival of clock signals at registers in one clock domain or between domains; it reflects variance from cycle to cycle at the clock generating source.
[Figure: FF1 and FF2 on a single clock CLK1 (clock jitter) and on separate clocks CLK1 and CLK2 (interclock jitter), with jitter regions shown at FF1/CP and FF2/CP]

Understanding Clock Skew
• The skew time specifies the maximum allowable delay between 2 signals, which if exceeded causes devices to behave unreliably
  – This timing check is often used in cells with multiple clocks
• You can use a two-dimensional table or a constant value to define a skew timing check
  Skew (clock1 => clock2 posEdge posEdge (CONST(1.5)))
[Figure: skew measured between an edge of clock1 and an edge of clock2]

Understanding Clock Skew (Cont.)
• The skew of a clock tree is the difference between the min and max insertion delays of the tree
  Skew = max insertion - min insertion
[Figure: CLK reaching FF1/CP after the min insertion delay and FF2/CP after the max insertion delay]

set_clock_uncertainty
• Specifies the uncertainty (skew) of clock networks
  – Simple uncertainty specifies setup/hold uncertainties for all paths to the endpoint.
  – Inter-clock uncertainty specifies skew between clock domains.
  set_clock_uncertainty [-from from_clock] [-to to_clock] [-rise] [-fall] [-setup] [-hold] uncertainty [object_list]
• Set the uncertainty to the worst skew expected to the endpoint or between the clock domains. Increase the value to add margin for setup/hold.

  set_clock_uncertainty -setup 0.65 [get_clocks CLK]
  set_clock_uncertainty -hold 0.45 [get_clocks CLK]
  [Figure: setup uncertainty of 0.65 ns and hold uncertainty of 0.45 ns around the ideal clock edge]

Defining uncertainty & latency together!
• Describe an ideal clock with a period of 10
  – Waveform of {0 5}
  – Rise and fall clock latency of 1 ns
  – Hold uncertainty of 0.3 and setup uncertainty of 0.2

  create_clock -name "CLK" -period 10 -waveform {0.0 5.0}
  set_clock_latency 1.0 -rise [get_clocks CLK]
  set_clock_latency 1.0 -fall [get_clocks CLK]
  set_clock_uncertainty -setup 0.2 [get_clocks CLK]
  set_clock_uncertainty -hold 0.3 [get_clocks CLK]
  [Figure: clock delayed by a latency of 1 ns relative to the ideal clock, with 0.2 ns setup uncertainty and 0.3 ns hold uncertainty around the edge]

set_input_delay – 1/2
• Defines the arrival time relative to a clock.
• For bidirectional ports, you can specify the path delays for both input and output modes.
• Use -add_delay to capture multi-clock delay relations.
  set_input_delay [-clock clock_name] [-clock_fall] [-level_sensitive] [-rise] [-fall] [-max] [-min] [-add_delay] [-network_latency_included] [-source_latency_included] delay_value port_pin_list

  set_input_delay 4.3 -rise -clock CK2 {IN2}
  set_input_delay 3.5 -fall -clock CK2 {IN2}
• These commands state that a rising signal on IN2 can occur 4.3 time units after the rising edge of clock CK2, and that a falling signal takes 3.5 units to reach IN2.
  [Figure: IN2 rising 4.3 ns and falling 3.5 ns after the CK2 edge; CK1 with an 8 ns period and CK2 with a 4 ns period, driving IN1 and IN2]

  set_input_delay 2.7 -clock CK1 -add_delay { IN1 }
  set_input_delay 4.2 -clock CK2 -add_delay { IN1 }
• These commands specify an input delay for IN1 of 2.7 ns relative to clock CK1 and 4.2 ns relative to CK2.

set_input_delay – 2/2
[Figure: a register and combinational logic in the "other" module drive input d1 of the "current design" module; the signal arrives 17 ns after clk.]

  set_input_delay 17 -rise -clock clk { d1 }
• The command specifies an input delay of 17 ns relative to clock clk.
• "Environmental behavior" or "other module" timing is important for properly constraining synthesis.

set_output_delay
• The command sets output path delay values for the current design.
• The input and output delays characterize the operating environment of the current design.
• Output ports have no output delay unless specified.
  set_output_delay [-clock clock_name] [-clock_fall] [-level_sensitive] [-rise] [-fall] [-max] [-min] [-add_delay] [-network_latency_included] [-source_latency_included] delay_value port_pin_list

[Figure: the "current" module drives the "other" module, whose combinational logic needs 5 ns before its register captures on clk.]

  set_output_delay 5 -rise -clock clk { d1 }
• The command specifies an expected output delay of 5 ns relative to clock clk.
• "Environmental behavior" or "other module" timing is important for proper output constraints.

Understanding Virtual Clock
• Virtual clocks are clocks that exist in memory but are not part of a design
  – Use one as a reference for specifying input and output delays relative to a clock
• This means there is no actual clock source in the design
  – For example, assume the block to be synthesized is "blockB"
    • The clock signal, "vclk", would be a virtual clock
    • The input delay and output delay would be relative to the virtual clock
[Figure: blockB with input in1 and output out1, both timed against vclk, which does not enter the block]

Environmental Attributes
• set_drive
• set_load
• set_driving_cell
• set_wire_load: sets the wireload model for the current design
• set_fanout_load*
• set_port_fanout_number*
* Covered in the advanced STA/SDC chapter

Environmental Constraints
• set_drive: superseded by set_driving_cell
• set_driving_cell: names a cell from your library that represents a typical cell driving your input ports
  set_driving_cell -cell "INV" -library "WCCOM" -pin "Y"
• set_load: a capacitive load that represents the amount of loading on the output ports of the design
  set_load 1.0 [all_outputs]
• set_operating_conditions: nowadays the tech library is usually created for a specific set of operating conditions, so specifying the operating conditions might not be necessary.
  set_operating_conditions "WCCOM" -library "slowtech"

Wireload Models – A Necessary Evil
• Wireloads provide estimated capacitive and resistive loads of nets, calculated from fanout.
• They give synthesis tools an estimate of the expected loads from wires.
• Synthesis with true physical knowledge would not need wireloads.

  /****************************************************
   * slope : Used for estimating the equivalent length
   * of the output pin which drives more than 8
   * fanouts. Extrapolation is used.
   ****************************************************/
  wire_load (ti_gs50) {
    resistance  : 0.0;
    capacitance : 0.0388;
    area        : 0.000;
    slope       : 1.698;
    fanout_length (1.000, 1.655);
    fanout_length (4.000, 4.925);
    fanout_length (5.000, 8.120);
    fanout_length (8.000, 13.214);
  }

[Figure: estimated capacitance (nF) versus fanout (N endpoints), with one curve each for a large block (100k), an average block (25k) and a small block (2k).]

set_wire_load_mode & set_wire_load_model
• The wireload mode specifies how synthesis applies wire load models (WLMs) across the hierarchy
  set_wire_load_mode top
  set_wire_load_mode enclosed
  set_wire_load_mode segmented
• The wireload model specifies the WLM names
• WLM names are usually given in terms of block size
  – g30_medium, tsmc18_100k, tosh_35x35

  set_wire_load_model -name Large_wlm
  set_wire_load_model -name Avg_wlm {U1 U2}
  set_wire_load_model -name Small_wlm {U3}

[Figure: TOP (large WLM) contains U1 and U2 (average WLM); U2 contains U3 (small WLM); nets n2 to n7 run within and between the blocks.]

  Wireload setting | Wireload model | Applies to the following nets
  top              | Large          | All nets
  enclosed         | Large          | n4, U1/n3, U2/n5, U3/n6
                   | Avg            | U1/n2
                   | Small          | U2/U3/n6
  segmented        | Large          | n4
                   | Avg            | U1/n2, U1/n3, U2/n5
                   | Small          | U2/U3/n6, U1/n6, U2/U3/n7

sample SDC file
  create_clock -period 7 -waveform {0 3.5} [get_ports {clk_fpci66m}]
  create_clock -period 14 -waveform {3.1 10.1} [get_ports {clk_ref25m}]
  set_clock_latency 4 [get_clocks {clk_ref25m}]
  set_clock_latency -source 1 [get_clocks GCLK1]
  set_input_delay 2.5 -clock "clk_ref25m" [get_ports {ipmitxbfr_rdata[0]}]
  set_output_delay 3 -clock "clk_pci" [get_ports {decalfipmi_fllen_z}]
  set_driving_cell -lib_cell BUFX8_TAX0 -library xlite_core [get_ports {ipmitxbfr_rdata[3]}]
  set_wire_load_model -name "pci_block_wl" -library "xlite_core"
  set_load -pin_load 0.2 [get_ports {decalf_hdrmem_ld}]
  set_operating_conditions "slow" -library "techlib"
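The ti_gs50 wire_load table from the wireload slide can be turned into a load estimate by hand. The sketch below is a minimal illustration, not a tool's actual algorithm: it assumes linear interpolation between listed fanout_length entries (an assumption; the slide only states that slope is used to extrapolate beyond a fanout of 8) and that capacitance is a per-unit-length value in library units. The function names are hypothetical.

```python
from bisect import bisect_left

# Values copied from the ti_gs50 wire_load group on the wireload slide.
CAP_PER_UNIT_LENGTH = 0.0388  # library units per unit length (assumed meaning)
SLOPE = 1.698                 # extra length per fanout beyond the last table entry
FANOUT_LENGTH = [(1, 1.655), (4, 4.925), (5, 8.120), (8, 13.214)]


def estimated_length(fanout):
    """Estimate net length for a given fanout (fanout >= 1)."""
    fanouts = [f for f, _ in FANOUT_LENGTH]
    last_fanout, last_length = FANOUT_LENGTH[-1]
    if fanout >= last_fanout:
        # Beyond the table: extrapolate with the slope, as the slide describes.
        return last_length + SLOPE * (fanout - last_fanout)
    if fanout <= fanouts[0]:
        return FANOUT_LENGTH[0][1]
    i = bisect_left(fanouts, fanout)
    if fanouts[i] == fanout:
        return FANOUT_LENGTH[i][1]
    # Between entries: linear interpolation (an assumption of this sketch).
    (f0, l0), (f1, l1) = FANOUT_LENGTH[i - 1], FANOUT_LENGTH[i]
    return l0 + (l1 - l0) * (fanout - f0) / (f1 - f0)


def estimated_capacitance(fanout):
    """Wire capacitance estimate = estimated length x capacitance per unit length."""
    return estimated_length(fanout) * CAP_PER_UNIT_LENGTH


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 10):
        print(n, round(estimated_length(n), 3), round(estimated_capacitance(n), 4))
```

For a fanout of 10, the estimated length is 13.214 + 1.698 × 2 = 16.61, which is why high-fanout nets look expensive to the synthesis tool before any placement exists.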
Advanced Constraints & STA
Version 1.0

Understanding Single Cycle Paths
• By default, static timing tools assume all timing paths to be single-cycle paths (assuming at least one clock is defined)
[Figure: launch and capture clock waveforms over 0 to 20 with a period of 10; data launched at one edge is setup-checked at the next capture edge and hold-checked at the same edge as the launch.]
• Exceptions to the above behavior can be defined:
  – Multicycle paths
  – False paths

Understanding Multi-Cycle Paths
• Paths that require more than one clock period for execution
• It is essential that multi-cycle paths in the design be identified for both synthesis and STA
• The synthesis tool allows more than one cycle for the specified paths, which relaxes the constraint
• Example: a path through a multiplier takes longer than one clock cycle; the designer needs to tell the synthesis tool
• Late example:
[Figure: clk1 waveforms over 0 to 20; the setup launch edge and a setup check edge that is more than one cycle later.]

Understanding Multi-Cycle Paths (Cont.)
• Early example:
[Figure: clk1 waveforms over 0 to 20; the hold launch edge and the corresponding hold check edge.]

set_multicycle_path – 1/3
• The command specifies that the designated timing paths in the current design do not follow the default single-cycle setup or hold relation, but span multiple clock cycles.
• The default hold check is at the 0th cycle, that is, at the present clock edge.
  set_multicycle_path [-setup] [-hold] [-rise] [-fall] [-start] [-end] [-from from_list] [-to to_list] [-through through_list] path_multiplier

  set_multicycle_path -setup 2 -from { ff1b } -to { ff2d }
• The exception sets all paths between ff1b and ff2d to 2-cycle paths for setup. Hold is measured at the previous edge of the clock at ff2d.
[Figure: Clk (ff1b) and Clk (ff2d) waveforms comparing the setup and hold checks with the multicycle path against the setup/hold check without MCP.]

  set_multicycle_path -hold 1 -from { ff1b } -to { ff2d }
• The exception moves the hold check to the preceding edge of the start clock.
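The way the setup and hold multipliers move the check edges can be worked through with simple arithmetic. The sketch below assumes launch and capture registers on the same clock, a launch edge at time 0, and the common SDC convention that the hold check sits one cycle before the setup capture edge and is moved back further by the hold multiplier. The function name and the 10 ns period are illustrative, not taken from the slides.

```python
def check_edges(period, setup_multiplier=1, hold_multiplier=0):
    """Return (setup_check_time, hold_check_time) for a launch edge at t = 0.

    Assumes a single clock on both ends of the path.
    """
    setup_edge = setup_multiplier * period
    # Default hold check is one cycle before the setup capture edge;
    # each unit of hold multiplier moves it back one more cycle.
    hold_edge = setup_edge - period - hold_multiplier * period
    return setup_edge, hold_edge


if __name__ == "__main__":
    period = 10  # ns, illustrative
    print("default          :", check_edges(period))        # single-cycle path
    print("-setup 2         :", check_edges(period, 2))     # hold follows setup
    print("-setup 2 -hold 1 :", check_edges(period, 2, 1))  # hold back at launch edge
```

With a setup multiplier of 2 alone, the hold check moves to the edge just before the new capture edge, which forces at least one period of minimum delay on the path. Adding a hold multiplier of 1 returns the hold check to the launch edge, which is usually what the designer intends.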
[Figure: Clk (ff1b) and Clk (ff2d) waveforms showing the setup check and the relocated hold check.]

set_multicycle_path – 2/3
• The multi-frequency example shows a 20 ns clock to a 10 ns clock, with a multicycle path of 2 from ff1/CP to ff2/D.
  create_clock -period 20 -waveform {0 20} clk20
  create_clock -period 10 -waveform {0 10} clk10
  set_multicycle_path 2 -setup -from ff1/CP -to ff2/D
[Figure: Clk20 (ff1) and Clk10 (ff2) waveforms showing the setup/hold check without MCP and the setup/hold check with MCP.]

  set_multicycle_path 2 -setup -from ff1/CP -to ff2/D
  set_multicycle_path 1 -hold -from ff1/CP -to ff2/D
[Figure: Clk (ff1) and Clk (ff2) waveforms showing the setup/hold check with both multicycle exceptions applied.]

set_multicycle_path – 3/3
[Figure: INPUT1 feeds registers ffa, ffb and ffc clocked by CK1; a multi-cycle path runs from ffd to ffe toward OUTPUT1.]
  set_multicycle_path 4 -from { ffd } -to { ffe }
[Figure: a multiplier (x) between registers ff1 and ff2, with muxes enabled by an FSM.]
  set_multicycle_path 7 -from { ff1 } -to { ff2 }
• The FSM enables the 1st and 2nd mux only 7 counts after each other, so the multiplier operation can take up to 7 cycles.
• Without a multicycle path exception, too much synthesis effort may be spent optimizing the multiplier for a single cycle.

Understanding Multiple Clocks
• If more than one clock is used to time a design, you can define them to have different waveforms and frequencies.
• If clocks have different frequencies, there must be a base period over which all waveforms repeat
  – The base period is the least common multiple (LCM) of all clock periods
[Figure: CLK1, CLK2 and CLK3 waveforms over time 0 to 12, with leading (L) and trailing (T) edges marked; all three repeat over the base period, the least common multiple of their periods.]

Understanding False Paths
• A path that can never be sensitized in the actual circuit
  – These paths are logically/functionally impossible
[Figure: MUX1 with inputs A (from INP1 through a long path) and B (from INP2), select tied to 1, driving OUT.]
• The designer should tell the synthesis tool that the long path (combinational or register-to-register) is false.
• The goal in static timing analysis is to analyze all "true" timing paths.

Example of False Path
• There are 4 timing paths through Mux 1 and Mux 2
  – Path 1: A – C – C1 – C2 – Out
  – Path 2: A – C – Out (through In1 of Mux 2)
  – Path 3: B – B1 – B2 – C – C1 – C2 – Out
  – Path 4: B – B1 – B2 – C – Out (through In1 of Mux 2)
[Figure: Mux 1 selects between A (In0) and B via B1, B2 (In1) to produce C; Mux 2 selects between C via C1, C2 (In0) and C directly (In1) to produce Out; Sel drives both muxes.]

Example of False Path (Cont.)
• The select signal, Sel, drives both muxes
• This design can have only two possible active timing paths at any given time
  – When Sel = 0, path 1 is active
  – When Sel = 1, path 4 is active
  – Thus, paths 2 and 3 are false paths
[Figure: the same two-mux circuit with path 2 and path 3 marked as false paths.]

set_false_path
• The command marks startpoint/endpoint pairs as false timing paths.
• It essentially disables max delay (setup) and min delay (hold) checks.
• A false path takes precedence over a multicycle path.
• A specific set_max_delay or set_min_delay command overrides a general set_false_path command.
• To disable the timing at a particular cell along a path, use set_disable_timing.
  set_false_path [-setup] [-hold] [-rise] [-fall] [-from from_list] [-to to_list] [-through through_list]

  set_false_path -from U14/Z -to ff29/RST
• Setup/hold checks from the AND gate output U14/Z to ff29/RST are disabled. All other timing paths are honored.

  set_false_path -from ff1/Q -through {U1/Z U2/Z} -through {U3/Z U4/C} -to ff2/D
• The command disables all timing paths from ff1/Q to ff2/D that pass through one or more of {U1/Z U2/Z} and one or more of {U3/Z U4/C}.
• Multiple paths are affected! Specify general false paths carefully!
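The two-mux false path example can be checked by brute force over the shared select signal. The sketch below assumes that each mux passes In0 when Sel = 0 and In1 when Sel = 1, which matches the slide's statement that path 1 is active for Sel = 0 and path 4 for Sel = 1. The dictionary and function names are illustrative.

```python
# Select value each path needs at (Mux 1, Mux 2), assuming In0 is chosen
# when Sel = 0 and In1 when Sel = 1.
REQUIRED_SELECT = {
    "path 1": (0, 0),  # A -> C -> C1 -> C2 -> Out
    "path 2": (0, 1),  # A -> C -> Out (In1 of Mux 2)
    "path 3": (1, 0),  # B -> B1 -> B2 -> C -> C1 -> C2 -> Out
    "path 4": (1, 1),  # B -> B1 -> B2 -> C -> Out (In1 of Mux 2)
}


def is_true_path(name):
    """A path is sensitizable if one value of the shared Sel satisfies both muxes."""
    need_mux1, need_mux2 = REQUIRED_SELECT[name]
    return any(sel == need_mux1 and sel == need_mux2 for sel in (0, 1))


true_paths = sorted(p for p in REQUIRED_SELECT if is_true_path(p))
false_paths = sorted(p for p in REQUIRED_SELECT if not is_true_path(p))

if __name__ == "__main__":
    print("true paths :", true_paths)
    print("false paths:", false_paths)
```

A static timing tool does not perform this logical reasoning by default, so paths 2 and 3 would be timed unless they are declared with set_false_path.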
[Figure: U14/Z driving ff29/RST, marked as a false path; a network of gates U1 to U7 between ff1/Q and ff2/D, with ff3 also shown.]

Understanding Asynchronous Clocks
• In a design with multiple clock domains, clocks are asynchronous when there is no common base period.
• BG unrolls the clocks until the leading-edge-to-leading-edge (L -> L) phases of the clocks match
  – In effect, there are no asynchronous clocks in BG/PKS
  – In BG the only asynchronous clock is the @ clock
  – Use a false path between clock domains
• If two clocks are defined in a design, one with a period of 10 and the other with a period of 10.1
  – The tool determines the common base period by expanding until a common integer base period is found, which is 1010 in this case.

Understanding Divide By Clocks
• A divide-by clock in a design
[Figure: flip-flop with QN fed back to D, clocked by CLK-IN at CP, producing CLK-OUT at Q at half the frequency of CLK-IN. The waveforms assume there is no CP -> Q delay.]

Understanding Generated Clocks
• This is a capability to model clock dividers / multipliers that create a new clock from a clock source
  – The clock above can be generated by the following command:
    set_generated_clock -from FF1/CP -divide_by 2 -name CLK-OUT FF1/Q
[Figure: flip-flop FF1 clocked by CLK-IN at CP, producing CLK-OUT at Q.]

create_generated_clock
• Creates a generated clock object
• Specify a pin or an output port
• Whenever the master clock changes, the generated clock changes
• Use -divide_by, -multiply_by, or an edge-derived clock (-edges)
  create_generated_clock [-name clock_name] -source master_pin [-edges edge_list] [-divide_by factor] [-multiply_by factor] [-duty_cycle percent] [-invert] [-edge_shift shift_list] [-add] [-master_clock clock] source_objects

  create_generated_clock -divide_by 3 -source CLK [get_pins CLK_by_3]
  [Figure: CLK enters the CLK_DIV_3 block, which outputs CLK_by_3 at one third of the CLK frequency.]

  create_generated_clock -multiply_by 2 -duty_cycle 70 -source CLK [get_pins foo1]
  [Figure: CLK and foo1 waveforms; foo1 runs at twice the CLK frequency with a 70% duty cycle.]

set_clock_transition
• The command overrides the transition times on register clock pins.
• It is useful pre-layout, when clock trees are incomplete.
• Use it only with ideal clocks.
• For propagated clocks, the calculated transition times are used.
• If a clock transition is not specified for an ideal clock, the transition time is calculated as it is for other pins in the design.
  set_clock_transition [-rise] [-fall] [-min] [-max] transition clock_list

  set_clock_transition 0.38 -rise [get_clocks CLK1]
  set_clock_transition 0.25 -fall [get_clocks CLK1]
  [Figure: CLK1 with a 0.38 ns rise transition and a 0.25 ns fall transition compared with ideal CLK1]

set_case_analysis
• Specifies that a port or pin is at a constant logic value 1 or 0.
• A way to specify a mode of the design without altering the netlist.
• A constant case analysis value is propagated through the network.
• For case analysis on a transition, the given pin or port is considered for timing analysis only with the specified transition. The other transition is disabled.
• The case analysis information is used by all analysis commands.
  set_case_analysis value port_or_pin_list

  set_case_analysis 0 TEST_PORT
• Ignores all timing paths that exist only when TEST_PORT is 1.

  set_case_analysis rising {U1/U2/A U1/U3/CI}
• The pins U1/U2/A and U1/U3/CI are considered only for a rising transition. The falling transition on these pins is disabled.

set_max_delay
• Specifies the maximum delay for paths.
• Among point-to-point exception commands, the more specific command overrides the more general.
  set_max_delay [-rise] [-fall] [-from from_list] [-to to_list] [-through through_list] delay_value
• Precedence list (from specific to general) of what gets overridden and honored:
  1. set_max_delay -from pin -to pin
  2. set_max_delay -from clock -to pin
  3. set_max_delay -from pin -to clock
  4. set_max_delay -from pin
  5. set_max_delay -to pin
  6. set_max_delay -from clock -to clock
  7. set_max_delay -from clock
  8. set_max_delay -to clock

  set_max_delay 15.0 -from {ffa ffb} -to {ffe}
• Specifies that all paths from ffa and ffb to ffe should be less than 15 ns.
  [Figure: registers ffa, ffb, ffc, ffd and ffe clocked by CK1, with the paths into ffe marked max_delay < 15 ns.]

  set_max_delay 8.5 -to [get_clocks CK2]
• Specifies that all paths to endpoints clocked by CK2 should be less than 8.5 ns.
  [Figure: regions clocked by CK2 marked delay < 8.5 ns.]

set_fanout_load, set_port_fanout_number
• set_port_fanout_number allows the user to specify how many "drops" there are on an output port. These drops represent the fanout of the signal. It is used in conjunction with the fanout_load.
  set_port_fanout_number 5 all_outputs
• set_fanout_load allows the user to specify a wire_load_model that is different from the one for the current design, representing the wire_load_model of the signals external to the design being constrained.
  – set_fanout_load "100x100" -library "slow" [get_ports A*]

sample SDC file
  create_clock -period 7 -waveform {0 3.5} [get_ports {clk_fpci66m}]
  create_clock -period 14 -waveform {3.1 10.1} [get_ports {clk_ref25m}]
  create_generated_clock -name GCLK1 [get_pins {pci_if_top1/pci_datapath1/master_addr_cnt1/pci64_xfer_reg}] \
      -source [get_ports {clk_ref25m}] -edges {1 2 3}
  set_clock_transition 4 [get_clocks clk_pci]
  set_multicycle_path -from [get_cells subblk/data*] -to [get_cells subblk/c_reg*] -hold 1
  set_multicycle_path -through [get_nets pci_am_block/mul*] -setup 2
  set_false_path -from [get_cells pci_am_block/r_gpc_reg] -to [get_cells pci_am_block/gpx_reg]
  set_clock_latency 4 [get_clocks {clk_ref25m}]
  set_clock_latency -source 1 [get_clocks GCLK1]
  set_case_analysis 0 [get_ports TE1]
  set_input_delay 2.5 -clock "clk_ref25m" [get_ports {ipmitxbfr_rdata[0]}]
  set_output_delay 3 -clock "clk_pci" [get_ports {decalfipmi_fllen_z}]
  set_max_fanout 2 [get_ports {ipmitxbfr_rdata[4]}]
  set_driving_cell -lib_cell BUFX8_TAX0 -library xlite_core [get_ports {ipmitxbfr_rdata[3]}]
  set_wire_load_model -name "pci_block_wl" -library "xlite_core"
  set_load -pin_load 0.2 [get_ports {decalf_hdrmem_ld}]

Low Power Technology in Synthesis

Low-Power Design Flow
[Figure: design flow annotated with the low-power decisions made at each stage and the achievable power reduction percentage:
  – System Design: algorithms, IP, etc.
  – Architecture Design: multi-voltage islands, sleep mode, ...
  – RTL Synthesis: clock gating, µArch, multi-Vth
  – Place and Route (design structure committed): clock tree, gate-level
  – Production]

Basic Terminology in Low-Power Design
  – Power Domain: a group of logic that shares the same power net.
  – Voltage Domain: all logic in the domain has equal power supply voltage values.
  – Power Gating: a feature to switch module power on and off.
  – MSV: Multiple Supply Voltage
    • Can be MSSV (Multiple Supply Single Voltage), where all blocks have the same voltage but the supplies are isolated, or
    • MSMV (Multiple Supply Multiple Voltage), where different blocks are at different voltage levels.

Basic Terminology in Low-Power Design (continued)
  – Isolation Cell: a cell that isolates a signal crossing between an active (on) domain and an inactive (off) domain.
  – Level Shifter: a cell placed between a source and a receiver that are powered by different voltages.
  – Retention Cell: a special D flip-flop with circuitry to save its state when power is turned off, so that the state can be recovered when power is restored.

Basic Versus Advanced Low-Power Techniques

  Power reduction technique                                         | Leakage power | Dynamic power | Timing penalty | Area penalty | Implement. impact | Design impact | Verification impact
  Area optimization                                                 | 1.1X          | 10%           | 0%             | -10%         | None              | None          | None
  Multi-Vth optimization                                            | 6X            | 0%            | 0%             | 2 to -2%     | Low               | None          | None
  Clock gating                                                      | 0X            | 20%           | 0%             | <2%          | Low               | Low           | None
  Multi-supply voltage (MSV)                                        | 2X            | 40-50%        | 0%             | <10%         | Medium            | Medium        | Low
  Power shut-off (PSO)                                              | 10-50X        | ~0%           | 4-8%           | 5-15%        | Medium-high       | High          | High
  Dynamic and Adaptive Voltage Frequency Scaling (DVFS and AVS)     | 2-3X          | 40-70%        | 0%             | <10%         | High              | High          | High
  Substrate biasing                                                 | 10X           | -             | 10%            | <10%         | High              | Medium-high   | Medium

Multiple Supply Voltage (MSV) Design
[Figure: MSV design example with a 1.0 V power domain, a 1.2 V power domain containing the 1.2 V CLK domain, and a memory power domain 3 at 0.8 V with external power shutdown (1.2 V) and a memory turn-off control pin (1.2 V); power shielding clamps and voltage level shifters sit on the domain boundaries, fed by 1.0 V, 1.2 V and 0.8 V power.]
The following implementation issues need to be considered when designing with MSV areas:
• Isolation cells
• State saving cells
• Design verification for timing and power
• On-chip or off-chip power-supply generation issues

Modes of Operation
• There are three major modes for a chip:
  – Operational mode (mission mode)
  – Idle mode
  – Power-down mode: depending on the scheme that switches off parts of the chip, this mode might have leakage issues.
• Each operation mode can be subdivided into several submodes to reduce power.
• Another mode is testing
  – Test mode uses tests to validate functional correctness.
[Figure: power versus time across the functional, idle and power-down modes, showing the active and leakage components.]
Dynamic Power Terminology
• Dynamic Power = Active Power = Switching Power
• Dynamic/Active/Switching Power = Internal Power ("internal switching power") + Net Power ("net switching power")
• Internal power consists of:
  – Internal short-circuit power (caused by crowbar current*)
  – Internal capacitance power (caused by charging and discharging of internal capacitances)
* Crowbar current flows when the P-channel and N-channel transistors are both partially on.

Leakage Power
Leakage power is the static power dissipated by a CMOS transistor when the circuit is in standby mode or when the device is inactive.
[Figure: CMOS inverter (Vdd, Vin, Vout, load CL, GND) and transistor cross-section (gate, gate oxide, source, drain, p-substrate) annotated with leakage currents I1 to I4.]
Components of leakage power:
• I1 - Diode reverse bias current
• I2 - Subthreshold current
• I3 - Gate-induced drain leakage
• I4 - Gate oxide leakage
Of all the above leakage components, subthreshold leakage is the most critical.

Activity: Anatomy of a Multi-Vth Library
• How do you change the Vth of a device?

Low-Power Design Topics
• Active and Leakage Power Reduction
• Active Power: Synthesis
• Leakage Power: Synthesis
• Active Power: Multiple Supply Voltage Synthesis
• Active Power: Multiple Supply Voltage Implementation
• Leakage Power: Implementation
• Signoff Considerations

Summary
• Total Power = Active Power + Leakage Power
  – Active Power
    • To address active power, you need to address these variables:
    • Frequency-, load-, activity-, and supply-dependent power
  – Leakage Power
    • Process-, bias-, and supply-dependent power
    • Leakage power as a component of total power grows exponentially with shrinking process sizes.
Introduction to Dynamic Power
• Active or dynamic power is dissipated by the device when the device is active or in operation.
• Active power is also called dynamic power or switching power.
• The fundamental equation for dynamic power is:
  P = α C V² f
• Where
  C = overall capacitance that is charged and discharged
  V = supply voltage of the device
  f = frequency
  α = switching activity, or transitions per clock cycle
• Changing any of these factors gives you greater control over dynamic power.

Activity: Calculating Dynamic Power
• Consider a 0.25 micron chip with a 500 MHz clock, an average load capacitance of 15 fF/gate (fanout of 4), and a 2.5 V supply. Assume α = 1.
• Use the fundamental dynamic power equation.
  1. What is the dynamic power consumption per gate? _________________________________
  2. With one million gates in the chip, assuming each transitions every clock, what is the dynamic power of the entire chip? _________________________________

Implications of Scaling Down the Vdd
• Voltage scaling can result in significant power savings at the system and architecture levels, because the dependence of power on supply voltage is quadratic.
• Side effects of down-scaling Vdd:
  – In nanometer technologies, continued scaling of the supply voltage Vdd and the accompanying scaling of the threshold voltage Vth make subthreshold conduction a dominant component of power dissipation.
  – Lowering Vth along with Vdd leads to greater noise in the design.
  – Lowering Vdd can cause delays to increase and thus reduce performance, so an additional tradeoff between active power and performance is often required.
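The fundamental equation P = α C V² f can be evaluated directly for the activity's numbers, and the same function shows the quadratic effect of lowering Vdd. The sketch below is a minimal worked example; the 1.8 V comparison supply is a hypothetical value chosen for illustration and does not come from the slides.

```python
def dynamic_power(alpha, cap_farads, vdd_volts, freq_hz):
    """Dynamic power in watts: P = alpha * C * V^2 * f."""
    return alpha * cap_farads * vdd_volts ** 2 * freq_hz


# Numbers from the activity: 15 fF/gate, 2.5 V supply, 500 MHz, alpha = 1.
per_gate_watts = dynamic_power(1.0, 15e-15, 2.5, 500e6)
chip_watts = per_gate_watts * 1_000_000  # one million gates, all switching

# Hypothetical lower supply (1.8 V) to illustrate the quadratic dependence.
scaled_per_gate_watts = dynamic_power(1.0, 15e-15, 1.8, 500e6)
scaling_ratio = scaled_per_gate_watts / per_gate_watts  # equals (1.8 / 2.5) ** 2

if __name__ == "__main__":
    print(f"per gate : {per_gate_watts * 1e6:.3f} uW")
    print(f"full chip: {chip_watts:.3f} W")
    print(f"power at 1.8 V relative to 2.5 V: {scaling_ratio:.4f}")
```

Under these assumptions each gate dissipates about 46.9 µW and the full chip about 46.9 W, while dropping the supply from 2.5 V to 1.8 V would cut dynamic power to roughly 52% of its original value.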

Methods to Reduce Active Power
• There are different techniques and strategies used at each stage of the design abstraction, such as:
  – Architectural level
  – RTL and synthesis level
  – Physical implementation level
[Figure: design flow from System Design (algorithms, IP, etc.) through Architecture, RTL exploration, Synthesis, and Physical Implementation to Production]

Reducing Active Power at the Architectural Level
  – Decisions made in the architectural domain have a significant impact on the power consumed by the system.
  – There are no commercially available tools specifically for low-power design exploration at the architectural level. The task is generally completed by designers based on their experience and by experimental implementation of specific blocks.
  – Low-power techniques such as multi-supply multi-voltage need to be accounted for at the architectural level, before the RTL is written.
  – Designers need to model the low-power system by trading off power against performance.

Reducing Active Power at the RTL and Synthesis Level
• Operand Isolation Technique
When the outputs of the functional units are not in use, you can gate the input to the combinational logic. This technique is also called sleep mode. The isolation logic is called sleep mode logic.

    module test (en, a, b, c, out);
      input en;
      input [7:0] a, b, c;
      output [8:0] out;
      assign out = en ? a+b : a+c;
    endmodule

Reducing Active Power at the RTL and Synthesis Level (continued)
• Clock Gating Technique
The clock to the registers can be shut off by a gating circuit, which prevents the clock from triggering the registers. Clock gating can save 30% to 40% of power compared with a design without clock gating.

    module test (En, Data, clk, out);
      input En, clk;
      input [7:0] Data;
      output [7:0] out;
      reg [7:0] out;
      always @(posedge clk)
        if (En) out <= Data;
    endmodule

[Figure: register with an enabled data input (data, En, Clk) compared with a register whose clock is gated by En]

Clock Gating Challenges
• Clock gating can cause:
  – Possible clock skew imbalance in the clock tree, resulting in challenging clock skew management
  – Glitches on the enable signal
  – Sensitive placement of gating elements
  – Extra insertion delay caused by gating logic
  – Verification issues and budgeting of timing constraints for the enable signal, if the gating is for a hierarchical design
  – DFT and timing issues due to complex structure
  – Issues between RTL clock gating and clock tree synthesis (CTS)
    • After CTS, many designers see setup problems at the enable pin of the clock gate/latch.
    • The problem usually arises because CTS moves the clock gate closer to the leaf cells to meet the clock constraints.

Reducing Dynamic Power at the RTL and Synthesis Level
• Other techniques for reducing dynamic power
  – Dynamic power optimization techniques include:
    • Gate sizing
    • Pin swapping
    • Removing buffers/inverters
    • Gate merging
    • Selection of datapath components
    • Instance count reduction
  – Multiple supply voltage (MSV) synthesis

Reducing Dynamic Power during Physical Implementation
• There are different techniques that you can use to optimize dynamic power during physical implementation:
  – Multiple supply multi-voltage implementation (MSMV)
  – Dynamic voltage frequency scaling (DVFS)
  – Power-aware placement based on vectors
  – Low-power clock tree synthesis (LP-CTS)
  – Dynamic power optimization
  – Power shut off (PSO)

Physical Implementation Techniques to Reduce Clock Power
Decloning and Cloning Logic (Low Power – Clock Tree Synthesis)
• Cloning: Based on the placement of the registers and the gating logic, the clock gating instances with the same control signals are used to clone some gating logic to newly grouped registers. This reduces the overall capacitance of the clock tree.
• Decloning: Decloning is applied when the gating logic is not close to the register that it controls, but is closer to other registers controlled by other gating logic.
• Other advantages: Clock-gate cloning and decloning help achieve timing closure and routability by cutting down the total wire length (capacitance).

Physical Implementation Techniques to Reduce Clock Power (continued)
• Dynamic Voltage Frequency Scaling (DVFS)
• DVFS reduces the power in the chip (on the fly) by scaling down the voltage/frequency when peak performance is not required.
• Requirements for DVFS:
  – A variable power supply capable of generating the required voltage levels with minimal transition energy losses and a quick voltage transient response
  – When scaling the voltage, the frequency must be scaled in the same proportion to meet signal propagation delay requirements
  – A power scheduler that can intelligently compute the appropriate frequency and voltage levels needed to execute the various applications (tasks or jobs)
• Drawbacks:
  – Very complicated to implement due to several component considerations, such as appropriate V/f values
  – Very expensive to implement
  – Clock scheduling issues due to dynamic latency changes

Leakage Power
Leakage power is the static power dissipation by the CMOS transistor when the circuit is in the standby mode or when the device is inactive.
[Figure: CMOS inverter (Vdd, Vin, Vout, CL, GND) and transistor cross-section (gate, gate oxide, source, drain, p-substrate) showing leakage currents I1 to I4]
Components of Leakage Power
• I1 - Diode reverse bias current
• I2 - Subthreshold current
• I3 - Gate induced drain leakage
• I4 - Gate oxide leakage
Of the leakage components above, subthreshold leakage is the critical one.

Technology Scaling and Its Effect on Leakage
• At 90 nm and below, leakage power management is essential in the chip design process.
  – As voltages scale downward with technology, threshold voltages must also decrease to gain the performance advantages of the new technology.
  – This reduction in threshold voltages has led to an exponential increase in subthreshold leakage current in transistors.
  – Thinner gate oxides have led to an increase in gate leakage current. This aspect is also gaining in importance as geometries shrink (below 65 nm).

How to Control Leakage Power?
  – Leakage power: P_leakage = f(Vdd, Vth, W/L)
  – By controlling any or all of the three variables Vdd, Vth, and W/L, you can control leakage.
• Leakage Reduction Techniques
  – Gate-level multi-Vth implementation
  – Body biasing (Vth)
  – Device sizing (smaller W, longer L)
  – Power supply gating with state retention (Vdd)

Design For Test (DFT)

Module Objectives
• In this module, you will understand:
  – Why we test
  – What we test for
  – How we test for it

Course Agenda
Why? The Purpose of Test
• Chip Design Flow
• Product Quality
• Process Enablement
What? The Target of Test
• Manufacturing Defects
• Faults are Abstract Defects
• Static (Stuck-at) Faults
• Dynamic (Transition) Faults
• Other Fault Models
• Pattern Faults
How?
The Basics of Test
• “Functional” Patterns
• Combinational ATPG
• Sequential ATPG
• Scan ATPG
  – Design Rules
  – LSSD and Mux-Scan
• Embedded Memories and Cores
• Logic BIST
• Test Compression
• Additional Test Modes
• Escapes
• Diagnostics
When? A Very Brief History

Why? The Chip Design Process
• The chip design process is one of successively reducing the level of abstraction, breaking the problem down into simpler and simpler parts, until the simple parts can be put together to form successively more complex parts of the chip. At each step, there is some form of verification that the translation from one level of abstraction to the next has been performed correctly.
• “I want a graphics chip that shows me a realistic view of an arbitrary 3-dimensional world without any image flicker or jerkiness.” That’s pretty abstract. So how does it get turned into a chip?

Why?
The Chip Design Flow
• Reducing the level of abstraction:
  – Customer requirements
  – Specification
  – Architecture
  – Hardware description language
  – Gates and nets
  – Transistors, wires, and vias
  – Shapes
• Verifying the steps:
  – Assertion/transaction checking
  – Various types of simulation
  – Formal (Boolean) verification
  – Timing verification
  – Shapes checking
  – Optical inspection
  – Wafer tests
  – Module tests
• Building the chip:
  – Masks
  – Etches, diffusions, and depositions
  – Dicing and packaging

Why? Reducing Levels of Abstraction
[Figure: levels of abstraction for the statement X <= NOT Y;]
• Manufacturing Test compares the predicted behavior of the gate-level design to the actual behavior of the silicon.
• Manufacturing Test proves that the chip was built as designed, not that the chip was designed correctly.

Why? Product Quality and Process Enablement
• Manufacturing test accomplishes two major goals:
  – Reject Defective Modules (Product Quality)
    • Want to minimize “test escapes”
    • Want to maximize yield
  – Monitor and Improve Manufacturing Process
    • Identify when process variables move outside acceptable values
    • Support failure analysis by identifying probable defect location (Diagnostics)
    • Enable bring-up and rapid ramp of new process or line
    • One enabler of “Moore’s Law”

Course Agenda
Why? The Purpose of Test
• Chip Design Flow
• Product Quality
• Process Enablement
What? The Target of Test
• Manufacturing Defects
• Faults are Abstract Defects
• Static (Stuck-at) Faults
• Dynamic (Transition) Faults
• Other Fault Models
• Pattern Faults
How?
The Basics of Test
• “Functional” Patterns
• Combinational ATPG
• Sequential ATPG
• Scan ATPG
  – Design Rules
  – LSSD and Mux-Scan
• Embedded Memories and Cores
• Logic BIST
• Test Compression
• Additional Test Modes
• Escapes
• Diagnostics
When? A Very Brief History

What? Manufacturing Defects
Defects are deviations from the desired result. They are presumed to cause incorrect behavior at some point.
Examples:
[Photos of example defects, courtesy of IBM Corp.]

What? Abstracting Defects
If this defect (short) were between the ground (cyan) and the input (red) of this inverter, it would behave as if the input was always (stuck) at zero. What we observe is the behavior.
[Figure: inverter layout photo with Vdd, In, Out, and Gnd labeled. Photos courtesy of IBM Corp.]

What? Faults: Abstracted Defects
• It is impossible to catalog, much less observe, all of the possible defects in a large chip. New ones are discovered regularly.
• Instead, we attempt to catalog the behavioral change that will occur if a defect is present. Many defects introduce the same incorrect behavior.
• A fault model is a set of abstract defects exhibiting a common type of behavioral change:
  – Pin stuck-at fault model
  – Pin transition fault model
  – Other fault models, such as net bridging faults
• Other tests measure analog specifications to detect defective chips:
  – Chip IDDQ tests measure chip supply current under different stimuli
  – I/O parametric tests measure driver and receiver voltage and current

What? Stuck-at Fault Model
• Over a particular set of stimuli, a pin of a gate behaves as if it were stuck at a fixed value (1, 0, X).
  – This mimics typical shorts and opens, among other things.
  – Most common model, also referred to as “Static Faults”
[Figure: inverter with Vdd, In, Out, and Gnd labeled]
• Example: excess metal on the gate level creates a short to the ground-to-drain via on the lower transistor. The input is shorted to ground and behaves as a stuck-at-0.
• This may also create an IDDQ defect if attempting to drive the input to 1 drains too much current.

What? Transition Fault Model
• Over a particular set of stimuli, plus a specific transition, a pin of a gate responds to the transition too slowly.
  – Increasingly important, referred to as “Dynamic Faults”
  – This mimics resistive shorts and opens, among other defects.
  – This is a superset of the stuck-at fault model.
[Figure: inverter with Vdd, In, Out, and Gnd labeled]
• Example: the via from input to gate is not properly filled with metal, creating a high resistance in series with the inverter input capacitance.
• This causes the gate voltage to transition very slowly, resulting in both slow-to-rise and slow-to-fall transition behaviors on the input pin.

What? Example Transition Defect
• Resistive open
• Metal 3 layer
• 200 ps slow
Photos courtesy of IBM Corp.
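
The stuck-at and transition fault models above can be illustrated with a small Python sketch. It is a toy model, not how a production fault simulator works: the function names are hypothetical, and the idea that a slow transition is only observed when the added delay exceeds the timing slack of the tested path (with the slack values used here) is an assumption introduced for illustration.

```python
def inverter(value, input_stuck_at=None):
    """Return the inverter output; a stuck-at fault overrides the input pin."""
    if input_stuck_at is not None:
        value = input_stuck_at
    return 1 - value


def stuck_at_tests(input_stuck_at):
    """Input values for which the good and faulty outputs differ."""
    return [v for v in (0, 1) if inverter(v) != inverter(v, input_stuck_at)]


def transition_fault_detected(extra_delay_ps, slack_ps):
    """A slow transition is observed only if the added delay exceeds the slack."""
    return extra_delay_ps > slack_ps


sa0_tests = stuck_at_tests(0)  # stuck-at-0 is exposed by driving the input to 1
sa1_tests = stuck_at_tests(1)  # stuck-at-1 is exposed by driving the input to 0

# The 200 ps defect from the example, on two hypothetical paths.
caught_at_speed = transition_fault_detected(200, slack_ps=150)
missed_when_slow = transition_fault_detected(200, slack_ps=5000)

print(sa0_tests, sa1_tests, caught_at_speed, missed_when_slow)
```

The sketch shows why a static test is enough for a stuck-at fault (one input value exposes it regardless of timing), while a transition fault additionally needs the transition to be launched and captured with little enough timing margin for the extra delay to matter.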

What? Example Transition Defect
• Si contaminant
• Resistive short
• 125 ps slow
Photos courtesy of IBM Corp.

Course Agenda
Why? The Purpose of Test
• Chip Design Flow
• Product Quality
• Process Enablement
What? The Target of Test
• Manufacturing Defects
• Faults are Abstract Defects
• Static (Stuck-at) Faults
• Dynamic (Transition) Faults
• Other Fault Models
• Pattern Faults
How?
• The Basics of Test
  – “Functional” patterns
  – Combinational ATPG
  – Sequential ATPG
  – Scan ATPG
• Variations on the theme
  – Logic BIST
  – Embedded Memories and Cores
  – Test Compression
  – Additional Test Modes
• Chip Manufacturing Test
  – Escapes
  – Diagnostics
When? A Very Brief History

How? The Basics of Test
• Test is basically a problem of control and observation.
• For each fault:
  – Control the circuit to establish a state sensitive to the fault
  – Propagate the fault effect to an observation point
  – Predict an observable expected result
  – Observe the actual result and compare to the expected
• The amount of computation can be overwhelming. Various computational simplifications (abstractions) are used, just as the fault model is used to represent the much larger number of possible defects.
• The volume of test data can overwhelm the testers, so various techniques are used to reduce the data volume.
• The time required to run the test can more than double the manufacturing costs, so various techniques are used to reduce the test time.

How? Functional Patterns
• Originally, it was assumed that the functional simulation vectors would prove that the chips were properly manufactured.
• The advent of “fault grading” proved that coverage was poor, and got worse as the chips got larger:
  – The number of I/O went up much more slowly than the number of gates, limiting both control and observation.
  – Functional patterns that offer close to comprehensive testing are too large to fault simulate or to run on the tester. (Think of trying to functionally verify that a 32-bit adder gives the correct answer for all 2^65 possible inputs.)
• Functional patterns suffer from the difficulty of determining completeness that is common to all forms of functional verification.
• Generation of functional patterns can easily require as much engineering resource as the design of the chip itself.
• It is extremely difficult to diagnose failures.

How? Structural Testing
• “Structural Testing” refers to testing the circuit gate-by-gate and net-by-net to ensure that each gate works and that all the interconnections are intact and correct.
• It is not dependent on knowledge of the function that the structure was created to perform.
• Tests can be generated automatically. All forms of Automatic Test Pattern Generation (ATPG) use structural test algorithms.
• It is easy to measure coverage, and relatively easy to attain almost full coverage.
• Note that it verifies that the chip was built as designed. It does not verify that the design performs the intended function.

How?
The ATPG Loop
[Figure: flowchart of the ATPG loop with boxes Fault Model, Selected Fault, Generate Single Fault Tests, Mark Untestable, Compact Patterns, Fault Simulate Patterns, Mark Tested, and Test Vectors]

How? Step 1: Generate Single Fault Test
• To generate a test for a single fault:
  – Select a single fault (static, dynamic, or pattern), essentially at random.
  – Set primary input test values that:
    • Sensitize the fault by:
      – Forcing the cell pin with the fault to a value opposite the fault
      – Forcing other cell inputs to non-controlling states
    • Propagate the cell output along a path to a primary output
  – Generate a transition at the appropriate primary input if the fault is dynamic.
• Faults that either cannot be sensitized or cannot be propagated are classified as untestable.

How? Combinational ATPG (Illustrated)
[Figure: combinational ATPG illustrated on a circuit containing an AND gate, built up over five steps; value pairs such as 0/1 and 1/0 mark the fault effect along the propagation path]
Structural ATPG works best on purely combinational circuits.

• Define a test pattern for this fault:
[Figure: exercise circuit]

How? Sequential ATPG
• On typical chips, signals at logic gate pins will not be controlled or observed by chip I/O, but by flip-flops. The chips are not purely combinational.
• When back-tracing finds that a flip-flop controls the path, the test generator must recursively generate another pattern to set the correct value into the flip-flop before it can test the fault.
• When forward tracing finds that a flip-flop observes the results, the test generator must recursively generate another pattern to capture the value and propagate it forward.
• This must be done recursively through all levels of flip-flops until tester-controlled and tester-observed primary I/O are reached.
• This results in multiple vectors being applied at the chip I/O, with clock events in between, to test a single fault.

How? Sequential ATPG: Create a Test for a Single Fault (Illustrated)
[Figure: sequential ATPG illustrated on a flip-flop (D, C, Q) feeding an AND gate, built up over three steps; “+” implies that the pin is pulsed high after the pattern is applied]

How? Scan Test
• Making all flip-flops scannable and connecting them into scan chains between tester-contacted primary I/O pins:
  – Allows every flip-flop to be independently controlled and observed.
  – Allows every flip-flop to act like a combinational logic input.
  – Allows every flip-flop to act like a combinational logic output.
  – Reduces the ATPG problem to a purely combinational test problem, which we know how to solve relatively easily.
  – Increases the area of each flip-flop and the wiring congestion; both are taken into account in modern technologies.
• Scan has become a universal form of test access for control and observation: ATPG and ATE, IEEE 1149.1 and other test standards, Logic BIST, and test compression all depend on scan.

How? Scan Flip-Flops
• To convert a traditional flip-flop to a scan flip-flop, we can simply add a multiplexer on the data input to the flop.
  – The “Scan_Enable” signal selects either the normal functional data input or a new scan data input to the flip-flop.
  – Scan inputs are chained to the outputs of other flip-flops.
  – The same clocks are used for both scan and functional operation.
[Figure: a conventional flip-flop (Data, Clock) and a scan flip-flop with a multiplexer selecting between Data and Scan-In under Scan-Enable, with Scan-Out taken from Q]

How? Scan Test Connections
• All of the flip-flops sharing a clock will also normally share a Scan_Enable. The output of each flop is daisy-chained to the scan-in of another flop, creating one or more chains through all flip-flops.
[Figure: six scan flip-flops daisy-chained from Scan-In to Scan-Out around the logic, sharing Scan_Enable and Clock, with primary inputs (PI) and primary outputs (PO)]

How? Test Stimulus “Scan Load”
• The Scan_Enable test control signal is placed in the scan state, and the clock shifts stimulus data from the scan-in pin through each of the flops. Thus, the tester can control the state of all flip-flops.

How? Test Application
• The Scan_Enable is changed to the functional state, and the clock is pulsed, causing the flip-flops to capture the responses at their inputs. The tester stimulates all functional PI and observes all functional PO.
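
The scan load and capture steps above, together with the scan unload that completes the sequence, can be sketched as a toy Python model. Everything here is an illustrative assumption: a single three-flop chain, the helper names `scan_shift` and `capture`, and the made-up combinational function `example_logic`. Primary inputs and outputs are left out to keep the sketch short.

```python
def scan_shift(chain, bits_in):
    """Shift with Scan_Enable in the scan state: one clock per input bit.

    The last flop drives Scan-Out; every flop takes its neighbor's value.
    Returns the new chain state and the bits seen at Scan-Out.
    """
    chain = list(chain)
    bits_out = []
    for bit in bits_in:
        bits_out.append(chain[-1])
        chain = [bit] + chain[:-1]
    return chain, bits_out


def capture(chain, logic):
    """One functional clock with Scan_Enable in the functional state."""
    return list(logic(chain))


def example_logic(state):
    """Hypothetical combinational logic between flop outputs and flop inputs."""
    a, b, c = state
    return [a ^ b, b | c, 1 - c]


stimulus = [1, 1, 0]

# Scan load: the first bit shifted in ends up in the last flop, so shift in reverse.
chain, _ = scan_shift([0, 0, 0], reversed(stimulus))
loaded = list(chain)

# Test application: capture the logic response into the flops.
chain = capture(chain, example_logic)
captured = list(chain)

# Scan unload: shift the captured values out (zeros are shifted in behind them).
chain, unloaded = scan_shift(chain, [0, 0, 0])
observed = unloaded[::-1]  # the last flop's value appears first at Scan-Out

print(loaded, captured, observed)
```

In a real flow the bits shifted in during unload would be the stimulus for the next test rather than zeros, so loading and unloading overlap.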

How? Test Response “Scan Unload”
• The Scan_Enable test control signal is again placed in the scan state, and the clock shifts the captured data through each of the flops to the scan-out pin. Thus the tester can observe the state of all flip-flops.

How? Scan ATPG
• ATPG requires a control and clocking sequence to scan data through the chains (usually a pre-defined default). ATPG then generates patterns to test the scan chains.
• Once the scan chains are tested, ATPG just treats each flip-flop as a combinational logic input and output.
• Single fault tests are generated exactly as in the combinational example, except that ATPG must compute control and clocking sequences to launch the data from the flip-flops and to capture the results into the flip-flops.
• Stimulus data for the next test is scanned in while results of the previous test are scanned out (overlapped scan).
• Combinational test generation supports full-scan or some partial-scan circuits, though the higher the percentage of scan, the better.
• Sequential test generation supplements combinational test generation for circuits with a larger percentage of non-scannable latches.

How? Step 2: Compact Tests to Create Patterns
• Single fault tests are merged, or “compacted,” into a single pattern if:
  – They do not conflict (different values at the same location).
  – They have the same clock (launch/capture) sequence.
• When no more tests can be merged into a pattern, the remaining unspecified pattern bits are filled (usually with pseudo-random data).
• The pattern is then fault simulated against all faults (not just those targeted), and the detected faults are marked as tested in the fault list.
• On large chips, specified bit densities average about 1% to 2%. A few patterns will have 20% to 90% specified bit densities. (Small chips usually have higher average densities.)

How? Compact Tests to Create Patterns (Illustrated)
[Figure: logic under test surrounded by I/O and scan chains; the specified bits for fault 1, fault 2, and fault 3 are merged into one pattern, and the remaining unspecified bits are filled]
• A test pattern is mostly “don’t cares”, with random fill.

How? Step 3: Fault Simulate Patterns
• Most test coverage vs. pattern count curves look like this:
[Figure: test coverage (0% to 100%) versus pattern count (0 to 4000), with marked coverage values 0.85, 0.97, 0.985, and 0.99]
  – The curve starts with the results of the scan chain tests, in this case 40%.
  – It rises rapidly as the “easy to detect” faults are detected, most often by random data.
  – There is a “knee” in the curve, after which coverage grows very slowly as the test generator works on the “hard to detect” faults.
  – The curve becomes asymptotic as all testable faults are detected.
• Conclusion: The last few percent (or fraction of a percent) of coverage require the most patterns.

How? Scan ATPG – Design Rules
• Rules that guarantee scan chain operation (normally no exceptions!):
  – In scan mode, all clocks, including asynchronous set/clear signals, must be controlled from chip primary test inputs.
    • Gated and derived scan clock overrides.
    • Asynchronous set/clear signal override.
    • No free-running clocks.
    • Scan_Enable test pin normally used to establish the required scan conditions.
  – All scan chains must be connected from a test scan input (control), through assigned flip-flops, to a test scan output (observation).
  – Correct ordering of positive and negative edge clocked flip-flops, with retiming latches used as required.
  – Changing chip test control inputs may not change the state of any scan chain element. (Clocks are not test control inputs.)
  – Scan path delay must exceed maximum clock skew plus maximum hold time.

How? Scan ATPG – Design Rules (continued)
• Rules covering structures that prevent correct test pattern generation:
  – No combinational feedback (e.g., NAND-NAND latches)
  – No logic that cannot be modeled at the gate level (including analog)
    • Exception: IP that is provided with test patterns
  – No clock signals feeding flip-flop data ports; timings are different.
  – No multi-test-cycle paths:
    • Static combinational ATPG has no knowledge of timing
    • Dynamic tests may have functional cycle times, meaning no multi-cycle paths
    • Dynamic test generation with a chip Standard Delay Format (SDF) file can detect multi-cycle paths and set the captured value to ‘X’

How? Scan ATPG – Design Rules (continued)
• Rules that affect coverage, data volume, test time, and cost:
  – No three-state or one-hot logic, especially if mutually exclusive control is not designed in; no multiply driven nets
    • Must hold during scan unless the technology can withstand orthogonal drive
    • Correlations can be specified that will prevent the pattern generator from creating an orthogonal drive after scan
  – No non-deterministic logic (X-state generators) such as dynamic or self-clocked logic.
  – No non-scan storage elements other than memories; observe all memory inputs and control all memory outputs
  – Scan chain lengths should be approximately equal

How? Scan ATPG – LSSD vs. Mux-Scan
• One of the great “religious” wars of test:
  – As with most such wars, there is less than meets the eye.
  – The two styles can be mixed and scanned together.
• All scannable flip-flops or latches are built from two latches, called master and slave, operating on separate clocks to shift data:
  – Level Sensitive Scan Design (LSSD) Shift Register Latches (SRLs) use separate test clocks for master and slave.
  – Flip-flops use opposite phases of a single clock for master and slave.
• LSSD was developed before timing analysis tools. Scan operation is guaranteed by the use of 2 separate, non-overlapping test clocks.
• Mux-Scan uses timing analysis tools and timing correction (especially for hold time violations) to guarantee scan operation.
• Other design rules are the same for both styles.

How?
Variations on the Theme: Built-In Self-Test (BIST) – Control may come from ATE. – ATE (or system service processor) may program control. 206 2007年6月19日星期二 Stimulus Generation Control • In general, BIST implies that test stimuli are generated and test responses are usually compressed to a pass/fail value by on-chip circuits. • Useful at all package levels (chip, board, system.) • Control of the test sequence is generally controlled on-chip as well. Product Functions Response Processing Cadence Confidential: Cadence Internal Use Only 103 How? Memory BIST • Memory BIST executes specific test algorithms stepping address and data values. Controller – Based on regularity of Memory structure. – Extensive literature, algorithms well understood. – Proven to give full defect coverage for typical RAMs. Addr Data In Addr Din R/W 2007年6月19日星期二 RAM Dout • Fail data may be used to point to failing row or column for repair by switching in extra row or column. • ROMs normally tested by checksum of contents. 207 Data Out Compare Cadence Confidential: Cadence Internal Use Only How? Logic BIST • Logic BIST uses a Pseudo-Random Pattern Generator (PRPG) to generate scan test stimuli, and a Multiple Input Signature Register (MISR) to compress results. Product Logic Scan Chain Scan Chain • Best chip test for board, system, and field. • Relatively hard to get good coverage due to lack of intelligence in pattern generation. Controller – Usually built from a Linear Feedback Shift Register (LFSR), other possibilities such as Cellular Automata. – Usually added to chip, could be built from existing scan chain elements. PRPG MISR – Test points may be inserted in logic. • No X-States may be captured. 208 2007年6月19日星期二 Cadence Confidential: Cadence Internal Use Only 104 How? Test Compression • Test compression typically uses one or more elements of logic BIST, but with ATE providing control and stimulus. 
2007年6月19日星期二 Product Logic Scan Chain 209 Scan Scan Chain – Uses ATPG to generate, compact, and compress the patterns. – Uses on-chip circuits to decompress patterns to more scan channels than ATE scan chains. – Use of on-chip MISR eliminates response data from the tester, while retaining “accidental” fault detections. Decompression Compression or MISR Cadence Confidential: Cadence Internal Use Only How? Test Compression (continued) • All test data volume reduction (compression) depends on all those unspecified bits in the test pattern. – – – Compaction merges single fault tests, traditionally assuming all scan data will be scanned onto the chip “as-is”. The result is a pattern with the same number of bits as there are scan elements. Compression packs the specified bits together, eliminating unspecified bit positions, assuming a specific, linear, usually sequential, decompression circuit on the chip. The result is a pattern that has many fewer bits than there are scan elements. The two steps may be done simultaneously or sequentially. • All test time reduction depends on decreasing the length of scan chains by increasing the number of them, assuming constant test clock rate – More scan chains means shorter chain length means less time to scan. • Traditional Test Diagnostics is supported by incorporating a normal scan mode in the chip. 210 2007年6月19日星期二 Cadence Confidential: Cadence Internal Use Only 105 How? Embedded Reusable IP Cores • Embedded cores that cannot be tested as part of the logic (hard cores provided with test patterns, for instance) must be isolated from the rest of the logic for test. Isolation prevents corruption of both logic and core tests. Isolation permits application of the supplied patterns. Core I/Os may be brought to chip I/Os (I/O isolation). Core I/Os may be connected to scan flip-flops (shift register isolation.) 
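The PRPG and MISR named in the Logic BIST and Test Compression slides above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: a 4-bit register, feedback taken from bits 3 and 2 (which corresponds to the polynomial x^4 + x + 1), and the helper names `lfsr_step`, `prpg`, and `misr_signature` are all illustrative choices, not anything taken from a real BIST tool.

```python
def lfsr_step(state, width=4, taps=(3, 2)):
    """Advance a Fibonacci-style LFSR by one clock (assumed 4-bit, taps 3 and 2)."""
    feedback = 0
    for t in taps:
        feedback ^= (state >> t) & 1
    mask = (1 << width) - 1
    return ((state << 1) | feedback) & mask


def prpg(seed, count, width=4, taps=(3, 2)):
    """Return `count` pseudo-random patterns, starting from a non-zero seed."""
    patterns = []
    state = seed
    for _ in range(count):
        patterns.append(state)
        state = lfsr_step(state, width, taps)
    return patterns


def misr_signature(responses, width=4, taps=(3, 2)):
    """Compress parallel response words into one signature.

    Each clock, the register takes an LFSR step and the new response word
    is XORed into it, so the final value depends on every captured word.
    """
    mask = (1 << width) - 1
    signature = 0
    for word in responses:
        signature = lfsr_step(signature, width, taps) ^ (word & mask)
    return signature


if __name__ == "__main__":
    print([format(p, "04b") for p in prpg(0b0001, 15)])
    print(format(misr_signature([0b1010, 0b0111, 0b0001]), "04b"))
```

With these taps the generator walks through all 15 non-zero states before repeating, and a single flipped response bit always changes the signature. Multi-bit errors can cancel (aliasing), which is one reason practical MISRs are much wider than this sketch.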
[Diagrams: Shift Register Isolation (Core, Registers); I/O Isolation (Core)]

How? Additional Tests
• IDDQ Tests
  – Data scanned into chip, supply current measured.
  – Patterns sensitize stuck-at faults, do not propagate results.
  – With small dimension technologies, normal leakage current is higher and data must be manipulated post-measurement to identify outliers and bad chips.
• I/O Wrap Tests
  – Boundary scan used to control drivers and observe receivers; no contact by tester needed.
• I/O Parametric Tests
  – Each strong source connected to output pin has SA-Z, SA-1, and SA-0 faults; all receiver inputs have SA-1 and SA-0 faults.
  – Each driver and receiver has analog specifications (current at voltage, thresholds, etc.).
  – Patterns are repeated for each parameter to be measured, and for each pin.

How? Chip Manufacturing Test
Some Real Testers…
• ADVANTEST T6500 SERIES
• TERADYNE J973EP SERIES
• AGILENT 93000 SERIES

How? Chip Escapes vs. Fault Coverage
• The chips that pass test but are bad in the field are called test escapes.
• Chart shows percentage of escapes vs. different fault coverages for a bipolar technology.
• Other technologies will be different but similar.
• The basic point is that the escape percentage is lower than the percentage of untested faults.
• BUT….
[Plot: Test Escape percent vs. Stuck-At Coverage; regions labeled Failed, Untested, Passed: Good, Escapes]

How? Effect of Chip Escapes on Systems
• Chip escapes result in bad boards and systems.
• The effect builds rapidly with the number of chips in the system.
• Using the data from the previous slide, we get this plot:
[Plot: % Bad Systems vs. # of Chips, with curves for 99.9%, 99.1%, and 92% chip stuck-at coverage]
• For a complex system, with more than 10 chips and a target of <1% defective product due to bad chips:
  – Requires 99.3% to 99.99% stuck-at coverage per chip, depending on the number of chips.
• For simpler consumer systems with 1-3 chips and <5% defective product target:
  – 92%-95% stuck-at coverage may be acceptable for the chips.

How? Diagnostics
• "Given one or (usually) more test miscompares, where is the most likely defect on the chip?"
  – Each miscompare identifies which flip-flop captured an incorrect value.
  – Sophisticated tracing algorithms identify possible causing faults, and fault simulation of those faults assigns probabilities to each of them.
  – Additional special patterns can be generated to differentiate between possible causing faults.
  – Mapping from logical to physical (X-Y) isolates probable defect sites. This provides targets for additional Failure Analysis, whether "slice and dice" or e-beam, PICA, or other type of failure analysis.
  – For active analysis such as e-beam or PICA, special patterns can be generated to continuously re-stimulate the fault.
• This is critical to failure analysis, manufacturing process improvement, and product and process yield management.

How? Diagnostics (Illustrated)
[Animated diagram: scan pattern bits traced back through an AND gate to the possible fault sites]
The list of faults that could cause the miscompares, given the pattern inputs, is called the "diagnostic call-outs".

Course Agenda
Why? The Purpose of Test
• Chip Design Flow
• Product Quality
• Process Enablement
What? The Target of Test
• Manufacturing Defects
• Faults are Abstract Defects
• Static (Stuck-at) Faults
• Dynamic (Transition) Faults
• Other Fault Models
• Pattern Faults
How? The Basics of Test
• "Functional" Patterns
• Combinational ATPG
• Sequential ATPG
• Scan ATPG: Design Rules, LSSD and Mux-Scan
• Embedded Memories and Cores
• Logic BIST
• Test Compression
• Additional Test Modes
• Escapes
• Diagnostics
When? A Very Brief History

When? A Very Brief IBM Test History
• 1975 – LSSD enforced in IBM (engineers revolt.)
• 1978 – Embedded Memories/Cores allowed, tested by I/O correspondence (engineers grumble.)
• 1980 – Test has capacity for 32K gates (engineers wish they had a 32K gate chip.)
• 1985 – Memory BIST and Logic BIST tools introduced (engineers happier – they'd been doing it manually.)
• 1988 – Test Patterns generated for 2 million gate MCM on 32-bit mainframe (MCM engineers greatly relieved!)
• 1993 – EDA moves to workstations (engineers trying to order workstations.)
• 2000 – Test compression concepts introduced at International Test Conference (manufacturing engineers relieved.)
• 2002 – 7 million gate capacity on 32-bit workstations, 70 million gate capacity on 64-bit workstations. (Actually tested! Engineers still checking to see if runs completed - just kidding.)
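Looking back at the "Effect of Chip Escapes on Systems" slides in the test section above, the way escapes compound with chip count can be sketched numerically. The model below is an assumption, not the source of the plotted data: it treats each chip as escaping independently with the same probability, so a system is good only if every chip in it is good. The function names are illustrative, and the mapping from stuck-at coverage to escape rate (which the slides take from measured bipolar data) is deliberately left out.

```python
def bad_system_fraction(chip_escape_rate, n_chips):
    """Fraction of systems containing at least one escaped (bad) chip.

    Assumes independent, identical chips: P(all good) = (1 - e) ** n.
    """
    return 1.0 - (1.0 - chip_escape_rate) ** n_chips


def max_chip_escape_rate(target_bad_fraction, n_chips):
    """Largest per-chip escape rate that still meets a system-level target."""
    return 1.0 - (1.0 - target_bad_fraction) ** (1.0 / n_chips)


if __name__ == "__main__":
    # Illustrative escape rate of 0.1% per chip; not a value from the slides.
    for n in (1, 10, 100):
        print(n, "chips:", round(100 * bad_system_fraction(0.001, n), 3), "% bad systems")
    # Per-chip budget for a <1% bad-system target with 10 chips.
    print(round(100 * max_chip_escape_rate(0.01, 10), 4), "% escapes allowed per chip")
```

Under this assumption a per-chip escape rate of 1% already gives roughly 9.6% bad systems at 10 chips, which is the qualitative point of the slides: the per-chip requirement tightens quickly as the chip count grows.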
Formal Verification Using Encounter Conformal

Expanding Conformal Family
• Equivalence Checking (RTL or Gate vs. RTL or Gate)
  – Verifies 100% of design functionality without requiring test vectors
  – Orders of magnitude faster than simulation
• Low Power Verification
  – Verifies Low Power design implementation
  – Performs logical and functional checks
• Extended Functional Checks
  – Finds bugs earlier in the design cycle
  – Verifies proper CDC synchronization to avoid clock related re-spins
  – Creates safer EC environment
• Constraint Validation
  – Formal validation of exceptions
  – Uses industry proven formal engines
  – Shorter design cycle with improved timing constraints

Equivalence Checking
• During development, a chip design undergoes numerous transformations and iterations prior to final layout, and each step in this process has the potential to introduce logical bugs.
[Flow diagram: RTL (X <= NOT Y;), Logic Synthesis, Logic Optimization, Test Insertion, Clock Synthesis, Floor Planning, Placement, Routing, P&R Optimization, ECOs; Conformal Equivalence Checker alongside the flow]
[Sidebar repeated on the following slides: Built on Conformal Technology: Equivalence Checking, Custom EC, Functional Checks, Constraint Management, Low Power Validation]

Verification of Complex Datapath
• Trends indicate increased usage of advanced datapath optimization
• Extensive arithmetic expressions and advanced optimization techniques pose verification challenges
• Conformal Ultra provides formal verification solution for complex datapath
  – Operator merging, advanced pipelining
  – Supports wide variety of datapath architectures
[Diagrams: RTL to Gate through Datapath Synthesis & Optimization, checked by Conformal Ultra; Operator Merging (inputs A, B, C, merged operator, output Y); Advanced Pipeline Support]

Custom Equivalence Checking
Custom Circuit Abstraction + Equivalence Checking
[Diagram: example circuit styles under the headings Complex Boolean Logic, Pre-charged Logic, and Custom Sequential Logic; legible labels include AND-NOR, OR-NAND, XOR and XNOR, Footed, Footless, Diode connected, Dual phase, Pullup and Pulldown, Latch and DFF, DLAT; remaining labels not recoverable]

Custom Equivalence Checking
Custom Circuit Abstraction + Equivalence Checking
[Diagram: RTL and Custom Circuit; Logic Abstraction; Equivalence Checking]
• Verifies digital custom logic and memories
  – standard libraries
  – custom libraries
  – embedded custom memories
  – custom datapath
• Verifies full chip integration

Extended Functional Checks
• Clock Domain Crossing, Semantic, and Structural checks
• Additional capabilities beyond EC to detect bugs earlier, during design creation
• Complements EC, creates safer EC environment
• Finds functional mismatches that are otherwise missed or detected only by gate level simulation late in design cycle

Constraint Management
[Flow diagram: Constraint Design; RTL Design; RTL Synthesis; Full-Chip Prototyping / Floorplanning; Block-level Implementation; Chip finishing + Signoff; Constraint Refinements]
• Constraint Generation, Validation, and Analysis is a manual error prone process
• Limited EDA solutions to address Design Constraint problem
• Bad constraints = Bad Silicon

Low Power Validation
• Built on Conformal technology
  – Equivalency checking
  – Structural analysis & formal analysis
• RTL & gate level
  – Logical & physical netlist
• Functional & structural checks
  – Level shifter & isolation cell checks for multiple power domains
  – Support power gating and State Retention Power Gating (SRPG) Cells
  – Physical verification on power domains
[Diagram: Power Controller driving PwrEn1 and PwrEn2; power domains v1 and v2 with isolation (ISO) and retention (RET) cells]

Conformal Product Packaging
• Conformal Low Power GXL: Transistor Sneaky Paths Analysis; Electrical Verification
• Conformal Low Power XL: Verifies Low Power design (EC); Verifies Power domains
• Conformal Constraint Designer XL: False Path Generation from RTL
• Conformal Constraint Designer L: SDC Validation and Analysis; False path Validation
• Conformal (Ultra) XL: Verifies compiled datapath; Verifies final LVS netlist
• Conformal (ASIC) L: Verifies synthesized logic; Verifies clock synchronization, synthesis assumptions, structural consistency
• Conformal Explorer: Logical & physical linking
• Conformal (Custom) GXL: Verifies custom logic, IO cells, custom memories, standard libraries

Using Clock Domain Crossing (CDC) Formal Check Engine
For complex clocks before Synthesis

Why Conformal CDC Checks?
• Clock Domain Analysis: automatic clock tree identification
  – Verify clock topology for:
    • Proper clock tree definition and propagation
• Structural Checks: metastability problems, CDC glitches, multi-fanout issues
  – Verify CDC synchronization for:
    • Proper implementation of synchronizers to prevent metastability
    • Proper implementation of synchronizers to prevent glitches
    • Proper implementation of synchronizers to prevent multi-fanout
• Functional Checks: Graycode violations, data stability violations
  – Verify CDC data transfer for:
    • Single bit change (gray) encoding checks for vectors
    • Proper data stability across clock domain boundaries

Problems with Asynchronous Crossings
Signals crossing asynchronous domains create metastability.
[Diagram and waveform: flop FA (CLK A) drives DA into flop FB (CLK B), which outputs DB]
• CLKB samples DA while it is changing
• Potential metastability occurs due to setup or hold time violation in flip-flop "FB"
• Clocked signal DB is initially metastable, and might still be metastable at next rising edge of CLKB
• This metastable signal can create functional errors.
How do you handle a metastable signal?

How to Handle Asynchronous Crossings
• Using Synchronizers
  – Metastable signal needs time to settle to a stable value
  – Additional flop gives the metastable signal time to become stable
[Diagram and waveform: two flip-flop synchronizer solution; FA (CLK A) drives DA into FB1 and then FB2 (both CLK B), producing DB1 and DB2]
• CLKB samples DA while it is changing
• DB2 is synchronized and valid
But how do you verify proper synchronization?
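One way to picture a structural answer to that question is a small netlist walk. The sketch below is a toy model and not how Conformal works internally: the `Flop` record, its `through_logic` flag, and the rule "a crossing is acceptable only if the receiving flop is fed directly by one source flop and drives exactly one more flop on the same clock" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Flop:
    name: str
    clock: str
    # Names of flops whose outputs reach this flop's D input.
    drivers: List[str] = field(default_factory=list)
    # True if combinational logic sits between the drivers and D.
    through_logic: bool = False


def unsynchronized_crossings(flops):
    """Return names of receiving flops that are not the first stage of a
    simple two-flop synchronizer (toy structural rule, see lead-in)."""
    by_name = {f.name: f for f in flops}
    fanout = {f.name: [] for f in flops}
    for f in flops:
        for d in f.drivers:
            fanout[d].append(f)

    problems = []
    for f in flops:
        crosses = any(by_name[d].clock != f.clock for d in f.drivers)
        if not crosses:
            continue
        # First stage must sample the source flop directly.
        direct = len(f.drivers) == 1 and not f.through_logic
        # Second stage: exactly one load, same clock, fed only by the first stage.
        loads = fanout[f.name]
        second_stage = (
            len(loads) == 1
            and loads[0].clock == f.clock
            and loads[0].drivers == [f.name]
            and not loads[0].through_logic
        )
        if not (direct and second_stage):
            problems.append(f.name)
    return problems


if __name__ == "__main__":
    design = [
        Flop("FA", "CLKA"),
        Flop("FB1", "CLKB", ["FA"]),
        Flop("FB2", "CLKB", ["FB1"]),
    ]
    print(unsynchronized_crossings(design))
```

Using the flop names from the slide, the FA, FB1, FB2 chain passes, while a crossing whose first receiving flop feeds combinational logic would be reported.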
Typical Synchronization Scheme
[Diagram: N-bit data path from CLK A to CLK B through a MUX synchronizer; control path through a FLOP synchronizer]
Synchronizing the Data Path and the Control Path

Structural: Glitch Issues
[Diagram and waveform: DA1 and DA2 (CLK A) feed logic A&B, which is captured by DB1 and DB2 (CLK B); glitch due to propagation delay Td]
• Glitch on CLKA can cause false pulse on CLKB domain

Structural: Multi-Fanout Issues
[Diagrams and waveforms: (1) DA1 and DA2 (CLK A) captured as DB1 and DB2 (CLK B), offset by tp; (2) DA1 through DINT fanning out to DB1 and DB2 (CLK B), offset by tp]
• Latched at different times. Functional errors?

Functional: Data Stability Issue
• Data from the faster clock domain must be held long enough for the destination clock to latch it.
[Diagram and waveform: Source Data Stability and Destination Data Stability; FA on CLK A (200 MHz) drives DA into FB on CLK B (166 MHz), which outputs DB]
• DA changes twice on two CLKA rising edges
• CLKB samples DA only once
• Data transfer loss at destination
Do you need stability check?

Functional: Gray Code Issue
Improper Gray code implementation can result in serious functional problems.
[Diagram: Wr_gray_ptr [7:0] (CLK_A) passed through synchronizing flops to Sync_wr_gray_ptr [7:0] (CLK_B)]
• This vector (control signal) should have one bit change at a time
How do you ensure that the Gray code logic is implemented correctly?
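The one-bit-change property asked about here is easy to state as a check. The sketch below assumes the pointer is an 8-bit counter converted with the standard binary-reflected Gray code; the function names are illustrative, and a formal tool would prove the property on the logic itself instead of enumerating values.

```python
def binary_to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def is_gray_sequence(values, cyclic=True):
    """True if every consecutive pair of values differs in exactly one bit.

    With cyclic=True the wrap-around from the last value to the first is
    checked too, which matters for a free-running pointer.
    """
    pairs = list(zip(values, values[1:]))
    if cyclic and len(values) > 1:
        pairs.append((values[-1], values[0]))
    return all(bin(a ^ b).count("1") == 1 for a, b in pairs)


if __name__ == "__main__":
    gray_ptr = [binary_to_gray(i) for i in range(256)]  # 8-bit pointer, as in [7:0]
    print(is_gray_sequence(gray_ptr))          # Gray-coded pointer
    print(is_gray_sequence(list(range(256))))  # plain binary pointer
```

A plain binary counter fails the check as soon as it steps from 1 to 2, since two bits change at once; that is the situation in which a synchronizer on the destination side could capture an inconsistent value.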
Conformal (CDC) Flow
[Flow diagram: Design Flow with RTL (Golden Design), Synthesis, Gate Design, STA, Place & Route, GDSII; Conformal CDC Checks with a Fix loop]
• Find synchronization errors earlier in the design flow
• Complements timing closure
  – Synthesis tool takes asynchronous domains as false paths
  – Looking at STA log files for missing synchronization can be quite cumbersome
• Legacy methodology flow does not have metastability closure.

FSM Check Process Flow
[Flowchart: Setup Design (VCD dump, init seq.); Define Clocks and Clock Domain Rules; Specify Clock and Data Associations; Specify Sync Rule(s); Select CDC Paths; Verify / Validate; Pass? If no: Diagnosis and Debug, Fix Design; if yes: CDC Design Validated]

Conformal® Constraint Designer (CCD)

Design Constraints
– Design implementation tools (such as logic synthesis, physical synthesis, design for test, and place-and-route) rely on applied design constraints (SDC).
– Design constraints are typically generated, modified, propagated through the hierarchy, and analyzed manually. Unfortunately:
  • This process can be long, complicated, unpredictable, and unreliable.
  • Incorrect or poorly-designed constraints can lead to issues with silicon performance, as well as design re-spins.
– This challenge drives the need for an automated solution that can generate, propagate, validate, and analyze design constraints.

What is SDC?
• Synopsys Design Constraints
  – Describe Design Intent
    • Synthesis, clocking, timing, power, test, environmental and operating conditions
  – Used for over ten years
    • Today's "standard" for assertions on timing
  – Every tool dealing with timing and timing optimizations reads or generates SDC files
    • As ubiquitous as a Verilog netlist format
    • Can also be generated by different intermediate tools
      – PrimeTime, RC, DC, FE, Magma BlastChip
• Conforms to Tcl syntax

Problems with Design Constraints
[Flow diagram: Constraint Design; RTL Design; RTL Synthesis; Full-Chip Prototyping / Floorplanning; Block-level Implementation; Chip finishing + Signoff; Constraint Refinements. Annotations along the flow:]
  – Clocks specified incorrectly; boundary conditions not set consistently
  – Design source and SDC have mismatches
  – Missing exceptions and invalid exceptions
  – Block designers write SDC independently from top level SDC, resulting in conflicts
  – Pinpointing source of constraint issues is extremely difficult
• Constraint creation, validation, and analysis are manual, error-prone processes.
• Constraints cause iterations, longer design time.
• Constraint issues increase risk of silicon failure or suboptimal performance.
• No automated solution exists in marketplace.

Typical SDC Issues
• Clocks: missing clock definition; clock has no latency defined; clock mismatch between block level and top level
    create_clock -period 7 -waveform {0 3.5} [get_ports {clk_fpci66m}]
    create_clock -period 14 -waveform {3.1 10.1} [get_ports {clk_ref25m}]
    set_clock_latency ….
• I/O delays: unconstrained input or output; inconsistent input and output delay values
    set_output_delay 3 -clock "clk_pci" [get_ports {decalfipmi_fllen_z}]
    set_input_delay 2.5 -clock "clk_ref25m" [get_ports {ipmitxbfr_rdata[0]}]
• Case analysis: set_case_analysis mismatch between block and top level
• Exceptions: false path on a broken net; true path set as false (esp. with wild cards); missing false paths; overlapping exceptions
    set_false_path -from [get_cells b1/*] -to [get_cells b2/*]
    set_false_path -through [get_pins {decalf_top/decalf_ee_active}]
• Environment: undefined input transition; set_driving_cell set on clock ports
    set_load -pin_load 0.2 [get_ports {decalf_eeprm_data[15]}]
    set_driving_cell -lib_cell BUFX8 -library xl_c [get_ports {strback}]
    set_max_fanout 2 [get_ports {decalf_pmtch_strback}]
    set_max_transition 2 [current_design]

Introduction to Conformal Constraint Designer
• Conformal® Constraint Designer (Constraint Designer) provides a solution that can manage constraints for complex system-on-a-chip (SoC) designs, from RTL to layout.
[Diagram: inputs Design Source, SDC, and Critical Paths feed Conformal Constraint Designer (SDC Quality Checks, Hierarchical Constraint Checks, Formal Validation of False Paths, False Path Generation from Critical Paths, Overlap and Conflict Check); outputs are SDC warning & error reports, Analysis Reports, and Exceptions]

Constraint Designer in Design Flow
[Flow diagram: Pre-Syn Blk.SDC (SDC Checks, FP Validation, FP Generation); Synthesis; Post-Syn Blk.SDC (SDC Checks, FP Validation, Critical FP, Hier SDC Checks); Chip Level SDC Creation; STA with Chip.SDC and timing reports (SDC Checks, FP Validation, Critical FP, Hier SDC Checks); Hier. P&R with Budgeted Partition.SDC (SDC Checks, FP Validation, Critical FP); Chip Level Re-FP]

Comprehensive Analysis Environment
[Screenshot labels: Main Window; Design Source Browser; SDC Source Code Browser; SDC rule violation indicator; SDC Rules Manager; Schematics; Exception Manager; individual errors cross-linked]

Why Constraint Designer?
• Constraint Designer reduces the design cycle, improves QoS, and reduces silicon re-spins due to incorrect SDCs.
  – Creates, validates and analyzes constraints at every step of design process (RTL as well as gate level)
    • Only formal technology can help achieve this
    • You don't need a multitude of tools, each with different parsers and user interfaces
  – Validates generated constraints prior to generation
    • No need for validation as a separate step
  – Provides proven formal technology with proven frontend environment
    • Includes parsers, engines, analysis, user-interface; pinpoints root cause quickly
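As a rough illustration of one item from the "Typical SDC Issues" slide above ("Clock has no latency defined"), a text-level lint can be sketched in Python. This is an assumption-heavy toy: it matches single-line `create_clock` and `set_clock_latency` commands with regular expressions, names a clock after its `-name` option or else its port, and expects latency to be applied through `get_clocks`. It does no Tcl evaluation and has no relation to how Constraint Designer actually parses SDC.

```python
import re

# Illustrative patterns for single-line SDC commands (assumed formatting).
NAME_RE = re.compile(r"-name\s+\{?(\w+)\}?")
PORT_RE = re.compile(r"\[get_ports\s+\{?(\w+)\}?\s*\]")
CLOCKS_RE = re.compile(r"\[get_clocks\s+\{?(\w+)\}?\s*\]")


def clocks_without_latency(sdc_text):
    """Return clocks that are created but never given a set_clock_latency."""
    defined = []
    with_latency = set()
    for raw in sdc_text.splitlines():
        line = raw.strip()
        if line.startswith("create_clock"):
            match = NAME_RE.search(line) or PORT_RE.search(line)
            if match:
                defined.append(match.group(1))
        elif line.startswith("set_clock_latency"):
            match = CLOCKS_RE.search(line)
            if match:
                with_latency.add(match.group(1))
    return [clk for clk in defined if clk not in with_latency]


if __name__ == "__main__":
    # Clock definitions follow the slide; the latency value 1.2 is made up.
    sdc = """
    create_clock -period 7 -waveform {0 3.5} [get_ports {clk_fpci66m}]
    create_clock -period 14 -waveform {3.1 10.1} [get_ports {clk_ref25m}]
    set_clock_latency 1.2 [get_clocks {clk_ref25m}]
    """
    print(clocks_without_latency(sdc))
```

A check of this kind only catches what is visible in the text; mismatches between block-level and top-level constraints, or false paths that are actually true, need the design itself, which is the argument the slides make for a formal tool.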