CSE241 VLSI Digital Circuits, Winter 2003
Lecture 03: ASIC Flow and Design Convergence
CSE241 L3 ASICs.1 Kahng & Cichy, UCSD ©2003

This Class + Logistics
• Overview of flow (preparation for Smith Chapters 12-17)
• Read: Smith Chapter 12 (Synthesis), 13.7 (Static timing)
• Lab #1 revised due date: Monday, January 20
• Near-term schedule:
  ◦ Ben has reserved the lab (EBU I, Room 3329) for this Friday, January 17, noon-1:20pm, for a running start into synthesis
  ◦ Recitation #2 tomorrow (noon-12:50pm): not on RTL design, but on datapaths and memories
  ◦ Lab tomorrow (3:30-5pm): really Lab #1
(Slide courtesy of S. P. Levitan, U. Pittsburgh)

Review
• Scaling of gates vs. scaling of wires
  ◦ What happens when you make a gate bigger?
  ◦ What happens when you make a wire taller? Wider?
• Coupling
• Inductance
  ◦ How does power/ground distribution affect inductance?
• RC delay
• Dynamic (useful) power vs. static (useless) power
• How do these issues impact estimates and design approaches?
(Slide courtesy of S. P. Levitan, U. Pittsburgh)

Outline
• Design types and cost / complexity drivers
• Basic flow
• On convergence and hierarchy

IC Design Methodologies
• Full-Custom (high effort, leading-edge performance, high volume)
• Semi-Custom (strong infrastructure, economical in lower volumes)
  ◦ ASIC (Application-Specific Integrated Circuit)
  ◦ COT (Customer-Owned Tooling)
  ◦ ASIC vs. COT: “Who pays for the scrap?”
  ◦ FPGA
• System-on-a-Chip
  ◦ Larger components, often from outside of the design team
• Special Analog (custom layout, I/Os and sense amps)
• Mixed-Signal / RF (unique to each process, no scaling)
(Slide courtesy of S. P. Levitan, U. Pittsburgh)

Acceleration of Gate Length Scaling
• What are some implications?
(Slide courtesy of Numerical Technologies, Inc.)
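The Review slide contrasts dynamic (useful) and static (useless) power. As a rough sketch, assuming the standard first-order models P_dyn = α·C·Vdd²·f for switching power and P_static = Vdd·I_leak for leakage, the two can be compared numerically. The activity factor, capacitance, supply, frequency and leakage values below are hypothetical, chosen only to illustrate the arithmetic.

```python
def dynamic_power(alpha: float, c_switched: float, vdd: float, freq: float) -> float:
    """First-order switching power in watts: alpha * C * Vdd^2 * f.

    alpha: activity factor (fraction of the capacitance switched per cycle)
    c_switched: total switchable capacitance in farads
    vdd: supply voltage in volts; freq: clock frequency in hertz
    """
    return alpha * c_switched * vdd ** 2 * freq


def static_power(vdd: float, i_leak: float) -> float:
    """First-order leakage power in watts: Vdd * I_leak (i_leak in amperes)."""
    return vdd * i_leak


if __name__ == "__main__":
    # Hypothetical chip: 1 nF switchable capacitance, 10% activity,
    # 1.2 V supply, 500 MHz clock, 10 mA total leakage current.
    p_dyn = dynamic_power(alpha=0.1, c_switched=1e-9, vdd=1.2, freq=500e6)
    p_stat = static_power(vdd=1.2, i_leak=10e-3)
    print(f"dynamic: {p_dyn:.3f} W, static: {p_stat:.3f} W")
    # Halving Vdd cuts dynamic power by 4x (quadratic dependence); with I_leak
    # held fixed, this simple model only halves static power.
    print(f"dynamic at 0.6 V: {dynamic_power(0.1, 1e-9, 0.6, 500e6):.3f} W")
```

The quadratic Vdd term is why supply scaling is the strongest lever on switching power; real leakage current also depends strongly on Vdd and threshold voltage, which this sketch deliberately ignores.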
Mask NRE Cost (1999)
• “$1M mask set” in 100nm, but an average of only 500 wafers per set

Design Technology Crises, ITRS-2001
[Figure: ITRS-2001 design technology crises, with labels for incremental cost per transistor, test, turnaround time, NRE cost, manufacturing, SW design, verification and HW design]
• 2-3X more verification engineers than designers on microprocessor teams
• Software = 80% of system development cost (and analog design hasn’t scaled)
• Design NRE > 10’s of $M vs. manufacturing NRE of $1M
• Design TAT = months or years vs. manufacturing TAT = weeks
• Without DFT, test cost per transistor grows exponentially relative to manufacturing cost

Silicon Complexity Challenges
• Silicon Complexity = impact of process scaling, new materials, new device/interconnect architectures
• Non-ideal scaling (leakage, power management, circuit/device innovation, current delivery)
• Coupled high-frequency devices and interconnects (signal integrity analysis and management)
• Manufacturing variability (library characterization, analog and digital circuit performance, error-tolerant design, layout reusability, static performance verification methodology/tools)
• Scaling of global interconnect performance (communication, synchronization)
• Decreased reliability (SEU, gate insulator tunneling and breakdown, Joule heating and electromigration)
• Complexity of manufacturing handoff (reticle enhancement and mask writing/inspection flow, manufacturing NRE cost)

System Complexity Challenges
• System Complexity = exponentially increasing transistor counts, with increased diversity (mixed-signal SOC, …)
• Reuse (hierarchical design support, heterogeneous SOC integration, reuse of verification/test/IP)
• Verification and test (specification capture, design for verifiability, verification reuse, system-level and software verification, AMS self-test, noise-delay fault tests, test reuse)
• Cost-driven design optimization (manufacturing cost modeling and
  analysis, quality metrics, die-package co-optimization, …)
• Embedded software design (platform-based system design methodologies, software verification/analysis, codesign with HW)
• Reliable implementation platforms (predictable chip implementation onto multiple fabrics, higher-level handoff)
• Design process management (team size / geographic distribution, data management, collaborative design, process improvement)

Outline
• Design types and cost / complexity drivers
• Basic flow
• On convergence and hierarchy

Sylvester-Keutzer: Classic Picture
[Figure] Source: Sylvester-Keutzer, Computer, Nov. 99

Traditional Flow
[Figure: traditional flow from Front End to Back End; steps shown include Behavioral Level Design, Logic Design and Simulation, Logic Synthesis, Simulation, Design Verification, Timing Verification, Test Generation, Logic Partitioning, Die Planning, IO Pad Placement, Floorplanning, Power/Ground Stripes and Rings Routing, Global Placement, Detail Placement, Clock Tree Synthesis and Routing, Global Routing, Detail Routing, Extraction and Delay Calc., Timing Verification, and LVS/DRC/ERC]

Block-Level Design Methodology (A. Khan, Simplex/Altius)
[Figure: Design Specs → Synthesis → Floorplan & PG → Placement → Physical re-synthesis → Clock distribution → Route, scan re-order → Timing analysis, IPO → Final power, SI → ECO → ERC, DRC, LVS → Tape-out, with library + custom wireload model (Lib. + CWLM) as inputs]
• Constraints: architectural optimization (timing); inter-group buses, bandwidth; clock, SI, test; validation
• Floorplanning and custom WLM
• Power distribution (internal, I/O)
• I/O driver, padring design
• Board-level timing, SI
• Row definitions
• Placement of cells
• Congestion analysis
• Placement-based re-synthesis
• Noise minimization, isolation
• Clock distribution
• Full routing
• Scan stitching, re-ordering
• Full RC back-annotation
• Hierarchical timing, electrical and SI analysis and IPO/ECO

Generic Flow Steps
• Preparation
  ◦ Library data preparation
  ◦ Design data preparation
• Logic design
  ◦ Specification to RTL
  ◦ RTL simulation
  ◦ Hierarchical floorplanning
  ◦ Synthesis
  ◦ Formal verification
  ◦ Gate level simulation
  ◦ Static timing analysis
• Physical design
  ◦ Physical floorplanning
  ◦ Place and route
  ◦ RC extraction
  ◦ Formal verification
  ◦ Physical verification
  ◦ Release to manufacturing
• Design for test
• Engineering change order

Library and Design Data
• Models and technology data required to execute the design flow
  ◦ Power, timing: ALF, DCL, OLA, .lib, STAMP
  ◦ Layout: LEF, DEF, GDSII
  ◦ Delays and path timing, parasitics: SDF, GCF, SDC, DSPF, RSPF, SPEF, SPICE
  ◦ Layout rules: Dracula, Calibre “deck”

Specification to RTL
• Defines the logic and fundamental structure of the chip at the register-transfer level (RTL) in either Verilog or VHDL
• Requires considerable interaction with the customer, plus specs such as the architecture, system, design, test and block specs
• May include RTL from the customer or third-party IP providers
• Coding guidelines should be established and adhered to, and the code must be compatible with the chosen synthesis tool
• Special design considerations such as multiple clock frequencies, asynchronous logic, high-speed logic, race conditions, gated clocks, etc.
  must be addressed

RTL Simulation
• RTL code, written in Verilog, VHDL or a combination of both, is simulated to verify functional correctness
• Testbenches apply input stimulus to the design
• Several methods are used to verify the outputs:
  ◦ Self-checking testbenches automatically verify output correctness and report mismatches
  ◦ Results can be stored in a file and compared to previous results
  ◦ Waveform displays can be used to interactively verify the outputs
• Tools:
  ◦ Verification-specific: Verisity Specman, Synopsys Vera
  ◦ Functional simulation: mostly ModelSim; Cadence’s Verilog-XL or NC-Verilog also used

Hierarchical Floorplanning
• Decide on the physical layout strategy: flat or hierarchical?
• Advantages of a flat implementation are generally a smaller die size and a more straightforward approach to clock and power distribution and RC generation
• Advantages of a hierarchical design are better runtimes, better ability to control timing within localized areas of the design, and concurrent design
• For hierarchical design, issues include:
  ◦ physical partitioning of the logic into blocks
  ◦ assignment of the physical locations for the block pins
  ◦ timing budgeting
  ◦ distribution of clocks and power
  ◦ signal bus routing
  ◦ RC generation
• Tool example: Cadence’s design planner

Floorplanning
• Give placement initial clues
  ◦ Cells that are interconnected want to be close together
• Take advantage of RTL hierarchy
  ◦ Generate a physical hierarchy
  ◦ RTL hierarchy = best physical hierarchy?
• Place big blocks on chip (memories)
• Allow space for power/clock/buses
• Reduce complexity of placement

Synthesis
• Conversion of RTL to a gate-level netlist
• Targets a foundry-specific library
• Timing-driven methodology; constraints include:
  ◦ clock information
  ◦ input arrival times, output required times
  ◦ input driving cells, output loading
  ◦ false paths, multi-cycle paths
• Interconnect delay is calculated from a wireload model, which estimates each net’s wire capacitance and resistance from its fanout
• Clock parameters (insertion delay, skew, jitter, etc.) are assumed to be attainable later in place and route

Synthesis (contd.)
• Hierarchical synthesis
  ◦ Block-by-block basis
  ◦ Minimizes runtimes
  ◦ Functional blocks
• Tools: Cadence BuildGates, Synopsys Design Compiler (used for this course)

Formal Verification
• RTL description and gate-level netlist are compared to verify functional equivalence, thereby verifying the synthesis results
• An emerging technology that supplements the more traditional approach of gate-level simulation
• Tools: Verplex Tuxedo-LEC, Design Verifier (Chrysalis), Mentor FormalPro, Synopsys Formality (will be used in class)

Gate Level Simulation
• Another method to verify the synthesis process, which covers both functionality and timing
• Correctness is only as good as the test vectors that are used
• Especially critical for non-synchronous designs and for verifying false-path and multi-cycle-path constraints
• Cell timing is included in the simulation models, and interconnect delay is passed from the synthesis run
• Worst-case PVT conditions are used to analyze for setup violations, and best-case PVT conditions are used to analyze for hold violations (PVT = Process, Voltage, Temperature)
• Popular tools: Cadence’s Verilog-XL or NC-Verilog

Static Timing Analysis
• Verifies that the design operates at the desired frequency
• Implicitly assumes correct timing constraints (!), e.g., boundary conditions
• Timing constraints are similar to those used in synthesis
• As with gate-level simulation, both best- and worst-case analyses are performed
• Typically performed on a full-chip (not block) basis
• Verifies setup and hold times at flip-flop inputs; can also check timing from primary inputs (PIs) and to primary outputs (POs), and point-to-point delay values (with blocking of pins, etc.)
• May require modified constraints for inter-block issues: multiple clock domains, multi-cycle paths, etc.
• For compatibility with a timing-driven layout flow, it helps to have a simple, single set of constraints
• Other issues: incremental analysis, …

Physical Floorplanning
• Defines the basic chip layout architecture
  ◦ Define the standard cell rows and I/O placement locations
  ◦ Place RAMs and other macro cells
  ◦ Define power bus structures such as power rings and stripes
• Often performed using the standard place and route tool
• Rules of thumb for cell density are used to initially calculate design size
• Popular standalone tools: Cadence’s design planner and Avant!’s Planet

Place and Route
• Automatically place the standard cells
• Generate clock trees
• Add any remaining power bus connections
• Route clock lines
• Route signal interconnects
• Design rule checks on the routes and cell placements
• Timing-driven tools require timing constraints and analysis algorithms similar to those used during the static timing analysis step
• Tools: Cadence Silicon Ensemble, Synopsys Apollo, Magma Blast Fusion

RC Extraction
• Calculates the resistance and capacitance of interconnects, based on
  ◦ placement of cells
  ◦ routing segments
• Calculates capacitive effects of adjacent segments (extracts capacitance between metal segments)
• RC data is transferred (back-annotated) to
  ◦ static timing analysis
  ◦ gate-level simulation
• Replaces the wireload model used in synthesis
• Tools: Cadence HyperExtract, Magma’s Blast Fusion, Sequence Columbus, Synopsys Star-RC, Mentor X-Calibre

Signal Integrity (SI)
• Crosstalk issues
• Inductance
• Interference
• Need new tools:
  ◦ calculate and estimate SI
  ◦ new delay models with SI estimates
  ◦ SI-aware routing

Formal Verification
• Compares golden netlist to current netlist (logic equivalence)
• Comparison of pre- and post-layout netlists
• Similar to the formal verification step after synthesis, but now clock tree insertion, drive strength changes, etc. have been made
• Buffer insertion or logic optimization may have been performed

Physical Verification
• DRC (Design Rule Check): verifies the design rules, i.e., polygon/layer spacing rules
• LVS (Layout Versus Schematic): verifies that layout and netlist are equivalent at the transistor level
• Antenna: manufacturing check for long nets, which can accumulate charge during plasma etch and damage gate oxide
• GDSII: final merge of layout, routing and placement data for mask production
• Example tools: Mentor Graphics Calibre (DRC, LVS), Cadence Dracula, Diva

Release to Manufacturing
• Final edits to the layout are made
• DRC and LVS are run to verify the correctness of the modified database
• “Tapeout” documentation is prepared prior to release of the GDSII to the foundry
• Pad location information is prepared, typically in a spreadsheet
• Metal fill and metal stress relief rules are checked
• Manufacturing information such as scribe lanes, seal rings, mask shop data, part numbers, logos and pin 1 identification information for assembly is also added
• Cadence’s Virtuoso is used for custom manual edits of the mask layers
• Manufacturing steps: generation of masks, silicon processing, wafer testing, assembly and packaging, manufacturing test

Outline
• Design types and cost / complexity drivers
• Basic flow
• On convergence
  and hierarchy

Evolution of Design Flow
[Figure: design flows for “Yesterday” (1000nm), “Today” (130nm) and “Tomorrow” (50nm); file-based flows of separate system design, synthesis, place/wire, timing analysis and verification steps evolve toward a shared data model repository with Optimize engines (Hw/Sw, SW, logic, circuit, place, wire) and Analyze engines (performance, timing, power, noise, test, manufacturing), driven through an “Auto-Pilot” and a “Cockpit”]
• Multiple design files are converged into one efficient data model
• Disk accesses are eliminated in critical methodology loops
• Verification of function, performance, testability and other design criteria all move to earlier, higher levels of abstraction, followed by equivalence checking and assertion-driven design optimizations
• Industry-standard interfaces for data access and control
• Incremental modular tools for optimization and analysis

Aristo, DAC-2000
[Figure: Aristo “Predictable Hierarchical Design Convergence” flow. Inputs: ARISTO Library, Design Constraints, IP Blocks, Design Netlist (gate-level Verilog, RTL Verilog, hard blocks). Stages shown: Early Planning, Design Refinement, Chip Assembly. Steps shown include Concurrent Block Synthesis; Concurrent Block Partitioning, Clustering & Placement; Block Shaping, Compaction & Concurrent Port Placement; Gate-Level Optimization; Gate-Level Place & Route; Top-Level Routing; RC Extraction; Timing Analysis]

Monterey, DAC-2000
[Figure: flow from RTL through behavioral/RTL synthesis (with a statistical WLM and timing library), physical prototyping, design signoff, logic and route steps to GDSII, with increasing modeling detail]

Design Closure
• Input: RT-level HDL + technology + constraints
• Output:
  ◦ “go”: recipe for invocation and composition of SP&R (synthesis, placement and routing) results
  ◦ “no go”: diagnosis of RTL code problems
• Logical and physical hierarchies co-evolve
  ◦ spatial: top-down coarse placement → physical hierarchy
  ◦ logic/timing: implementable RTL → logical hierarchy
  ◦ limits of human fanout: organizations always have hierarchy
  ◦ as chip complexities increase, we have seen a natural sequence of no floorplanning, physical floorplanning, RTL floorplanning, ...
• Details (must construct, predict, ignore, eliminate, ...): pin optimizations, interconnect planning, hierarchy reconciliations, budgeting mechanisms, compatibility with downstream SP&R, ...

Logical and Physical Hierarchies
• Two hierarchies: logical/functional and physical (schematic hierarchy is also typical in structured-custom design)
• RTL design = logical/functional hierarchy
  ◦ provides valuable clues for physical embedding: datapath structure, timing structure, etc.
  ◦ can be incredibly misleading (e.g., all clock buffers in a single hierarchy block)
• Main issues:
  ◦ how to leverage logical/functional hierarchy during embedding
  ◦ when to deviate from the designer’s hierarchy
  ◦ methodology for hierarchy reconciliation (buffers, repartitioning / reclustering, etc.)

Functional Partitioning
• Subblocks in A connected with subblocks in B result in 600 top-level nets.
Source: ReShape

Physical Partitioning
• Physical partitioning reduced the number of top-level nets from 600 to 0
Source: ReShape

Unconstrained Placement
[Figure]

Floorplanned Placement
[Figure]

“Thermal” Map of Routing Congestion
[Figure]

“Natural” Block Shapes
• Are not disjoint rectangles, e.g., intersecting timing paths all want to be embedded as “straight paths”
[Figure: blocks Blk A and Blk B with path weights 1.0, 0.5/0.5 and 1.0]
• Traditional chip floorplan = dissection into rectangles
  ◦ may not be optimum for wirelength and timing
  ◦ but has compensating advantages (convenience)

Physical Hierarchy
• Physical hierarchy = hierarchical, very structured organization of the core layout region
• Potentially, little relation to a high-quality (e.g., w.r.t. timing, routability) embedding of the logic
• Some obvious exceptions:
  ◦ regular structures (memories, PLAs, datapaths)
  ◦ hard IP blocks
• And, physical hierarchy helps to define and plan global interconnects
• Recent trend: try to avoid the artifactual physical hierarchy created by top-down, recursive-bipartitioning-based placement

Convergence and Predictability
• We seek a predictable, estimatable back end (physical implementation after some handoff level of design)
• Predictability == regression models? (e.g., wireload models)
• Predictability == an enforceable assumption? (“correct by construction”)
  ◦ constant-delay paradigm (logical effort; DEC, IBM, Magma, ...)
• Predictability == fast constructive prediction? (also “correct by construction”)
  ◦ RT-level (Tera Systems), gate-level flat full-chip (Silicon Perspective Corp. FirstEncounter)
• Predictability == remove the need for predictability?
  ◦ GALS, LIS (globally-asynchronous/locally-synchronous; latency-insensitive synchronization)
  ◦ “protocol- / communication-based system-level design”
• Or, just make the loops tighter and easier (“construct by correction”)

Planning Technology
• RTL partitioning
  ◦ understand interaction between block definition and placement quality
  ◦ recognize and cure a physically challenged logic hierarchy
• Global interconnect planning and optimization
  ◦ symbolic route representations to support block plan ECOs
• Controllable SP&R back end (including power/clock/scan)
• Estimators (“initial wireload models”)
• Incremental / ECO optimizations, and optimizations that are “robust” under partial or imperfect design knowledge
  ◦ to account for resource, topological heterogeneity
  ◦ to account for optimizations (placement, ripup/reroute, timing)
• “Earliest RTL signoff with detailed P&R knowledge”

Extra Slides

Sequence, DAC-2000
[Figure: Sequence flow with RTL, Synthesis, Place & Route, 3DPrepare, True-3D Parasitics extraction, Delay Calculation, Sequence Timing Analysis and Timing Sign-off around a shared database, plus Interconnect-Driven Optimization (driver sizing, topology-based optimization)]

Cadence, DAC-2000
• RTL, chip constraints
• Partitioning & Log/Phys Mapping (constraints complete and block RTLs are feasible)
• Block Area/Performance Estimation
• Block Placement
• Inter-block Routing and Buffering (ensure inter-block delays are accounted for)
• Communication Logic Synthesis
• Concurrent Placement, Synthesis and Route of Cells in Blocks (no iterations from here down)
• Finalize Route/Extract/Back Ann.
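Two of the flow slides rely on wire-delay estimates: Synthesis estimates interconnect delay from a fanout-based wireload model, and RC Extraction later replaces that model with extracted parasitics (the “Extract/Back Ann.” step in the Cadence flow). A minimal sketch of the difference, assuming a hypothetical wireload table (fanout → wire length, plus per-unit-length R and C, loosely in the spirit of a .lib wireload description) and a first-order Elmore delay model. All numbers are invented for illustration; production delay calculators use more accurate models.

```python
from typing import Dict, List, Tuple

# Hypothetical wireload table: fanout -> estimated wire length (um).
FANOUT_LENGTH_UM: Dict[int, float] = {1: 20.0, 2: 45.0, 3: 70.0}
SLOPE_UM = 25.0          # hypothetical extra length per fanout beyond the table
R_PER_UM_KOHM = 0.0005   # hypothetical wire resistance per um (kOhm)
C_PER_UM_FF = 0.2        # hypothetical wire capacitance per um (fF)
# Units: kOhm * fF = 1e3 * 1e-15 s = 1 ps, so all delays come out in ps.


def wlm_length(fanout: int) -> float:
    """Estimated net length from fanout, extrapolating linearly past the table."""
    if fanout in FANOUT_LENGTH_UM:
        return FANOUT_LENGTH_UM[fanout]
    max_fo = max(FANOUT_LENGTH_UM)
    return FANOUT_LENGTH_UM[max_fo] + SLOPE_UM * (fanout - max_fo)


def elmore_chain(r_drv: float, segments: List[Tuple[float, float]], c_sink: float) -> float:
    """Elmore delay (ps) of an RC ladder driven through r_drv (kOhm).

    Each segment is (series R in kOhm, shunt C in fF at its far end);
    c_sink is the total pin capacitance (fF) at the end of the chain.
    Each capacitor contributes its C times the total resistance upstream of it.
    """
    delay, r_up = 0.0, r_drv
    for r_seg, c_seg in segments:
        r_up += r_seg
        delay += r_up * c_seg
    return delay + r_up * c_sink


def wlm_delay(r_drv: float, fanout: int, c_pin: float) -> float:
    """Pre-layout estimate: model the whole net as one lumped RC segment."""
    length = wlm_length(fanout)
    seg = (R_PER_UM_KOHM * length, C_PER_UM_FF * length)
    return elmore_chain(r_drv, [seg], fanout * c_pin)


if __name__ == "__main__":
    # Hypothetical net: 1 kOhm driver, fanout 2, 2 fF per sink pin.
    print(f"WLM estimate:       {wlm_delay(1.0, 2, 2.0):.2f} ps")
    # Hypothetical extracted net: three routed 20 um segments (60 um total).
    extracted = [(0.01, 4.0)] * 3
    print(f"Extracted (Elmore): {elmore_chain(1.0, extracted, 4.0):.2f} ps")
```

Reusing one Elmore routine for both cases keeps the comparison honest: the only difference is whether the RC data comes from a statistical table (45 µm guessed from fanout) or from the routed geometry (60 µm actually used), which is exactly the gap that makes fanout-based estimates unpredictable.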
Magma, DAC-2000
• “Fixed timing”: each stage on the path to the flip-flop is held at a fixed delay (figure: four stages of 0.6ns each ending at an FF)
• Actively managing wire delay:
  ◦ through automatic sizing (sizing-driven placement)
  ◦ through buffer insertion

Interconnect Complexities
• Interconnect effects play a major role in the increasing costs for large hard-block or rectilinear-outline-based design styles
• Probabilistic wireload models fail
• Without new capabilities for soft IP design and assembly, interconnect problems will significantly impact performance and cost for emerging IC technologies
[Figure: normalized occurrence rate vs. wirelength/die_size, showing local wires within blocks and global wires near ~0.5]
(Courtesy Pileggi, MARCO GSRC)

Technology Scaling
• Block sizes cannot grow as rapidly as chip sizes, since block design becomes increasingly difficult: each block is a chip design over multiple configurations
• If the blocks are inflexible, the global wiring problems begin to dominate all aspects of performance quality and system cost
[Figure: normalized occurrence rate vs. wirelength/die_size (~0.5) for a larger chip with finer feature sizes]
(Courtesy Pileggi, MARCO GSRC)

Soft Blocks
• With soft, flexible blocks, the system assembly can more thoroughly exploit the available technology
• Interconnect problem is controlled via: soft boundaries for area re-shaping; re-synthesis and re-mapping for timing; smart wires; and top-down specified block synthesis
• Cf. “Amoeba” placement, coloring analysis of “good” placements with respect to the original logic hierarchy, etc.
[Figure: normalized occurrence rate vs. wirelength/die_size (~0.5), yielding superior timing, power and cost]
(Courtesy Pileggi, MARCO GSRC)

Taxonomy of Planning / Implementation
• Centered on logic design (“logic synthesis drives”)
  ◦ wire-planning methodology with block/cell global placement
  ◦ global routing directives passed forward to chip finishing
  ◦ constant-delay methodology may be used to guide sizing
  ◦ Synopsys, (Magma)
• Centered on physical design (“layout synthesis drives”)
  ◦ placement-driven or placement-knowledgeable logic synthesis
  ◦ Cadence, Avant!
• Buffer between logic and layout synthesis (“thin layer”)
  ◦ placement, timing, sizing optimization tools
  ◦ Sequence
• Centered on SOC, chip-level planning
  ◦ interface synthesis between blocks
  ◦ communications protocol, protocol implementation decisions guide logic and physical implementation
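The taxonomy notes that a constant-delay methodology may be used to guide sizing, and the Convergence and Predictability slide cites the constant-delay paradigm (logical effort). A minimal sketch of logical-effort sizing for a single path with no branching, using the standard formulation (path effort F = G·H, optimal stage effort f = F^(1/N), normalized path delay D = N·f + P), is shown here. Delays are in units of τ, and the gate parameters in the example are textbook-style assumptions, not values from any particular library or from the tools named above. Every stage ends up with the same effort delay f (stage delay f + p_i), which is the sense in which per-stage delay is held fixed while gate sizes absorb the load.

```python
import math
from typing import List, Tuple


def size_path(g: List[float], p: List[float], c_in: float, c_load: float
              ) -> Tuple[float, float, List[float], List[float]]:
    """Logical-effort sizing of an N-stage path without branching.

    g: logical effort of each stage; p: parasitic delay of each stage (tau)
    c_in: input capacitance of the first stage; c_load: capacitance at the output
    Returns (stage effort f, path delay D, input cap of each stage, delay of each stage).
    """
    n = len(g)
    path_effort = math.prod(g) * (c_load / c_in)   # F = G * H (branching B = 1)
    f = path_effort ** (1.0 / n)                   # equal effort in every stage
    delay = n * f + sum(p)                         # D = N*f + P

    # Size from the output back: stage i must satisfy g_i * C_out_i / C_in_i = f.
    caps = [0.0] * n
    c_out = c_load
    for i in reversed(range(n)):
        caps[i] = g[i] * c_out / f
        c_out = caps[i]

    # Each stage delay is g_i * h_i + p_i, which works out to f + p_i.
    loads = caps[1:] + [c_load]
    stage_delays = [g[i] * loads[i] / caps[i] + p[i] for i in range(n)]
    return f, delay, caps, stage_delays


if __name__ == "__main__":
    # Hypothetical path: three inverters (g = 1, p = 1) driving 64x their input cap.
    f, d, caps, stages = size_path([1, 1, 1], [1, 1, 1], c_in=1.0, c_load=64.0)
    print(f"stage effort {f:.2f}, path delay {d:.2f} tau")
    print("input caps:", [round(c, 2) for c in caps])     # grows 4x per stage here
    print("stage delays:", [round(s, 2) for s in stages])  # equal, since all p_i match
```

Sizing backward from the load is the natural order: the load is fixed, and each gate's input capacitance follows directly from requiring the same stage effort. A flow built on this idea can commit to per-stage delays early and let sizing (or buffering, when f grows too large) meet them after placement.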