Digital VLSI Design
Design of Very Large Scale Integrated Digital Circuits Using CAD Tools
http://uhaweb.hartford.edu/ilumokanw

Syllabus
University of Hartford – College of Engineering
Electrical Engineering Department
ECE565 Digital VLSI Design, Fall 2005
Professor: Dr. Abby Ilumoka, Room UT 235, Ph: (860) 768-5231
Email: ilumokanw@hartford.edu
Website: http://uhaweb.hartford.edu/ilumokanw
Class Time: Tue, Thu 4.15 – 5.30pm
Office Hrs: Wed 2 – 3.30pm; Tue, Thu 1.30 – 2.30pm (other consultation by appointment)
Credit Hours: 3
Lecture Hours: 1.75hr/wk
Laboratory Hours: 0.75hr/wk
Prerequisites/Co-requisites: Digital System Logic (EE231), Digital Laboratory (EE232), Electronics Circuits (EE362), Electronics Lab II (EE364), Senior or graduate standing
Textbook: Digital Integrated Circuit Design by Martin, Oxford Publishing
References: CMOS Digital and Analog Circuit Design by John Uyemura, Oxford Publishing
Software: Tanner VLSI Design Suite: LEDIT Pro Full Custom Layout Editor, TSPICE Pro Circuit Simulator, UPLib, CMOS Lib, SEDIT Schematic Editor, LVS Netlist Comparator

Syllabus (contd)
Bulletin Description
Techniques for CMOS digital integrated circuit design at circuit, subsystem and system levels. CAD tools for design from schematic capture to physical layout. Design methodologies – programmable logic, standard cell, full custom; CMOS fabrication technology; design issues – speed, power, reliability, testability; CMOS design case studies. Laboratory project.
Course Outcomes
When the students have completed this course, they will be able to design state-of-the-art digital integrated circuits. They will have acquired in-depth knowledge of VLSI design constraints as well as degrees of design freedom available to them, thus enabling standard cell and full custom design of digital integrated circuits using both mask and netlist level tools.
Assessment
3 x 75min Exams. Each exam counts 25% toward final grade.
Cell Library Design counts 25%.
Other Course Information
Exam Dates: Exam 1 Sept 27, Exam 2 Oct 27, Exam 3 Nov 17, 4.15 – 5.30pm (Final); Mini-Projects due Fri Dec 16

TOPICS
• Introduction and MOSFET Electrical Properties
• Design Methodology (Fabrication)
• Digital System Building Blocks
• Design of Microprocessor Datapath
• VLSI Circuit Concepts (R, C Delays and Crosstalk)
• Partitioning, Floorplanning and Placement
• Grid, Global and Channel Routing
• VLSI Circuit Optimization and Testing
• Supplementary Topics

Historical: 2003 Technology
Intel Itanium Line: 64-bit dual-processor chips
• Itanium Deerfield: low-power 1GHz Itanium 2 processor
• Consumed about half as much power (62 watts) as its predecessor
• For lower-cost systems where power conservation is important ($744)
• Itanium Madison: 1.4GHz processor with 1.5MB of level 3 cache, cost $1,172
• For systems running at least two processors
• Supercomputing-like performance for the scientific and technical markets

Historical: 2004 Technology
• World’s highest performance 2004 desktop processor: Intel Pentium 4
• Operated at 2.8-3.4GHz
• Built with 0.13um technology, 533MHz system bus
• Hyper-pipelined technology: longer pipeline boosts speed
• Intel released a retooled version of the Pentium 4, code-named Prescott, with a 31-stage pipeline that functions like an internal assembly line (older Pentium 4s had only a 20-stage pipeline; Pentium III had a ten-stage pipeline)
• Intel developed the Pentium M, an energy-efficient chip for notebooks that shared characteristics of both Pentium III & 4
• Pentium 4s feature enhanced floating point and multimedia performance for the Digital Lifestyle: reduced time required to encode digital media, e.g. music, pictures, movies
Processor Cost = $508 in 2004, slashed by average $200 in 2005
2004 Intel Itanium Low Power

Power Headaches
• Problem of heat dissipation in modern semiconductors causing manufacturers like Intel to kill faster clock speeds
• Over past decades engineers have scaled microprocessors to smaller dimensions in accordance with Moore’s Law, so that today some elements are only a few layers of atoms thick. Thinness of structures contributes to power headaches: current leakage, power consumption and high operating temperatures.
• High power consumption generates unwanted heat and decreases battery life of portable devices like notebooks and handhelds. The well-known leakage problem gets worse with successive process generations
• Big dilemma for entire semiconductor industry
• Latest Intel® Pentium® 4 processors with over 125 million transistors built on 90nm process technology consume as much as 100 watts (glowing 100W light bulb – ouch!)
• Today’s PCs: large cooling elements, noisy fans, and massive heat sinks
• Solution??

Eureka! Enter Multicore Technology
• Dual-core and multicore chips change the game
• By placing more than one computational engine or core on each die, Intel can continue to add more and more transistors to its processors and diminish troublesome effects of processor scaling
• Intel plans to run dual-core chips at lower frequencies than single core chips so they’ll require lower voltage and throw off less heat
• Two cores on a single chip will enable a processor to do more without a proportional increase in power
• Dual-core chips not the same as dual-processor systems.
Many servers today have two or more processors on the same motherboard. These dual-processor or multi-processor systems are widely used in enterprise computing environments
• By contrast, dual-core components have two complete processor cores inside each package: a big manufacturing change from today's single-core chips
• Promises temporary relief from power and thermal challenges threatening processor performance

Era of Parallelism: 2005
Double Vision?? Smithfield
• Pentium who? Pentium Extreme Edition 840: Intel dual-core chip of thoroughly Pentium 4 heritage
• Code-named "Smithfield": a pair of Pentium 4 "Prescott" cores situated together on a single piece of silicon. Each core has 1MB of L2 cache onboard, and the two cores share an 800MHz front-side bus. Siamese twin action
• Smithfield manufactured using the same basic 90nm fabrication process as current Pentium 4 chips. However, roughly twice the size of the Prescott core at 230 million transistors and 206 mm² of die space
• IBM produced the first multicore chip, the Power4, in 2001 (Intel aims to be first in volume production of the new chips across all market segments: server, desktop, and mobile)

2005/06 Technology
• Parallelism revolution continues
• Intel Developer Forum (IDF), CA, Aug 2005
• Intel CEO introduces new 65nm dual-core microprocessor designed to bring increased performance per watt; production begins end of 2005, in market by 2nd half of 2006
• 2006 shipments (60 million) based on 65nm to surpass current 90nm
• Processors allow chipmakers to get more performance out of a single piece of silicon without boosting power consumption and heat generation
• Enables computer programs to work on more than 1 task at the same time
• For example, multi-core technology helps Google process data in parallel, while controlling power and electricity costs
• New processor applied to laptops: code-named Merom
• Applied to desktop computers: code-named Conroe
• Applied to server platforms: code-named Woodcrest

Software Adjustments: Hyper-Threading
• Many software vendors have already programmed their code to utilize the multithreaded capabilities of Hyper-Threading technology
• Hyper-Threading Technology enables software applications to execute threads in parallel. To improve performance, threading is enabled in software by splitting instructions into multiple streams so that multiple processors can act upon them
• Delivers faster response times for multi-tasking
• Multicore processors benefit from the same programming optimizations as Hyper-Threading
• Dual-core will provide an immediate performance improvement to Hyper-Threading applications
• Operating systems such as Windows XP and Linux have been optimized for multicore processors and are ready to support Intel's next generation processors as soon as they are launched
• Multicore has also raised the question of software licensing and customer billing ($$). Some vendors have considered charging license fees on a per-processor basis, charging more for dual or multicore systems. Microsoft has announced that its software will be licensed on a per-processor-package basis: only one license necessary regardless of how many cores are contained within the processor

Intel Family Overview
• >100 million devices/chip (gigascale integration)
• 80286 - 100,000 devices
• 80386 - 275,000 devices
• 80486 - 1,000,000 devices
• Pentium III – 3,000,000 devices
• Pentium 4 – over 5,000,000 devices/chip
• (VLSI, ULSI, Gigascale)
• MultiCore – Smithfield, Merom, Conroe, Woodcrest

How is a design of this complexity realized?
• Must automate design: powerful CAD tools
• CAD tools research and development
• Decompose design process into different levels of abstraction

Levels of Abstraction in VLSI Design
Idea for New VLSI Chip
1. Architectural Design. CAD/subproblem level: Behavioral/Architectural Level. Generic CAD tools: Behavioral Level & Simulation Tools
2. Logical Design. CAD/subproblem level: Register Transfer/Logic Level. Generic CAD tools: Logic Minimization & Simulation Tools
3. Physical Design. CAD/subproblem level: Cell/Mask Level. Generic CAD tools: Layout Editing, Partitioning, Placement & Routing Tools

Levels of Abstraction: Architectural Design

Architectural Design
• Carried out by human experts
• Decisions affect cost & performance
e.g. Architectural design of a microprocessor:
1. What should the instruction set be?
2. Should instruction pipelining be employed?
3. Should the processor have on-chip cache? How big?
4. Should the arithmetic unit be bit-serial or parallel?
• CAD programs aid the system architect
• Once architecture is defined, 2 tasks remain

Two Tasks at Logic Level
Task 1: DATA PATH DESIGN
• What is the datapath? Functional blocks, storage elements, hardware components which allow transfer of data
• E.g. adders, multipliers, shift registers, RAMs
• Data transferred using tristate busses or mux/demux
Task 2: CONTROL PATH DESIGN
• What is the control path? Modules which generate the control signals necessary to operate the circuit
• E.g. initializing storage elements, initiating data transfer
• Hardwired or microprogrammed

Design of 8-bit Adder
A ← A+B
• Sum in 8-bit A Reg
• 8-bit B Reg unchanged
• Economical design
• Some possibilities:
1. 8-bit CLA adder
2. 8-bit ripple carry adder
3. Two 4-bit CLA adders with ripple carry between
4. 1-bit adder, perform addition serially (8 clock cycles)

Consider Option 4: Serial Adder Data & Control Paths
• Serial approach gives minimum cost, uses 2 shift registers
• Ak, Bk are the kth significant bits of reg A & B
• Full adder adds Ak, Bk and carry Ck-1 during the kth clock
• Carry generated in the kth cycle is saved in a D flip-flop (initially set to 0)
• Data path: two 8-bit SR, 1 FA, 1 DFF, 2 mux, 3-bit counter
• Multiplexer A selects between DataIn and Sum output

Control Path Design
• Control signals needed
• SA - Shift A right by 1 bit
• SB - Shift B right by 1 bit
• MA - Control Mux A
• MB - Control Mux B
• RD - Reset D flip-flop
• RC - Reset Counter
• STRT - Start Addition

Control Algorithm
forever do
    while (STRT = 0) skip
    Reset DFF & Counter
    Set MA & MB to 0
    repeat
        Shift A & B Right by one
        counter = counter+1
    until counter = 8

Tradeoffs at Architectural Level
• Serial adder cheap but slow and difficult to test
• Trade-off between cost, performance, testability, power etc.
• 8-bit parallel CLA adder fastest & most costly
• View alternative options as points in design space
• Specs may impose more constraints
• Automated generation of data and control signals: high level synthesis may be necessary

Levels of Abstraction: Logical Design

Design at Logic Level
• Data & control paths contain logic blocks such as shift regs, muxes, buffers, ALU
• Q: How is the circuit to be implemented? As PCB, VLSI or MCM?
• If PCB, are components available off the shelf?
• If VLSI, what strategy? Full custom, standard cell or gate array?
• In either case, components placed on layout surface and wired together

Levels of Abstraction: Physical Design

Physical Design
• Refers to all synthesis steps which succeed logic design but precede fabrication, e.g. partitioning, placement, routing
• Physical layout crucial in determining circuit performance, area, catastrophic yield, reliability
• 1. Circuit performance: timing delays, crosstalk. Metal and poly interconnect have finite impedance. Long lines have large impedance, longer delays, crosstalk. Contacts and vias slow signals down
• 2. Area: functional and wiring area affects yield (# of defect-free chips); large chip area = low catastrophic yield

Physical Design: Layout Effects
• low yield = high production cost = high circuit unit cost
• large area = modules widely spaced = long wires = delays and crosstalk
• layout affects reliability: e.g. vias are unreliable, so a layout with large numbers of vias is prone to defects; line widths of metal tracks must be wide enough to prevent metal migration
• course focuses on Physical, Custom Design

Physical Design Strategies
3 main approaches differing in 2 ways:
1. Layout surface
2. Structural constraints imposed on layout elements
Full Custom: layout editing to generate physical description of circuit
Field Prog Gate Array: realize circuit by placing metal connections between transistors prefabricated on wafer in 3D array
Standard Cell Design: realization using predefined logic blocks or cells stored in library

Full Custom Layout
• Full control to the artwork designer in placing and interconnecting circuit blocks
• expert can achieve high degree of optimization in area and circuit performance
• difficult and expensive: many person-months to lay out a ULSI chip, so only used in mass production cases
• requires powerful CAD tools: layout editor with DRC, compaction, extraction
• not for low production volume ASICs
• standard layout architectures to cut design time

Layout Styles: Gate Array
• Mask programmable gate array or field programmable
• 2/3D array of unconnected transistors
• Connections placed by either masking (MPGA) or applied voltage (FPGA)
• 2 types of personalization: intra-cell or inter-cell
• cell library maintained, intercell wiring by layout software
• after personalization, wafer diced, chips packaged
• foundries stock large numbers of pre-fab wafers
• quick to fabricate
• few processing steps, high catastrophic yield, cheap

Layout Style: Standard Cell Layout
• Standard cell: logic block performing specific function, e.g. nand, xor, nor, D flip-flop
• cell library: data on std cells (function, pin structure, layout in given technology); cells have same height
• develop floorplan for layout
• select library cells, place in Si, interconnect
• placement & routing simplified by dividing layout into rows separated by horizontal routing channels
• very flexible compared with gate array: wiring space not pre-assigned, cell size can vary
• Fab more complex than gate array

Example of Std Cell
• Inverter function
• rectangular shape
• dimensions 0.6um x 4.8um, CMOS 0.18um technology
• lower left corner at (-1, -1)
• top right corner at (0.6, 4.8)
• input a available at left
• output available at right
• VDD & GND lines available

Macrocells, PLA & FPGA
• Macrocells: no restrictions on cell size, to allow more compact layout; increased cell complexity (regs, ALUs, memory); efficient layout design of complex macrocells
• PLAs: Sum-of-Products minimal expression can be realized using 2-level logic, with AND terms formed in 1st level and OR terms in 2nd level, e.g. Z = A0.A1 + A0.A2 + A1.A2; easy to automate
• FPGAs (e.g. Xilinx, Altera): 2D array of configurable logic blocks, can implement any logic function. Channels between blocks for interconnect. I/O blocks on periphery; interconnect and logic blocks field programmable by user. Cheap prototyping, re-usable, slower. 100% use of gates not possible

Complexity of Physical Design Problem
• Problem can be viewed as a complex optimization problem with multiple objectives and conflicting constraints
• Good layout: min area, short wires, few vias, meets all specs/constraints, e.g. target technology, routing space
• difficult to fully automate
• How can we simplify the task?
• Adopt stepwise approach: subdivide problem into manageable subproblems, each one a constrained optimization problem

Problem Subdivision & Solution
• Subproblems
1. Circuit Partitioning
2. Floorplanning and Channel Definition
3. Circuit Placement
4. Routing (global)
5. Channel Routing
• Find feasible solution to each constrained optimization problem
• Optimize objective
• Stay within constraints
• Subproblems NP-hard
• Heuristic techniques
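
To make "constrained optimization solved by a heuristic" concrete, here is a minimal Python sketch for the first subproblem, circuit partitioning. It is an illustration under stated assumptions, not the method taught in the course: the netlist is assumed to contain only two-pin nets (plain edges), the objective is the number of nets cut, the constraint is that the two halves stay the same size, and the names `cut_size` and `swap_improve` are hypothetical. The heuristic is a simple pairwise-swap local search, not the full Kernighan-Lin algorithm.

```python
from itertools import product


def cut_size(edges, side):
    """Objective: count two-pin nets whose endpoints lie in different halves."""
    return sum(1 for u, v in edges if side[u] != side[v])


def swap_improve(nodes, edges):
    """Greedy pairwise-swap heuristic for balanced two-way partitioning.

    Starts from an arbitrary balanced split, then repeatedly swaps one
    module from each half whenever the swap lowers the cut size. Swapping
    (rather than moving) keeps the balance constraint satisfied. The result
    is a local optimum, not necessarily the global one.
    """
    half = len(nodes) // 2
    side = {n: (0 if i < half else 1) for i, n in enumerate(nodes)}
    best = cut_size(edges, side)
    improved = True
    while improved:
        improved = False
        left = [n for n in nodes if side[n] == 0]
        right = [n for n in nodes if side[n] == 1]
        for u, v in product(left, right):
            side[u], side[v] = 1, 0          # tentative swap
            trial = cut_size(edges, side)
            if trial < best:                 # keep the swap, rescan
                best = trial
                improved = True
                break
            side[u], side[v] = 0, 1          # undo the swap
    return side, best


if __name__ == "__main__":
    # Hypothetical 4-module netlist: a-c and b-d are connected.
    modules = ["a", "b", "c", "d"]
    nets = [("a", "c"), ("b", "d")]
    partition, cut = swap_improve(modules, nets)
    print(partition, "cut size:", cut)
```

On the tiny netlist in the usage example, the initial split {a, b} / {c, d} cuts both nets, and a single swap brings connected modules together. Each pass costs a full cut-size evaluation per candidate swap, which is acceptable for a sketch but is the kind of cost that practical partitioners avoid with incremental gain bookkeeping.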