Digital Engineering Laboratory
Course Introduction & FPGA Concepts and Design
ECE 554
Department of Electrical and Computer Engineering
University of Wisconsin - Madison
9/6/2000

Instructors
• Prof. Mike Schulte
  – Office: 4619 Engineering Hall
  – Office hours: 1:00 to 3:00 PM on Wednesdays
• Arisandi Widjaja, TA for M, W Labs
  – Office: 4620 Engineering Hall
  – Office hours are assigned lab hours.
• Eric Jackowski, TA for T, R Labs
  – Office: 4620 Engineering Hall
  – Office hours are assigned lab hours.

Course Objectives
1) deal with problems and solutions associated with many aspects of a large digital design project,
2) work effectively as a member of a moderate-sized team,
3) use contemporary commercial design tools, and
4) use programmable user-defined devices (FPGAs) for rapid prototyping.

Prerequisites and Location
• ECE 351 – Digital Logic Laboratory
• ECE/CS 552 – Introduction to Computer Architecture
• ECE 551 - Digital System Design and Synthesis (strongly recommended)
• Laboratory: 3628 Engineering Hall
• Lecture: 3534 Engineering Hall

Course Overview
Grading
• 15% Lab Exercise (miniproject) – first 3.5 weeks
  – Design a Special Purpose Asynchronous Receiver/Transmitter (team)
• 20% Midterm Bench Exam – on 11/3 and 11/4
  – Designed to test your understanding of Design Specifications, Verilog, Lab Environment, etc.
    (individual)
• 65% Project – demos 12/13, 12/14 & reports 12/20
  – Design, implement, test, and program a general or special purpose digital computer, usually emphasizing some particular features (team)

FPGA Concepts and Design
• CMOS IC design alternatives
• RAM cell-based FPGA uses
• The Xilinx Virtex Series FPGA technology
• The Xilinx Foundation 3.1i design process

CMOS IC Design Alternatives
[Diagram labels: ASIC; full custom; standard cell; field programmable; semicustom; gate array, sea of gates; standard IC; FPGA; CPLD]
• Field Programmable Gate Array (FPGA) – a hardware device with programmable logic, routing, memory, and I/O

RAM Cell-Based FPGA Uses
• Prototyping gate array, standard cell, or full custom integrated circuits (ICs)
• Prototyping complete systems
• Implementing “hardware simulation”
• Replacing ICs
• Providing multifunction reconfigurable system ICs

Xilinx Virtex FPGA Architecture
• Primary Reference:
  – On-Line Xilinx Data Sheet DS003 (v.2.5, April 2, 2001) - http://www.xilinx.com/partinfo/ds003.pdf
• Figure 1: Virtex Architecture Overview
  – IOBs - Input/Output Blocks
  – CLBs - Configurable Logic Blocks
    • Function generators, Flip-Flops, Combinational Logic, and Fast Carry Logic
  – GRM - General Routing Matrix
  – 3-State Buffers
  – BRAMs - Block SelectRAM (configurable memory)
  – DLLs - Delay-Locked Loops for clock control
  – VersaRing - I/O interface routing resources

[Figure 1: Virtex Architecture Overview]

Virtex FPGA Architecture
• Logic configured by values stored in SRAM cells
  – CLBs implement logic in SRAM-stored truth tables
  – CLBs also use SRAM-controlled multiplexers
  – Routing uses “pass” transistors for making/breaking connections between wire segments
  – Block RAMs allow programmable memories with configurable data widths (1, 2, 4, 8, or 16 bits)

Table 1 – Virtex FPGA Family Members
• We are using the XCV800 device
• 0.22 micron, five-layer metal process

IOB - Input/Output Block
See Figure 2: Virtex Input/Output Block
  – Separate signals for input (I), output (O), and output enable (T)
  – Three storage elements function as D flip-flops or latches with clock enable (CE) and set/reset (SR)
  – I/O pins can connect directly to internal logic or through the storage element
  – Programmable input delay
  – 3-state output buffer
  – I/O pad can use pull-up, pull-down, or weak keeper
  – Supports a wide range of voltages

[Figure 2: Virtex Input/Output Block]

CLB - Configurable Logic Block
• See Figure 4: 2-Slice Virtex CLB
• Each slice contains two logic cells (LCs) and consists of
  – Two 4-input look-up tables (LUTs)
  – Two D flip-flops/latches
  – Fast carry and control logic
  – Three-state drivers
  – SRAM control logic

[Figure 4: 2-Slice Virtex CLB]

CLB - Configurable Logic Block
• See Figure 5: Detailed View of Virtex Slice
• Logic Function Implementation
  – 2 Function Generators - each is a 4-input LUT that implements any 4-input function
  – F5 multiplexer - combines two LUTs with a select input - implements any 5-input function, a 4-to-1 mux, or selected functions of up to 9 inputs.
  – F6 multiplexer - combines the outputs of two F5 multiplexers - implements any 6-input function, an 8-to-1 mux, or selected functions of up to 19 inputs.
  – Four direct feedthrough paths - useful to facilitate routing by use of through-the-cell paths

[Figure 5: Detailed View of Virtex Slice]

CLB - Configurable Logic Block
• Storage Elements
  – 2 D flip-flops/latches
  – Optionally included in cell output paths
  – Shared clock enable
  – Shared synchronous/asynchronous Set/Reset signals
    • SR - forces storage element into the specified initialization state (0 or 1)
    • BY - forces storage element into the opposite state

CLB - Configurable Logic Block
• Fast Carry Logic (See Figure 5)
  – Two chains of two bits per CLB
  – AND gate, 0/1 Mux, CY Mux, EXOR
• 3-state Drivers (BUFT) - on-chip drivers with independent control and input pins
• Distributed LUT SelectRAMs
  – One per logic cell; 2 LUTs can be reconfigured as one of:
    • Two 16 x 1-bit synchronous RAMs
    • 16 x 2-bit synchronous RAM
    • 32 x 1-bit synchronous RAM
    • 16 x 1 dual-port synchronous RAM
    • Two 16-bit shift registers

Block SelectRAM
• Fully synchronous dual-ported 4096-bit RAM
  – Stores address, data, and write-control signal on inputs at clock edge
  – Cannot change address, even for a read, without using the clock
  – For dual port use, interesting timing restrictions
  – Independent control signals for each port
• Organized in vertical columns of blocks on left and right of CLB array
• Block height is 4 CLBs => Number of block RAMs per column is (height of CLB array)/4
• See Tables 3 & 4 and Figure 6.
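
Worked example (not part of the original slides): the column-count rule above can be checked against the XCV800 figures quoted elsewhere in this lecture. This Python sketch is illustrative only. It assumes the 56 x 84 CLB matrix is given as rows x columns, so the array height is 56 CLBs, and that there is exactly one block RAM column on each side of the CLB array. The function names are hypothetical, and the depth-per-width values are simply 4096 bits divided by the data width rather than figures copied from Tables 3 & 4.

```python
# Block SelectRAM sizing for the XCV800, using numbers quoted in this lecture.
BLOCK_BITS = 4096            # bits per Block SelectRAM (from the slide)
BLOCK_HEIGHT_CLBS = 4        # each block is 4 CLBs tall (from the slide)
CLB_ROWS, CLB_COLS = 56, 84  # XCV800 CLB matrix; rows x columns order is an assumption
RAM_COLUMNS = 2              # assumed: one column left and one right of the CLB array


def blocks_per_column(clb_rows: int) -> int:
    """Block RAMs in one column: CLB array height divided by block height."""
    return clb_rows // BLOCK_HEIGHT_CLBS


def total_block_ram_bits(clb_rows: int) -> int:
    """Total block RAM bits across both columns."""
    return blocks_per_column(clb_rows) * RAM_COLUMNS * BLOCK_BITS


def depth_for_width(width_bits: int) -> int:
    """Words available in one block when configured with the given data width."""
    return BLOCK_BITS // width_bits


if __name__ == "__main__":
    print("blocks per column:", blocks_per_column(CLB_ROWS))
    print("total block RAM bits:", total_block_ram_bits(CLB_ROWS))
    for width in (1, 2, 4, 8, 16):
        print(f"{depth_for_width(width)} x {width}")
```

Under these assumptions the result is 14 blocks per column and 114,688 bits in total, which agrees with the Block RAM Bits entry in the XCV800 summary.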

[Tables 3 & 4 and Figure 6]

Programmable Routing Matrix
• Local Routing
  – See Figure 7: Virtex Local Routing
  – Interconnections among LUTs, flip-flops, and General Routing Matrix (GRM)
  – Internal CLB feedback paths that can chain LUTs together
  – Direct paths between horizontally-adjacent CLBs
  – Short connections with few “pass” transistors => low delay => high-speed connections
  – Mix of hardware and software is used to try to minimize routing delay

[Figure 7: Virtex Local Routing]

Programmable Routing Matrix
• I/O Routing – VersaRing
  – Supports pin-swapping and pin-locking
  – Facilitates pin-out flexibility when connecting components are designed concurrently
• Dedicated Routing (not programmable)
  – Four partitionable bus lines per CLB row driven by BUFTs (See Figure 8: BUFT Connections)
  – Two dedicated nets per CLB for vertical carry signals to adjacent cells

[Figure 8: BUFT Connections]

Clock Distribution
• Via primary global routing resources
• See Figure 9: Global Clock Distribution Network
• Four global buffers
  – Two at top center
  – Two at bottom center
• Four dedicated clock input pads
• Input to global buffers from pads or from general purpose routing

[Figure 9: Global Clock Distribution Network]

Delay-Locked Loops (DLLs)
• One associated with each clock buffer
• Eliminate skew between clock input pad and internal clock-input pins within the device
• Each can drive two global clock networks
• Clock edges reach internal flip-flops 1 to 4 clock periods after they arrive at the input.
• Provides control of multiple clock domains
• Has minimum clock frequency restrictions!

[Table 1 and Figures 4 & 7]

Configuration
• How is the FPGA configured?
• Implemented by
  – Clearing configuration memory
  – Loading configuration data into 2-D configuration SRAM
  – Activating logic via a startup process
• Configuration Modes
  – Slave-Serial - FPGA receives bit-serial data (e.g., from PROM) synchronized by an external clock
  – Master-Serial - FPGA receives bit-serial data (e.g., from PROM) synchronized by FPGA clock
  – SelectMAP - Byte-wide data is written into the FPGA with a BUSY flag from FPGA controlling the flow of data
  – Boundary-scan - Configuration is done through the Test Access Port
• The XCV800 device requires 4,715,616 configuration bits

Summary of XCV800 Characteristics
• Maximum Gate Count: 888,439
• CLB Matrix: 56 x 84
• Logic Cells: 21,168
• Maximum IOBs: 512
• Flip-Flop Count: 43,872
• Block RAM Bits: 114,688
• Horizontal TBUF Long Lines: 224
• TBUFs per Long Line: 168

Discussion
• What advantages and disadvantages do FPGAs have compared to standard-cell based ASICs?
• In what types of systems are FPGAs commonly used?

THE ECE 554 XILINX DESIGN PROCESS
• Design process overview
• Design references
• Xilinx libraries
• Design tutorial
• What’s next

Design Process Steps
• Definition of system requirements.
  – Example: ISA (instruction set architecture) for CPU.
  – Includes software and hardware interfaces with timing.
  – May also include cost, speed, reliability and maintainability specifications.
• Definition of system architecture.
  – Example: high-level HDL (hardware description language) representation (this is not required in ECE 554, but is done in the real world).
  – Useful for system validation and verification and as a basis for lower level design execution and validation or verification.

Design Process Steps (Continued)
• Refinement of system architecture
  – In manual design, descent in hierarchy, designing increasingly lower-level components
  – In synthesized design, transformation of high-level HDL to “synthesizable” register transfer level (RTL) HDL
• Logic design or synthesis
  – In manual or synthesized design, development of logic design in terms of library components
  – Result is a logic level schematic or netlist representation, or a combination of both.
  – Both manual design and synthesis typically involve optimization of cost, area, or delay.

Design Process Steps (Continued)
• Implementation
  – Conversion of the logic design to physical implementation
  – Involves the processes of:
    • Mapping of logic to physical elements,
    • Placing of resulting physical elements,
    • And routing of interconnections between the elements.
  – In the case of SRAM-based FPGAs, represented by the programming bitstream, which generates the physical implementation in the form of CLBs, IOBs, BRAMs, and the interconnections between them

Design Process Steps (Continued)
• Validation (used at a number of steps in the process)
  – At architecture level - functional simulation of HDL
  – At RTL level - functional simulation of RTL HDL
  – At logic design or synthesis - functional simulation of gate-level circuit - not usually done in ECE 554
  – At implementation - timing simulation of schematic, netlist, or HDL with implementation-based timing information (functional simulation can also be useful here)
  – At programmed FPGA level - in-circuit test of function and timing

Xilinx HDL/Core Design Flow
[Flow diagram boxes, in order of appearance: DESIGN ENTRY; RTL HDL EDITING; CORE GENERATION; RTL HDL-CORE SIMULATION; SYNTHESIS; IMPLEMENTATION; TIMING SIMULATION; FPGA PROGRAMMING & IN-CIRCUIT TEST]

Xilinx HDL/Core Design Flow - HDL Editing
[Flow diagram labels: DESIGN WIZARD and LANGUAGE ASSISTANT (accessed within HDL Editor); HDL Module Frameworks; Language Construct Templates; HDL EDITOR; RTL HDL Files]

Xilinx HDL/Core Design Flow
- Core Generation
[Flow diagram labels: Select core and specify input parameters; CORE GENERATOR; EDIF netlist for core_name; HDL instantiation module for core_name; Other core_name files]

Xilinx HDL/Core Design Flow - HDL Functional Simulation
[Flow diagram labels: Set Up and Map work Library; RTL HDL Files; HDL instantiation module for core_names; Testbench HDL Files; Compile HDL Files; Test Inputs or Force Files; EDIF netlists for core_names; MODELSIM; Functional Simulate; Waveforms or List Files]

Xilinx HDL Design Flow - Synthesis
[Flow diagram labels: All HDL Files; Edit FPGA Express Synthesis Constraints; Synthesis/Implementation Constraints; Select Top Level; EDIF netlists for core_names; Select Target Device; FPGA EXPRESS; Synthesize; Gate/Primitive Netlist Files (EDIF or XNF); Synthesis Report Files]

Xilinx HDL/Core Design Flow - Implementation
[Flow diagram labels: Gate/Primitive Netlist Files (XNF or EDN); Netlist Translation; Map; Model Extraction; XILINX DESIGN MANAGER; Place & Route; Timing Model Gen; HDL or EDIF for Implemented Design; Create Bitstream; BIT File; Standard Delay Format File]

Xilinx HDL/Core Design Flow - Timing Simulation
[Flow diagram labels: HDL or EDIF for Implemented Design; Standard Delay Format File; Set Up and Map work Directory; Testbench HDL Files; Test Inputs, Force Files; Compile HDL Files; Compiled HDL; MODELSIM; HDL Simulate; Waveforms or List Files]

Xilinx HDL Design Flow - Programming and In-circuit Verification
[Flow diagram labels: Bit File; Input Byte; GXSLOAD; GXSPORT; ECE 554 FPGA Board; Other Inputs; Outputs]

Design References - 1
• There are two Xilinx 4.2i releases
  – 4.2i: uses Synopsys FPGA Express synthesis tool (we use this one)
  – ISE 4.2i: uses Xilinx XST synthesis tool
• The manuals are a bit mixed
  – Do not use material related to XST
• Manuals (are provided on website and in tools)
  – http://homepages.cae.wisc.edu/~ece554/website/ToolDoc/ToolDoc.html
  – FPGA Compiler II/FPGA Express Verilog HDL reference manual - essential guide to writing Verilog for FPGA Express - suggest you download and print a copy for your
    use, 2 pages/page
  – Synthesis and simulation design guide - lots of useful information on writing HDL code
  – CORE generator guide - you will use cores lots, so can be

Design References - 2
  – Libraries guide - useful only if you want to instantiate parts
  – Constraints guide - in particular, useful if you want to use timing constraints
  – Foundation series 4.2i installation guide and release notes - good for finding bugs, but always out-of-date - use on-line answers database instead
• The following guides are occasionally useful, but far less frequently:
  – Design manager/flow engine guide
  – Development system reference guide
  – Foundation series 4 user guide
  – FPGA Compiler II/FPGA Express VHDL reference manual
  – Global Glossary
• Databook, app. notes, and answers database on-line at: http://support.xilinx.com/support/support.htm

Simulation References
• Most useful:
  – ModelSim SE user’s manual
• Occasionally referenced:
  – ModelSim SE command reference

The Xilinx Libraries
• Useful only if you have to instantiate (in your HDL) Xilinx primitives or macros (not all can be instantiated) from the Libraries guide.
• Note that the selection guide includes CLB counts and a section at the front on the notation used to describe macros.

Design Practices
• Use synchronous design.
  – CLBs are actually reading functions from SRAM!
  – Avoid clock gating.
  – Avoid ripple counters.
  – Avoid use of direct sets and resets except for initialization.
  – Synchronize asynchronous signals as needed.
  – Study timing issues handout.

What’s Next
• Verilog HDL – introductory lecture will give an overview of Verilog, our HDL of choice
• HDL/core design flow – design tutorial next week will employ the flow described for a Verilog HDL/core example
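
Supplementary sketch (not part of the original slides): the Design Practices remark that CLBs read their functions from SRAM, and the earlier statement that an F5 multiplexer lets two 4-input LUTs implement any 5-input function, can be illustrated with a small software model. This Python code is a behavioral illustration only, not a model of the actual Virtex circuitry. The function names, the input ordering, and the choice of 5-input parity as the test function are all assumptions made for this example.

```python
from itertools import product


def make_lut4(func):
    """Build the 16-entry truth table (the 'SRAM contents') of a 4-input function."""
    return [func(a, b, c, d) & 1 for a, b, c, d in product((0, 1), repeat=4)]


def read_lut4(table, a, b, c, d):
    """Evaluate a LUT: the four inputs form an address into the stored table."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


def make_f5(func5):
    """Model two 4-input LUTs plus a 2-to-1 mux realizing a 5-input function.

    One LUT stores the function with the select input fixed at 0 and the other
    stores it with the select input fixed at 1; the mux picks between them.
    """
    lut_s0 = make_lut4(lambda a, b, c, d: func5(0, a, b, c, d))
    lut_s1 = make_lut4(lambda a, b, c, d: func5(1, a, b, c, d))

    def evaluate(s, a, b, c, d):
        return read_lut4(lut_s1 if s else lut_s0, a, b, c, d)

    return evaluate


def parity5(s, a, b, c, d):
    """Example 5-input function: odd parity (XOR of all inputs)."""
    return s ^ a ^ b ^ c ^ d


if __name__ == "__main__":
    f5 = make_f5(parity5)
    mismatches = [bits for bits in product((0, 1), repeat=5)
                  if f5(*bits) != parity5(*bits)]
    print("mismatches:", len(mismatches))
```

The same split on one input can be repeated for a sixth input, which is the idea behind the statement that the F6 multiplexer combining two F5 outputs implements any 6-input function.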