Digital Engineering Laboratory Course Introduction

Digital Engineering Laboratory
Course Introduction & FPGA
Concepts and Design
ECE 554
Department of Electrical and
Computer Engineering
University of Wisconsin - Madison
3/15/2016
Instructors and Course Website
• Nam Sung Kim, nskim3@wisc.edu
– Office: 4615 Engineering Hall
– Office hours: Tue, Wed, Thur - 2:00 to 3:00 PM
Additional hours by appointment
• Chunhua Yao, yao1@wisc.edu
– Teaching Assistant for Labs
– Office hours are assigned lab hours – 3:30 to 6:30
Tuesday and Thursday
• The course website and wiki are at:
http://homepages.cae.wisc.edu/~ece554/new_website/
https://cgi.cae.wisc.edu/~ece554/pmwiki/pmwiki.php
Course Objectives
• Deal with problems and solutions associated
with many aspects of a large digital design
project
• Work effectively as a member of a moderate-sized team
• Use contemporary commercial design tools
• Use programmable user-defined devices
(FPGAs) for rapid prototyping
• Learn to live on pizza and get by on very little
sleep, at least during the last part of the
course.
Prerequisites and Location
• ECE 351 – Digital Logic Laboratory
• ECE/CS 552 – Introduction to Computer
Architecture
• ECE 551 - Digital System Design and
Synthesis (strongly recommended)
• Laboratory: 3628 Engineering Hall
• Lecture: 3444 EH
• Lectures and Reviews during Lab Hours:
3444 EH
Access to the lab
• Laboratory: 3628 Engineering Hall
– Lab access is password protected, and
you will have access to the lab 24/7
• Password
Course Overview Grading
• 15% Miniproject – due 2/5
– Design a Special Purpose Asynchronous
Receiver/Transmitter (team of 2)
• 20% Bench Exam – on 2/26
– Designed to test your understanding of Design
Specifications, Verilog, Debugging, Lab Environment,
etc. (individual)
• 65% Project – demos 5/5, report 5/14
– Design, implement, test, and program a general or
special purpose digital computer that emphasizes
some particular features (team of 4 to 6)
Miniproject
• For the miniproject, you will
– Design a Special Purpose Asynchronous
Receiver/Transmitter (SPART) and its testbench in
Verilog/VHDL and use the EDK toolset
– Simulate the design to ensure correct performance
– Download the design and associated files and
demonstrate correct functionality
– Prepare a report on your design
– https://cgi.cae.wisc.edu/~ece554/pmwiki/pmwiki.php?n=Main.MiniProject
Midterm Bench Exam
• You will be given a set of specifications for a small
system along with Verilog code for some pre-designed
modules for the system.
• You will be expected to:
– Understand the specifications
– Understand the Verilog code provided
– Write one or more Verilog modules
– Debug one or more Verilog modules
– Simulate one or more modules and the entire system
– Synthesize and implement the design
– Download, test, and demonstrate the design on the
FPGA board
Project
• Design, simulate, synthesize, test, download and
demonstrate a non-trivial computer with an original
instruction set architecture (ISA)
• Four key requirements
– It must be an original ISA (somewhat negotiable)
– It must be non-trivial
– It must be tractable - everything takes at least twice as
long as you expect
– It must interface through the serial port with the
terminal emulator on the lab workstations (negotiable)
• Often has a significant software component and uses
FPGA board interfaces
Project Milestone
• Several major milestones
– Project team selection – each team of 5 or 6 (2/3)
– Project proposal presentation (2/12)
– Architecture review presentation (2/19)
– ISA report due (2/24)
– Microarchitecture review presentation (3/24)
– Testing and demo review presentation (4/7)
– Several progress reviews (see syllabus)
– Project demonstrations (5/5)
– Project report due (5/14)
• For details see:
https://cgi.cae.wisc.edu/~ece554/pmwiki/pmwiki.php?n=Main.Milestones
Major Lab Enhancement
• We recently made major enhancements to the
ECE 554 lab; please bear with us during version
updates
– All new computers and monitors
– All new FPGA boards and updated digital design
software
– Overall objectives of the lab will stay the same
– Some additional changes may happen this semester
– We will try to make the transition as smooth as
possible – thanks to Mitch
• Go over the syllabus
FPGA Concepts and Design
• CMOS IC design alternatives
• RAM cell-based FPGA uses
• The Xilinx Virtex Series FPGA technology
• The Xilinx Integrated Software Environment (ISE)
design process
CMOS IC Design Alternatives
[Figure: taxonomy of CMOS IC design alternatives. Labels: STANDARD IC; ASIC; FULL CUSTOM; SEMICUSTOM (STANDARD CELL; GATE ARRAY, SEA OF GATES); FIELD PROGRAMMABLE (FPGA, CPLD)]
• Field Programmable Gate Array (FPGA) – a hardware
device with programmable logic, routing, memory, and I/O
RAM Cell-Based FPGA Uses
• Prototyping gate array, standard cell,
or full custom integrated circuits (ICs)
• Prototyping complete systems
• Implementing “hardware simulation”
• Replacing ICs
• Providing multifunction reconfigurable
system ICs
• Hardware accelerators
Xilinx Virtex FPGA Architecture
• Primary Reference:
– On-Line Xilinx Data Sheet DS003 (v.2.5, April 2,
2001) - http://www.xilinx.com/partinfo/ds003.pdf
• Figure 1: Virtex Architecture Overview
– IOBs - Input/Output Blocks
– CLBs - Configurable Logic Blocks
• Function generators, Flip-Flops, Combinational Logic, and
Fast Carry Logic
– GRM - General Routing Matrix
– BRAMs - Block SelectRAM (configurable memory)
– DLLs - Delay-Locked Loops for clock control
– VersaRing - I/O interface routing resources
Figure 1: Virtex Architecture Overview
RAM-based FPGA
Xilinx XC4000ex
Virtex FPGA Architecture
• Logic configured by values stored in
SRAM cells
– CLBs implement logic in SRAM-stored truth
tables
– CLBs also use SRAM-controlled multiplexers
– Routing uses “pass” transistors for
making/breaking connections between wire
segments
– Block RAMs allow programmable memories
with configurable widths (1, 2, 4, 8, or 16 bits)
Look-up Table Based Logic Cell
Memory contents (inputs In1 In2, output Out):
In   Out
00   0
01   1
10   1
11   0
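A look-up table logic cell stores a truth table in memory and uses the input bits as the read address. The Python sketch below is a minimal illustration, assuming the table on this slide is a 2-input XOR of In1 and In2. The names `make_lut`, `xor_lut`, and `and_lut` are hypothetical and chosen only for this example.

```python
def make_lut(truth_table):
    """Return a function that looks up its output in a stored truth table.

    truth_table is a list of 2**n bits. The n input bits form the address,
    with the first input as the most significant address bit.
    """
    n = len(truth_table).bit_length() - 1
    if n < 0 or len(truth_table) != 1 << n:
        raise ValueError("truth table length must be a power of two")

    def lut(*inputs):
        if len(inputs) != n:
            raise ValueError(f"expected {n} inputs")
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return truth_table[address]

    return lut


# Assumed contents: rows 00, 01, 10, 11 of the slide's table (an XOR).
xor_lut = make_lut([0, 1, 1, 0])

# "Reprogramming" the cell only means storing different bits.
and_lut = make_lut([0, 0, 0, 1])
```

The logic function is changed by rewriting the stored bits, not by rewiring, which is the sense in which the logic is configured by SRAM contents.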
Programmable Routing
Table 1 – Virtex FPGA Family Members
• We use the XCV800 device
• 0.22 micron, five-layer metal process
IOB - Input/Output Block
• See Figure 2: Virtex Input/Output Block
– Separate signals for input (I), output (O), and output
enable (T)
– Three storage elements function as D flip-flops or
latches with clock enable (CE) and set/reset (SR)
– I/O pins can connect directly to internal logic or
through the storage element
– Programmable input delay
– 3-state output buffer
– I/O pad can use pull-up, pull-down, or weak keeper
– Supports a wide range of voltages
Figure 2: Virtex Input/Output Block
CLB - Configurable Logic Block
• See Figure 4: 2-Slice Virtex CLB
• Each slice contains two logic cells (LCs)
and consists of
– Two 4-input look-up tables (LUTs)
– Two D flip-flops/latches
– Fast carry and control logic
– Three-state drivers
– SRAM control logic
Figure 4: 2-Slice Virtex CLB
CLB - Configurable Logic Block
• See Figure 5: Detailed View of Virtex Slice
• Logic Function Implementation
– 2 Function Generators - Each a 4-input LUT implements any 4-input function
– F5 multiplexer - combines two LUTs with select
input - implements any 5-input function, 4-to-1
mux, or selected functions of up to 9 inputs.
– F6 multiplexer - combines outputs of two F5
multiplexers - implements any 6-input function,
8-to-1 mux, or selected functions of up to 19 inputs.
– Four direct feedthrough paths - useful to facilitate
routing by use of through-the-cell paths
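The F5 multiplexer bullet can be illustrated with a Shannon expansion: each 4-input LUT stores one half of a 5-input truth table, and the fifth input selects between the two LUT outputs. The Python sketch below works under that assumption. The names `build_lut4`, `read_lut4`, `f5_function`, and the example function `majority5` are hypothetical, not Xilinx primitives.

```python
from itertools import product


def build_lut4(func):
    """Store a 4-input Boolean function as a 16-entry truth table."""
    return [func(a, b, c, d) & 1 for a, b, c, d in product((0, 1), repeat=4)]


def read_lut4(table, a, b, c, d):
    """Read a 4-input LUT; the inputs form the 4-bit address."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


def f5_function(func5):
    """Map a 5-input function onto two 4-input LUTs plus a 2-to-1 mux."""
    lut_lo = build_lut4(lambda a, b, c, d: func5(a, b, c, d, 0))  # select = 0
    lut_hi = build_lut4(lambda a, b, c, d: func5(a, b, c, d, 1))  # select = 1

    def evaluate(a, b, c, d, sel):
        lo = read_lut4(lut_lo, a, b, c, d)
        hi = read_lut4(lut_hi, a, b, c, d)
        return hi if sel else lo  # models the F5 multiplexer

    return evaluate


def majority5(a, b, c, d, e):
    """Example 5-input function: 1 when at least three inputs are 1."""
    return int(a + b + c + d + e >= 3)


mapped = f5_function(majority5)

# Exhaustively compare the mapped version with the original function.
matches = all(mapped(*bits) == majority5(*bits)
              for bits in product((0, 1), repeat=5))
```

Applying the same construction once more, with two F5 outputs and a sixth select input, gives the 6-input case described for the F6 multiplexer.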
Figure 5: Detailed View of Virtex Slice
CLB - Configurable Logic Block
• Storage Elements
– 2 D flip-flops/latches
– Optionally included in cell output paths
– Shared clock enable
– Shared synchronous/asynchronous Set/Reset
signals
• SR - forces storage element into initialization
state specified (0 or 1)
• BY - forces storage element into opposite state
CLB - Configurable Logic Block
• Fast Carry Logic (See Figures 4 and 5)
– Two chains of two bits per CLB
– AND gate (for mult), 0/1 Mux, CY Mux, EXOR
• 3-state Drivers (BUFT) - on-chip drivers with
independent control and input pins
• Distributed LUT SelectRAMs – one per logic cell,
2 LUTs can be reconfigured as one of:
– Two 16 x 1-bit synchronous RAM
– 16 x 2-bit synchronous RAM
– 32 x 1-bit synchronous RAM
– 16 x 1-bit dual-port synchronous RAM
– Two 16-bit shift registers
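The fast carry logic listed on this slide can be modeled bit by bit. The Python sketch below assumes the usual arrangement for an adder: the LUT computes the propagate signal a XOR b, the CY mux passes the incoming carry when propagate is 1 and otherwise forwards a, and the EXOR forms the sum bit. The function names are hypothetical, and the model ignores timing entirely.

```python
def carry_bit(a, b, carry_in):
    """One bit of a carry chain: returns (sum_bit, carry_out)."""
    propagate = a ^ b                          # computed in the LUT
    carry_out = carry_in if propagate else a   # CY mux
    sum_bit = propagate ^ carry_in             # EXOR
    return sum_bit, carry_out


def ripple_add(x, y, width):
    """Add two unsigned integers by chaining carry_bit across `width` bits."""
    carry = 0
    total = 0
    for i in range(width):
        s, carry = carry_bit((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry


# 200 + 100 = 300 does not fit in 8 bits, so the final carry is set.
result, carry_out = ripple_add(200, 100, 8)
```

When propagate is 0 the two operand bits are equal, so forwarding a generates a carry if both are 1 and kills it if both are 0.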
Block SelectRAM
• Fully synchronous dual-ported 4096-bit RAM
– Stores address, data and write-control signal on
inputs at clock edge
– Cannot change address, even for read, without
using clock
– Independent control signals for each port
• Organized in vertical columns of blocks on left
and right of CLB array
• Block height is 4 CLBs => Number of block
RAMs per column is (height of CLB array)/4
• See Tables 3 & 4 and Figure 6.
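As a worked example of the rule above, the following Python sketch computes the block RAM count for the XCV800, using the 56 x 84 CLB matrix listed in the XCV800 characteristics in this lecture. It assumes that 56 is the array height in CLBs and that there is exactly one block RAM column on each side of the CLB array. The variable names are hypothetical.

```python
BLOCK_BITS = 4096        # bits per Block SelectRAM (from the slide)
BLOCK_HEIGHT_CLBS = 4    # each block is 4 CLBs high (from the slide)
COLUMNS = 2              # assumed: one column left and one right of the array

clb_rows = 56            # assumed to be the height of the XCV800 CLB array

blocks_per_column = clb_rows // BLOCK_HEIGHT_CLBS
total_blocks = blocks_per_column * COLUMNS
total_bits = total_blocks * BLOCK_BITS

# Depth available at each configurable data width of one block.
depth_by_width = {width: BLOCK_BITS // width for width in (1, 2, 4, 8, 16)}

print(blocks_per_column, total_blocks, total_bits)
print(depth_by_width)
```

Under these assumptions the total is 114,688 bits, which agrees with the Block RAM Bits figure listed for the XCV800.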
Tables 3 & 4 and Figure 6
Programmable Routing Matrix
• Local Routing
– See Figure 7: Virtex Local Routing
– Interconnections among LUTs, flip-flops,
and General Routing Matrix (GRM)
– Internal CLB feedback paths that can chain
LUTs together
– Direct paths between horizontally-adjacent
CLBs
– Short connections with few “pass”
transistors => low delay => high-speed
connections
– Combination of hardware and software is
used to try to minimize routing delay
Figure 7: Virtex Local Routing
Programmable Routing Matrix
• I/O Routing
– VersaRing
– Supports pin-swapping and pin-locking
– Facilitates pin-out flexibility
• Dedicated Routing (not programmable)
– Four partitionable bus lines per CLB row driven by
BUFTs (See Figure 8: BUFT Connections)
– Two dedicated nets per CLB for vertical carry
signals to adjacent cells
Figure 8: BUFT Connections
Clock Distribution
• Via primary global routing resources
• See Figure 9: Global Clock Distribution
Network
• Four global buffers
– Two at top center
– Two at bottom center
• Four dedicated clock input pads
• Input to global buffers from pads or from
general purpose routing
Figure 9: Global Clock Distribution Network
Delay-Locked Loops (DLLs)
• One associated with each clock buffer
• Eliminate skew between clock input pad and
internal clock-input pins within the device
• Each can drive two global clock networks
• Clock edges reach internal flip-flops 1 to 4
clock periods after they arrive at the input.
• Provides control of multiple clock domains
• Has minimum clock frequency restrictions!
Table 1 and Figures 4 & 7
Configuration
• How is the FPGA configured?
• Implemented by
– Clearing configuration memory
– Loading configuration data into 2-D configuration SRAM
– Activating logic via a startup process
• Configuration Modes
– Slave-Serial – FPGA receives bit-serial data (e.g., from
PROM) synchronized by an external clock
– Master-Serial - FPGA receives bit-serial data (e.g., from
PROM) synchronized by FPGA clock
– SelectMAP - Byte-wide data is written into the FPGA with a
BUSY flag from FPGA controlling the flow of data
– Boundary-scan – Configuration is done through the Test
Access Port
• The XCV800 device requires 4,715,616 configuration
bits
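To give a feel for the size of the bitstream, the Python sketch below compares the number of configuration clock cycles needed in a bit-serial mode and in byte-wide SelectMAP mode. It assumes one bit or one byte is transferred per clock with no protocol overhead or BUSY stalls, and the 10 MHz clock is a purely illustrative value, not a figure from the data sheet.

```python
CONFIG_BITS = 4_715_616         # XCV800 configuration bits (from the slide)
ASSUMED_CLOCK_HZ = 10_000_000   # hypothetical configuration clock

serial_cycles = CONFIG_BITS           # one bit per clock
selectmap_cycles = CONFIG_BITS // 8   # one byte per clock

serial_seconds = serial_cycles / ASSUMED_CLOCK_HZ
selectmap_seconds = selectmap_cycles / ASSUMED_CLOCK_HZ

print(f"serial:    {serial_cycles} cycles, {serial_seconds:.3f} s")
print(f"SelectMAP: {selectmap_cycles} cycles, {selectmap_seconds:.3f} s")
```

Real configuration times also depend on the clock source and on startup overhead, so these numbers are only a lower-bound estimate under the stated assumptions.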
XCV800 Characteristics
• Maximum Gate Count: 888,439
• CLB Matrix: 56 x 84
• Logic Cells: 21,168
• Maximum IOBs: 512
• Flip-Flop Count: 43,872
• Block RAM Bits: 114,688
• Horizontal TBUF Long Lines: 224
• TBUFs per Long Line: 168
• Program Data (bits): 4,715,616
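Several of these figures can be cross-checked against the architecture described earlier. The Python sketch below is only a consistency check. It assumes that 56 is the number of CLB rows and 84 the number of CLB columns, and that each CLB contains two BUFTs, which is an assumption not stated on these slides.

```python
clb_rows, clb_cols = 56, 84   # CLB matrix (row/column order assumed)
bus_lines_per_row = 4         # four partitionable bus lines per CLB row
bufts_per_clb = 2             # assumption, not stated on the slides

clbs = clb_rows * clb_cols
long_lines = bus_lines_per_row * clb_rows
tbufs_per_line = bufts_per_clb * clb_cols

# Ratio of the listed logic-cell count to the number of CLBs.
logic_cells_per_clb = 21_168 / clbs

print(clbs, long_lines, tbufs_per_line, logic_cells_per_clb)
```

Under these assumptions the long-line and TBUF counts match the listed values of 224 and 168. The listed logic-cell count works out to 4.5 per CLB, slightly more than the four logic cells per CLB (two slices of two) described earlier.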
THE ECE 554 XILINX DESIGN
PROCESS
• Design process overview
• Design reference
• Design tutorial
• What’s next
Design Process Steps
• Definition of system requirements.
– Example: ISA (instruction set architecture) for
CPU.
– Includes software and hardware interfaces with
timing.
– May also include cost, speed, power, reliability
and maintainability specifications.
• Definition of system architecture.
– Example: high-level HDL (hardware description
language) representation (this is optional in
ECE 554, but is done in the real world).
– Useful for system validation and verification and
as a basis for lower level design execution and
validation or verification.
Design Process Steps (continued)
• Refinement of system architecture
– In manual design, descent in hierarchy, designing
increasingly lower-level components
– In synthesized design, transformation of high-level
HDL to “synthesizable” register transfer level
(RTL) HDL
• Logic design or synthesis
– In manual or synthesized design, development of
logic design in terms of library components
– Result is logic level schematic or netlist
representation or combinations of both.
– Both manual design and synthesis typically
involve optimization of cost, area, or delay.
Design Process Steps (continued)
• Implementation
– Conversion of the logic design to physical
implementation
– Involves the processes of:
• Mapping of logic to physical elements,
• Placing of resulting physical elements,
• And routing of interconnections between the elements.
– In case of SRAM-based FPGAs, represented by
the programming bitstream which generates the
physical implementation in the form of CLBs,
IOBs, BRAMs, and the interconnections between
them
Design Process Steps (continued)
• Validation – test and debug (used at several
steps in the process)
– At architecture level - functional simulation of HDL
– At RTL level - functional simulation of RTL HDL
– At logic design or synthesis - functional simulation
of gate-level circuit - not usually done, but
recommended in ECE 554
– At implementation - timing simulation of
schematic, netlist or HDL with implementation-based
timing information (functional simulation
can also be useful here)
– At programmed FPGA level - in-circuit test of
function and timing
Xilinx HDL/Core Design Flow
• DESIGN ENTRY
– RTL HDL EDITING
– CORE GENERATION
• RTL HDL-CORE SIMULATION
• SYNTHESIS
• IMPLEMENTATION
• TIMING SIMULATION
• FPGA PROGRAMMING & IN-CIRCUIT TEST
Xilinx HDL/Core Design Flow
- HDL Editing
• DESIGN WIZARD (accessed within ISE Foundation): HDL Module Frameworks
• LANGUAGE ASSISTANT (accessed within ISE Foundation): Language Construct Templates
• HDL EDITOR: RTL HDL Files
Xilinx HDL/Core Design Flow
- Core Generation
• Input: Select core and specify input parameters
• Tool: CORE GENERATOR
• Outputs:
– EDIF netlist for core_name
– HDL instantiation module for core_name
– Other core_name files
Xilinx HDL/core Design Flow
- HDL Functional Simulation
• Inputs:
– HDL instantiation module for core_names
– RTL HDL Files
– Testbench HDL Files
– EDIF netlists for core_names
– Test Inputs or Force Files
• Steps (MODELSIM):
– Set Up and Map work Library
– Compile HDL Files
– Functional Simulate
• Outputs: Waveforms or List Files
Xilinx HDL Design Flow
- Synthesis
• Inputs:
– All HDL Files
– EDIF netlists for core_names
– Synthesis/Implementation Constraints
• Steps (Xilinx ISE):
– Edit FPGA Express Synthesis Constraints
– Select Top Level
– Select Target Device
– Synthesize
• Outputs:
– Gate/Primitive Netlist Files (EDIF or XNF)
– Synthesis Report Files
Xilinx HDL/core Design Flow
- Implementation
• Inputs:
– Gate/Primitive Netlist Files (XNF or EDN)
• Steps (XILINX ISE):
– Netlist Translation
– Map
– Place & Route
– Create Bitstream
– Model Extraction
– Timing Model Gen
• Outputs:
– BIT File
– HDL or EDIF for Implemented Design
– Standard Delay Format File
Xilinx HDL/core Design Flow
- Timing Simulation
• Inputs:
– HDL or EDIF for Implemented Design
– Standard Delay Format File
– Testbench HDL Files
– Test Inputs, Force Files
• Steps (MODELSIM):
– Set Up and Map work Directory
– Compile HDL Files (produces Compiled HDL)
– HDL Simulate
• Outputs: Waveforms or List Files
Xilinx HDL Design Flow
- Programming and In-circuit Verification
• Bit File => GXSLOAD => ECE 554 FPGA Board
• Input Byte => GXSPORT => ECE 554 FPGA Board
• Other Inputs => ECE 554 FPGA Board => Outputs
Design Practices
• Use synchronous design.
– CLBs are actually reading functions from SRAM
– Avoid clock gating.
– Avoid ripple counters.
– Avoid use of direct sets and resets except for
initialization.
– Synchronize asynchronous signals as needed.
• Test and debug each component design
– Rule of 10: it requires ten times more effort to
debug a design that has untested components in
it.
What’s Next
• HDL/core design flow – design tutorial will
employ the flow described for a Verilog
HDL/core example
– During lab time on Tuesday
– https://cgi.cae.wisc.edu/~ece554/pmwiki/pmwiki.php?n=Documentation.Tutorial
– Read over the tutorial before coming to lab
• Find a partner for the miniproject by next
Tuesday
• Start looking over the course website
– If you feel rusty with Verilog, take a look at lecture 2
Tutorial Overview
• Use the tools in the lab to design, simulate, and
implement a simple design
– Use of embedded tool kit to help implement the miniproject
– Multiply-accumulate unit
• Main steps include
– Performing HDL coding for synthesis (Xilinx ISE)
– Using cores (Xilinx Core Generator)
– Behavioral simulation of synthesizable HDL code
(ModelSim)
– Design synthesis (translation) (Xilinx ISE)
– Design implementation (map, place & route) (Xilinx ISE)
– Timing (post-Implementation) simulation (ModelSim)
– Generating the FPGA programming file (Xilinx ISE)