Section I Introduction to Programmable Logic Devices

Programmable Logic Device Families
Source: Dataquest
[Family tree diagram. Nodes: Logic, Standard Logic, ASIC, Programmable Logic Devices (PLDs), Gate Arrays, Cell-Based ICs, Full Custom ICs. PLDs divide into SPLDs (PALs), CPLDs, and FPGAs.]
Acronyms
• SPLD = Simple Prog. Logic Device
• PAL = Prog. Array of Logic
• CPLD = Complex PLD
• FPGA = Field Prog. Gate Array
Common Resources
• Configurable Logic Blocks (CLB)
– Memory Look-Up Table
– AND-OR planes
– Simple gates
• Input / Output Blocks (IOB)
– Bidirectional, latches, inverters, pullup/pulldowns
• Interconnect or Routing
– Local, internal feedback, and global
CPLDs and FPGAs
CPLD = Complex Programmable Logic Device; FPGA = Field-Programmable Gate Array
• Architecture
– CPLD: PAL/22V10-like, more combinational
– FPGA: Gate array-like, more registers + RAM
• Density
– CPLD: Low-to-medium, 0.5-10K logic gates
– FPGA: Medium-to-high, 1K to 500K system gates
• Performance
– CPLD: Predictable timing, up to 200 MHz today
– FPGA: Application dependent, up to 135 MHz today
• Interconnect
– CPLD: "Crossbar"
– FPGA: Incremental
Not shown: Simple PLD (SPLD) Architecture
PLD Industry Growth
Programmable Logic vs. Semi-Custom ASIC Market
Total 1996 Market – $9.5B
– Mask Programmed Gate Arrays: $5.6B (59%)
– Standard Logic: $2.0B (21%)
– Programmable Logic Share: $1.9B (20%)
Total 2001 Market – $15.8B
– Mask Programmed Gate Arrays: $7.4B (47%)
– Standard Logic: $2.6B (16%)
– Programmable Logic Share: $5.8B (37%)
Source: Dataquest, May 1997
Who is Xilinx?
• World’s leading innovator of complete programmable logic solutions
– Programmable Logic Chips
– Foundation and Alliance Series Design Software
• Inventor of the Field Programmable Gate Array
• $600M Annual Revenues; 35+% annual growth
• Fabless* Semiconductor and Software Company
– UMC (Taiwan)
– Yamaha (Japan)
– Seiko Epson (Japan)
* Xilinx acquired an equity stake in UMC in 1996
Xilinx vs. Competitors
1997 Calendar Year Revenues ($ Millions)
[Bar chart, vertical axis 0 to 700. Companies shown: Altera, Xilinx, Vantis, Lattice, Actel, Lucent, Cypress, Atmel, QuickLogic.]
Source: Company reports & In-Stat.
Includes SPLD, CPLD, FPGA revenues.
Q4 1997 FPGA Market Share
– Xilinx: 55%
– Actel: 16%
– Altera: 14%
– Lucent: 10%
– Others: 5%
Source: In-Stat Research, March 1998
Altera number includes both 8K and 10K families
Process & Density Leadership
[Chart: Transistor Count (millions), axis marks 7.5, 25, 50, 75, by quarter from 4Q97 to 4Q98; 0.25u process]
– XC40125XV: Industry’s 1st 0.25u PLD. ~250K gates, 5 LM.
– XC40150XV
– XC40250XV: ~500K gates
– Virtex: 1 Million Gates
Xilinx Integrated Circuit Products
• XC9500: Flash-based In-System Programmable CPLDs
– Lowest price, best pin locking, 600 - 7K gates
• XC4000: Industry’s largest & fastest FPGAs
– XC4000E: 0.5µ, 5V, 5K - 40K gates
– XC4000EX: 0.5µ, 5V, 45K - 60K gates
– XC4000XL: 0.35µ, 3.3V devices, 5V compatible I/O, 3K - 180K gates
– XC4000XV: 0.25µ, 2.5V / 3.3V, 5V compatible I/O, 250K - 500K gates
– Spartan: 0.5µ, 5V, Low Cost, 10K - 40K gates
• Virtex: New FPGA architecture in 1998
– 0.25µ, 5LM, 250K-1M gates, Select & Block-RAM
• XC6200: Reconfigurable Processing Unit
– Dynamically and partially reconfigurable
• Low-cost solutions (Industry)
– XC3000 (no RAM), XC5200 (no RAM), HardWire
* Gates are in terms of system-level gates
XC9500 CPLDs
[Block diagram: JTAG Port, JTAG Controller, In-System Programming Controller, I/O Blocks, FastCONNECT Switch Matrix, and Function Blocks 1 through 4. Global inputs: 3 Global Clocks, 1 Global Set/Reset, 2 or 4 Global Tri-States.]
• 5 volt in-system programmable (ISP) CPLDs
• 5 ns pin-to-pin
• 36 to 288 macrocells (6400 gates)
• Industry’s best pinlocking architecture
• 10,000 program/erase cycles
• Complete IEEE 1149.1 JTAG capability
Xilinx XC4000 Architecture
[Block diagram: an array of Configurable Logic Blocks (CLBs) joined by Programmable Interconnect and Switch Matrices, surrounded by I/O Blocks (IOBs). CLB detail: F, G, and H function generators, S/R Control, and two flip-flops. IOB detail: Input Buffer, Output Buffer, Slew Rate Control, Delay, Passive Pull-Up/Pull-Down, and Pad.]
• SRAM Based LUT for Synchronous Dual Port RAM or Logic
• ASIC-like array structure
• Built-in Tri-States
• Infinite reconfigurations, downloaded from PC or workstation in ~1 second
• High Density -> 1M System Gates
XC6200 Reconfigurable Processing Unit
[Diagram: CPU and Memory connected to the XC6200 RPU, with I/O on the device]
• 1000x improvement in reconfiguration time from external memory
• FastMAP(tm) assures high speed direct access to all internal registers
• All registers accessed via built-in low-skew FastMAP(tm) busses
• Ultrafast Partial Reconfiguration (40ns to 100’s of usec)
• Microprocessor interface built-in: “XC6200 is memory mapped to look like SRAM to a host processor”
• High capacity distributed memory permits allocation of chip resources to logic or memory
– 256kbits in XC6264
• Up to 100,000 gates
Exponential Growth in Density
[Chart: Logic Cells (1,000 to 1,000,000) and Logic Gates (12K to 12M) on logarithmic axes versus Year (1994 to 2002), reaching 2 Million logic gates. Inset: a logic cell drawn as a LUT feeding a D flip-flop (FF).]
• Nov. 1997 - shipping world’s largest FPGA, XC40125XV (10,982 logic cells, 250K System Gates)
• 1 Logic cell = 4-input LUT + FF
• 175,000 Logic cells = 2.0 M logic gates in 2001
Design Flow
1. Design Entry in schematic, ABEL, VHDL, and/or Verilog. Vendors include Synopsys, Aldec (Xilinx Foundation), Mentor, Cadence, Viewlogic, and 35 others.
2. Implementation includes Placement & Routing and bitstream generation using Xilinx’s M1 Technology. Also, analyze timing, view layout, and more.
3. Download directly to the Xilinx hardware device(s) with unlimited reconfigurations*.
* XC9500 has 10,000 write/erase cycles
Foundation Series
Delivers Value & Ease of Use
• Complete, ready-to-use software solution
• Simple, easy-to-use design environment
• Easy-to-learn schematic, state-diagram, ABEL, VHDL, & Verilog design
• Synopsys FPGA Express Integration*
The Xilinx Student Edition
• Prentice Hall’s most requested new engineering
product in Q1 ‘98 !
– Complete, affordable, and practical digital design course environment for
all students
– Predeveloped and tested lab-based course
• Includes
– Foundation Series 1.3 for students’ computers
– Practical Xilinx Designer lab tutorial book
– Coupon for XS40-005XL and XS95-108 boards ($129)
• Sold through bookstores by Prentice Hall and
www.Amazon.com, listed at $79 (ISBN 0136716296)
• Integrated tutorial projects cover:
TTL, Boolean Logic, State Machines, Memories, Flip Flops, Timing,
4-bit and 8-bit processors
• Upgradeable for free to F1.4 Express with VHDL &
Verilog, 40K gates, VHDL labs on the web
Section II
Basic PLD Architecture
Section II Agenda
• Basic PLD Architecture
– XC9500 and XC4000 Hardware
Architectures
– Foundation and Alliance Series Software
XC9500 and XC4000
Hardware Architectures
XC9500 - Architectural Features
• Uniform, all pins fast, PAL-like architecture
• FastCONNECT switch matrix provides 100%
routing with 100% utilization
• Flexible function block
– 36 inputs with 18 outputs
– Expandable to 90 product terms per macrocell
– Product term and global three-state enables
– Product term and global clocks
– Product term and global set/reset signals
• 3.3V/5V I/O operation
• Complete IEEE 1149.1 JTAG interface
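XC9500 Function Block
[Block diagram: 36 inputs from FastCONNECT feed the AND Array and Product-Term Allocator, which serve Macrocell 1 through Macrocell 18; macrocell outputs go to I/O and back to FastCONNECT. Global Clocks (3) and Global Tri-State (2 or 4) also enter the block.]
Each function block is like a 36V18!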
XC9500 Product Family
                9536    9572    95108   95144   95216   95288
Macrocells      36      72      108     144     216     288
Usable Gates    800     1600    2400    3200    4800    6400
tPD (ns)        5       7.5     7.5     7.5     10      10
Registers       36      72      108     144     216     288
Max I/O         34      72      108     133     166     192
Packages        VQ44    PC44    PC84    PQ100   PQ160   HQ208
                PC44    PC84    TQ100   PQ160   HQ208   BG352
                        TQ100   PQ100           BG352
                        PQ100   PQ160
XC4000 Architecture
[Block diagram: Configurable Logic Blocks (CLBs), Switch Matrices, Programmable Interconnect, and I/O Blocks (IOBs). CLB detail: F, G, and H function generators, S/R Control, and two flip-flops with inputs C1-C4 (H1, DIN, S/R, EC), F1-F4, G1-G4, and clock K. IOB detail: Input Buffer, Output Buffer, Slew Rate Control, Delay, Passive Pull-Up/Pull-Down, and Pad.]
XC4000E/X Configurable Logic Blocks
• 2 Four-input function generators (Look Up Tables)
– 16x1 RAM or Logic function
• 2 Registers
– Each can be configured as Flip Flop or Latch
– Independent clock polarity
– Synchronous and asynchronous Set/Reset
[CLB block diagram: inputs G1-G4, F1-F4, C1-C4 (H1, DIN, S/R, EC), and clock K; F, G, and H function generators; S/R Control; two registers with outputs XQ and YQ; combinational outputs X and Y.]
Look Up Tables
• Combinatorial Logic is stored in 16x1 SRAM Look Up Tables (LUTs) in a CLB
• Example: combinatorial logic with inputs A, B, C, D and output Z. The four inputs form the 4-bit address into the Look Up Table, and the bit stored at that address is Z.
• Capacity is limited by number of inputs, not complexity
• Choose to use each function generator as 4 input logic (LUT) or as high speed sync. dual port RAM
[Figure: truth table of A, B, C, D versus Z, and a G function generator with inputs G1-G4 and WE]
• Number of possible 4-input functions: 2^(2^4) = 64K !
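The look-up-table idea can be sketched in a few lines of Python. This is only an illustration of the principle, not Xilinx software: the helper names `build_lut` and `read_lut` and the example function Z = (A AND B) OR (C XOR D) are assumptions made up for this sketch.

```python
from itertools import product

def build_lut(func, n_inputs=4):
    """Precompute the 2**n_inputs stored bits for a Boolean function.

    Entry k holds the output for the input combination whose binary value
    is k, with the first input as the most significant address bit.
    """
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=n_inputs)]

def read_lut(lut, *inputs):
    """Use the inputs as a memory address and return the stored bit."""
    address = 0
    for bit in inputs:
        address = (address << 1) | (bit & 1)
    return lut[address]

# Hypothetical example function (not taken from the slide).
def example(a, b, c, d):
    return (a & b) | (c ^ d)

lut = build_lut(example)            # 16 stored bits, like a 16x1 SRAM
possible_functions = 2 ** (2 ** 4)  # every distinct 16-bit table is a function

if __name__ == "__main__":
    print(len(lut), read_lut(lut, 1, 1, 0, 0), possible_functions)
```

However complicated the expression inside `example` becomes, the table still holds exactly 16 bits, which is why capacity depends on the number of inputs and not on complexity.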
XC4000X I/O Block Diagram
Shaded areas are not included in XC4000E family.
Xilinx FPGA Routing
• 1) Fast Direct Interconnect - CLB to CLB
• 2) General Purpose Interconnect - Uses switch matrix
• 3) Long Lines
– Segmented across chip
– Global clocks, lowest skew
– 2 Tri-states per CLB for busses
• Other routing types in CPLDs and XC6200
[Figure: CLBs connected through Switch Matrices]
Other FPGA Resources
• Tri-state buffers for busses (BUFTs)
• Global clock & high speed buffers (BUFGs)
• Wide Decoders (DECODEx)
• Internal Oscillator (OSC4)
• Global Reset to all Flip-Flops, Latches (STARTUP)
• CLB special resources
– Fast Carry logic built into CLBs
– Synchronous Dual Port RAM
– Boundary Scan
What’s Really In that Chip?
[Annotated device view. Color key: CLB (Red), Programmable Interconnect Points, PIPs (White), Routed Wires (Blue), Direct Interconnect (Green), Long Lines (Purple); the Switch Matrix is also labeled.]
XC4000XL Family
                  4005XL   4010XL   4013XL   4020XL   4028XL
Logic Cells       466      950      1,368    1,862    2,432
Typ Gate Range*   3-9K     7-20K    10-30K   13-40K   18-50K
  (Logic + Select-RAM)
Max. RAM bits     6K       13K      18K      25K      33K
  (no Logic)
I/O               112      160      192      224      256
Initial Packages (across these devices): PC84, PQ100, PQ160, PQ208, PQ240, HQ208, HQ240, BG256, BG352
* 20-25% of CLBs as RAM

                  4036XL   4044XL   4052XL   4062XL   4085XL   40125XV
Logic Cells       3,078    3,800    4,598    5,472    7,448    10,982
Typ Gate Range*   22-65K   27-80K   33-100K  40-130K  55-180K  78-250K
  (Logic + Select-RAM)
Max. RAM bits     42K      51K      62K      74K      100K     158K
  (no Logic)
I/O               288      320      352      384      448      544
Initial Packages (across these devices): HQ208, HQ240, BG352, BG432, BG560, PG411, PG475, PG559
* 25-30% of CLBs as RAM
HardWire(TM)
• Unique no-risk 100% compatible mask-programmed cost reduction of Xilinx FPGA
• Cost-effective for volume applications
– Savings of 40% to 70%
• Architecture-equivalent mask-programmed version of any FPGA
– Requires virtually no customer engineering resources, test vectors, or simulation
– ALL FPGA features (e.g., Configuration, Power-On Reset, JTAG, etc.) are fully supported
HardWire Methodology vs. Gate Array Conversion
[Flow comparison diagram.
Typical Gate Array Design Phases (Gate Array Redesign Path), with stages labeled Capture, Verification, Place and Route, Test Development, Iterations, Physical Data Base, and Prototypes.
Xilinx HardWire Methodology, with stages labeled FPGA Design, Verification, .LCA File Conversion, Xilinx ATPG, Physical Data Base, and Production Ready Prototypes.]
Cost Reduction & Density Increases
[Chart: Cost versus density for 1996, 1997, and 1998. Axis marks: Logic Gates 5,000; 36,000; 85,000; 250,000 and Logic Cells 0.4K; 3K; 7.5K; 20K. Devices marked: XC4036EX, XC4085XL, XC40250XV (500K System-level Gates), and 1M Gates*.]
* Starting with Virtex, Xilinx numbering scheme reflects approximate Logic + RAM gates rather than Logic gates only.
CPLD or FPGA?
CPLD
• Non-volatile
• JTAG Testing
• Wide fan-in
• Fast counters, state machines
• Combinational Logic
• Small student projects, lower level courses
FPGA
• SRAM reconfiguration
• Excellent for computer architecture, DSP, registered designs
• ASIC like design flow
• Great for first year to graduate work
• More common in schools
• PROM required for non-volatile operation
Foundation and
Alliance Series Software
Xilinx M1-Based Software
• ALLIANCE Series
– Libraries and Interfaces for Leading EDA Vendors
• Software Backplane
– Core Implementation Software - Map, Place, Route, Bitstream generation, and analysis
• Foundation Series
– Complete, Ready-to-Use
– Includes Schematic, Simulation, VHDL and Verilog Synthesis
Graphical User Interface is very similar to XACTStep v.6.0
Design Tools
• Standard CAE entry and verification tools
• Xilinx Implementation software implements the design
– The design is optimized for best performance and minimal size
– Graphical User Interface and Command Line Interface
– Easy access to other Xilinx programs
– Manages and tracks design revisions
[Flow diagram: Design Entry in Foundation or Alliance (Schematic, State Mach., HDL Code, LogiBLOX, CORE Gen); Xilinx Design Implementation (M1 Design Manager); Design Verification (Functional Simulation, Simulator, Back Annotation, Static Timing Analysis, In-Circuit Testing)]
Multi-Source Integration
[Diagram: Design Source Integration of Schematic, HDL, Mixed-Level Flows, Existing Designs, and Cores; Standards Based (EDIF, VHDL, Verilog, SDF); Knowledge Driven Implementation; Check Point Verification]
• Enables multiple sources and multiple EDA vendors in the same flow
• Allows team development
• Reduces design source translations
• Design the way you are used to
• Enables rapid, accurate iterations
• Works well within existing ASIC flows
• Facilitates Design Reuse
3rd Party Support & Libraries
• Xilinx 3rd Party Design Entry & Simulation Support
– Synopsys, Cadence, Mentor Graphics, Aldec (Foundation)
– Viewlogic, Synplicity, OrCad, Model Technologies,
Synario, Exemplar and others supply libs & interfaces
– Industry standard file formats:
• VHDL, Verilog, and EDIF netlist formats
• SDF Standard Delay files
• VITAL library support
• Xilinx Libraries
– Optimized components for use in any Xilinx FPGA or
CPLD
– Wide range of functions
• Comparators, Arithmetic functions, memory
Libraries, Macros & Attributes
• Libraries are common design sets for all design entry tools
(eg. text, schematic, Foundation, Synopsys, Viewlogic, etc.)
– Unified Libraries:
• Boolean functions, TTL,
Flip-Flops, Adders, RAM,
small functions
– LogiBlox Libraries:
• Variable size blocks of
adders, registers, RAM,
ROM, etc.
• Properties defined as
attributes
• Library “interfaces” are specific to each front end
• Attributes are library element properties
Core Design Technology
Optimal Core Creation & Flexible Core
Delivery
Data sheets
Parameterizable Cores
CoreLINX:
Web Mechanism to
Download New Cores
SystemLINX:
Third Party System Tools Directly Linked With Core Generator
Foundation Series Express
Overview
• Easy to use, yet powerful
• Based on Industry Standards, not proprietary
languages
• Features:
– Schematic (partnership with Aldec)
– IEEE VHDL, Verilog, ABEL
– State Diagram Editor
– Interactive Simulation
– Exclusive partnership with Synopsys, the synthesis leader
Foundation Project Manager
• Integrates all tools into one environment
Schematic Entry
ABEL and VHDL Text Entry
• From schematic menu (or via HDL Editor), select Hierarchy -> New Symbol Wizard… to create symbol.
• Select HDL Editor & Language Assistant to learn by example, then define block.
• Synthesize to EDIF.
State Machine Graphical Editor
Graphical editor synthesizes into ABEL or VHDL code
Simulation - Easy to Use and Learn
• Generate stimulus
easily and quickly
– Keyboard toggling
– Simple clock stimulus
– Custom formulas
• Easy debugging
– Waveform viewer
– Signals easily added and
removed
– Simulator access from
schematic
– Color-coded values on
schematic
• Script Editor
Foundation Express 1.4 Features
• Express Technology
– Optimizes the design for Xilinx Architectures
– Optimized arithmetic functions
– Automatic Global Signal Mapping
– Automatic I/O Pad Mapping
– Resource Sharing
– Hierarchy Control
– Source Code Compatible With Synopsys Design Compiler and FPGA Compiler
– Verilog (IEEE 1364) and VHDL (IEEE 1076-1987) Support
– Easy, graphical constraint entry
– F1.4 is stand-alone
Xilinx-Express Design Flow
[Flow diagram. Foundation Design Entry Tools: HDL Editor and State Diagram Editor (VHDL, Verilog; .V, .VHD files), Schematic Capture (EDIF, XNF), and the DSP COREGen & LogiBLOX Module Generator (XNF, .NGO, plus VHDL/Verilog Behavioral Simulation Models, .VEI, .VHI). Express takes VHDL, Verilog, and Timing Requirements and produces .XNF and Reports. Xilinx Implementation Tools take EDIF/XNF and .UCF and produce Reports, EDIF, BIT, JEDEC, SDF, VHDL, and Verilog. A Gate Level Simulator and HDL SIMULATION accompany the flow.]
Express Input and Output
• Input files may be VHDL or Verilog format
– Mixed Verilog/VHDL modules are accepted
– Schematics may also be used, but should not be input into Express
– Schematic files in XNF or EDIF format will be merged into the design in Xilinx Design Manager
• Output netlists are in XNF format
• Timing Specifications may be specified in Express
[Figure: VHDL, Verilog, and Timing Requirements enter Express; .XNF and Reports come out]
Express Design Process
1. Analyze - Syntax check
2. Implement - Create generic logic design (Elaborate)
3. Enter constraints and options
4. Synthesize - Optimize the design for specific device
5. Export XNF Netlist
6. Implement layout with Xilinx Design Manager
Implementation - M1 Design Manager
• Manages design data
• Access reports
• Supports CPLDs, FPGAs
[Toolbox: Flow Engine, Timing Analyzer, PROM File Formatter, Hardware Debugger, EPIC Design Editor]
Terminology
• Project
– Source file; has a defined working directory and family
• Version
– A Xilinx netlist translation of the schematic
– Multiple Versions result from iterative schematic changes
• Revision
– An implementation of a Xilinx netlist
– Multiple revisions typically result from different options
• Part type
– Specified at translation; can be changed in a new revision
Toolbox Programs
• Flow Engine
– Controls start/stop points and custom options
• Timing Analyzer
– Report on net and path delays
• PROM File Formatter
– Create file to program configuration file into PROM
• Hardware Debugger
– Download configuration file with XChecker, Serial or JTAG Cable
• EPIC Design Editor
– Device-level view of routing
Flow Engine
• View status of tools
• Control tool options
• Implements design to the bitstream
Section III
Advanced Hardware
Design Techniques
Section III Agenda
• Advanced Hardware Design Techniques
– General Hardware Information
– Combinational Logic Design (Look Up Tables and other Resources)
– Synchronous Logic (Flip Flops and Latches)
– Memory Design (RAM and ROM)
– Input / Output Design
General Hardware Information
Resource Estimation
• Find comparable functions in macro library and XAPP application notes
– Or, use other designs to estimate device utilization
• Or, quickly implement a design and view the MAP report file
– Select Utilities -> Report Browser -> Map Report
– IOBs, CLBs, Global Buffers, and other components listed separately
• For unfinished designs
– Use save flags on unconnected nets, or
– Deselect “Trim Unconnected Logic” in Implementation Options
Performance Estimation
• Use block delays as estimate of net delays
• Use desired clock frequency to determine allowed CLB
depth
– Compare to functional requirements and modify design to meet
performance needs
• Example for 50 MHz clock frequency in XC4000XL-3:
    Clock period                      20 ns
    One level                       -  8 ns (tCO + tNET + tSU)
    Delay allowance                   12 ns
    Each added level                ÷  6 ns (tPD + tNET)
    Added levels of logic allowed      2 CLBs
[Timing path figure: CLB (tCO), tNET, CLB (tPD), tNET, CLB (tPD), tNET, CLB (tSU)]
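The 50 MHz budget above can be worked through as a short calculation. This is a minimal sketch using only the figures quoted on the slide (8 ns for the first level, 6 ns for each added level); the function name `added_logic_levels` is an assumption made for illustration.

```python
def added_logic_levels(clock_mhz, first_level_ns, per_level_ns):
    """Return (clock period, delay allowance, whole added levels of logic).

    first_level_ns covers tCO + tNET + tSU for the register-to-register path;
    per_level_ns covers tPD + tNET for each extra CLB in that path.
    """
    period_ns = 1000.0 / clock_mhz
    allowance_ns = period_ns - first_level_ns
    if allowance_ns < 0:
        return period_ns, allowance_ns, 0
    return period_ns, allowance_ns, int(allowance_ns // per_level_ns)

if __name__ == "__main__":
    # Slide example: 50 MHz target, 8 ns first level, 6 ns per added level.
    print(added_logic_levels(50, 8, 6))
```

Rerunning the function with a different target frequency shows how quickly the allowed CLB depth shrinks as the clock gets faster.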
Power Consumption
• Xilinx FPGAs have flexible routing
– Power consumption can be half that of FPGAs with less flexible routing channels
• Power = kCV²F
– How many nodes change state (hard to estimate)
– Capacitive loading on CLB and IOB outputs (known)
• Power consumption is not a concern in regular course labs
• Power estimation methods
– See application notes under
http://www.xilinx.com/apps/3volt.htm
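Because the supply voltage enters the power relation squared, lowering it has a large effect. The sketch below evaluates Power = kCV²F with placeholder values; the function name and the choice of k = C = F = 1 are assumptions for illustration only, so just the ratio between the two results is meaningful.

```python
def dynamic_power(k, c_farads, v_volts, f_hertz):
    """Evaluate Power = k * C * V^2 * F."""
    return k * c_farads * v_volts ** 2 * f_hertz

# Same activity factor, capacitance, and frequency; only the supply changes.
power_5v = dynamic_power(1.0, 1.0, 5.0, 1.0)
power_3v3 = dynamic_power(1.0, 1.0, 3.3, 1.0)
ratio = power_3v3 / power_5v

if __name__ == "__main__":
    print(f"3.3 V power is {ratio:.1%} of 5 V power")
```

Under these assumptions a 3.3 V design dissipates a bit under half the power of the same design at 5 V, before any change in capacitance or activity is considered.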
XC4000XL 3.3 V, 0.35µ, 5 Volt Compatible
[Figure: any 5 V device drives the 5 V tolerant inputs of an XC4000XL FPGA (0.35µ, 3.3 V logic, 3.3 V I/O), whose outputs meet TTL levels]
• Accepts 5 Volt inputs
• Drives standard TTL levels
• Totally compatible in 5 Volt environment
• 0.25µ XV family is also 5 Volt TTL compatible when used with 3.3 Volt I/O supply, 2.5 Volt core supply
XC4000XV & Virtex 2.5 V, 0.25µ, 5 Volt Compatible
• Devices with 5V, 3.3V, and 2.5V power supplies can be interfaced
Combinational Logic Design
(Look Up Tables and
Other Resources)
XC4000X Configurable Logic Blocks
• G, F, H function generators
• 2 Flip-Flops
– Individual clock polarity
– Sync. and async. Set/Reset
• Delay from F1 to Y in the XC4000X-1 is ~1 nsec
[CLB block diagram: inputs G1-G4, F1-F4, C1-C4 (H1, DIN, S/R, EC), and clock K; F, G, and H function generators; S/R Control; two registers with outputs XQ and YQ; combinational outputs X and Y.]
16-bit Adder Examples
• Many choices for implementing an adder
– Speed vs. density trade-off controlled by user and PLD features
Family      Type          CLBs   Levels    AppLINX
XC3000A     Bit-Serial    16     16        XAPP 022
XC3000A     Parallel      24     8         XAPP 022
XC3000A     Lookahead     30     6         XAPP 022
XC3000A     Conditional   41     3         XAPP 022
XC4000E-3   Carry         8      10.1ns    XAPP 018
XC5200-5    Carry         8      20ns      5200 DataSheet
Arithmetic Functions
• Arithmetic Macros are optimized for density and
speed with dedicated carry logic in CLBs
– Example: Each CLB can form a two-bit full-adder
• Carry Logic components have vertical orientation
– Needed for speed and utilization
– Known as RPM or “Relationally Placed Macro”
– Examples:
  • ADDx adders
  • ADSUx adder/subtractors
  • CCx counters
  • COMPMCx magnitude comparators
[Figure: ADD4 macro placed vertically, with inputs A<3>..A<0> and B<3>..B<0> and outputs Z<3>..Z<0>]
Three-State Buffers
• Each CLB is associated with two Three-State
buffers (BUFT)
– BUFTs are used independently of LUTs and Flip-Flops
• Three-State library components:
– Three-state buffers: BUFT, BUFT4, BUFT8, BUFT16
– Wired AND (open Drain) : WAND1, WAND4, WAND8, WAND16
– Two input OR driving Wired AND : WOR2AND
• Delay varies per family
– 3.7 ns in the XC4005XL (-1)
– 13.6 ns in the XC4085XL (-1)
Use BUFT for Buses
• Use to multiplex signals onto long routing lines to use as buses
[Figure: BUFTs controlled by _ENABLE_A drive A3..A0, and BUFTs controlled by _ENABLE_B drive B3..B0, onto the shared lines BUS<3> through BUS<0>]
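The bus arrangement above can be modeled in software to show why only one source may drive at a time. This is a behavioral sketch, not a description of the silicon: the function names are invented here, and the enable arguments are plain booleans meaning "this source is driving", which sidesteps the actual polarity of the _ENABLE_A and _ENABLE_B signals.

```python
def resolve_bus(drivers):
    """Resolve one bus line from a list of (enabled, value) pairs.

    Returns the driven value, None if no driver is enabled (floating line),
    and raises ValueError if enabled drivers disagree (bus contention).
    """
    active = [value for enabled, value in drivers if enabled]
    if not active:
        return None
    if len(set(active)) > 1:
        raise ValueError("bus contention: enabled drivers disagree")
    return active[0]

def drive_bus(a_bits, b_bits, enable_a, enable_b):
    """Multiplex two equal-width sources onto a shared bus, bit by bit."""
    return [resolve_bus([(enable_a, a), (enable_b, b)])
            for a, b in zip(a_bits, b_bits)]

if __name__ == "__main__":
    a = [1, 0, 1, 1]  # A3..A0
    b = [0, 0, 1, 0]  # B3..B0
    print(drive_bus(a, b, True, False))   # bus carries A
    print(drive_bus(a, b, False, True))   # bus carries B
```

Selecting a source is just a matter of which enable is asserted, so adding a third or fourth source adds drivers to the line without adding levels of logic.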
BUFTs for Multiplexers
• BUFTs can be used to build large MUXes
– Large MUXes composed of LUTs need multiple levels of logic
– Large MUXes composed of BUFTs have only one level of logic
• CLB resources are not used
– Use of BUFTs constrains placement
• Multiplexer macros use lookup tables
– Example: M4_1E
• Create BUFT macros from Three-State buffer
components
– BUFT, BUFT4, BUFT8, BUFT16
Wide Decoders
• The Wide Decoder is a dedicated wired-AND
– Useful for address decoding
• IOBs or CLBs can drive the Wide Decoder
– Located along the periphery of the die
– All IOB drivers must be on same edge as the decoder
– Four decoder lines per edge
• Use DECODE macro
– DECODE4/8/16/24
– Must use a PULLUP primitive
[Figure: DECODE8 with inputs A0-A7, output O, and a PULLUP on the output]
CLB Mapping Control in Schematic
• Allows user to force mapping of logic from schematic into a single CLB
• XC3000
– CLBMap can specify entire CLB
• XC4000/XC5000
– FMap specifies a function generator in a CLB
– HMap specifies an XC4000 H function generator in a CLB
[Figure: FMAP symbol with inputs I1-I4 and output O, applied to logic with inputs A0, B0, A2, B2 and output C0]
Synchronous Logic
(Flip-Flops and Latches)
CLB Registers
• Each register can be configured as a Flip-Flop or Latch
• Independent clock polarity
• Asynchronous Set or Reset
• Clock Enable
• Direct input from CLB input (Connections bypass LUTs)
[Figure: two registers with outputs QX and QY. Each has a D input selected from DIN, F, G, or H, an S/R Control block driving SET and RESET, clock K (CLOCK), and EC (CLOCK ENABLE).]
Library offerings
• “Unified” library contains many standard functions
– Pre-defined size and functionality
• LogiBLOX templates are available
– Can be customized for bus size and function
• Types of LogiBLOX register functions
– Shift Registers
• Left/Right, Arithmetic, Logical, Circular
– Clock Dividers
• Output Duty Cycle
– Counters
• LFSR, Binary, One_Hot, Carry Logic
– Accumulators
• Xilinx CORE Generator recommended for very
complex functions (DSP, FFT, UARTs, Multipliers...)
Naming Conventions
FDPE_1
– FD: Flip-Flop, D-Type (D), JK-Type (JK), Toggle-Type (T)
– P: Asynchronous Preset (P), Asynchronous Clear (C), Synchronous Set (S), Synchronous Reset (R)
– E: Clock Enable
– _1: Inverted Clock
LDCE_1
– LD: Transparent D Latch
– C: Asynchronous Preset (P), Asynchronous Clear (C)
– E: Gate Enable
– _1: Inverted Gate
FD16RE
– FD: Flip-Flop, D Type
– 16: Size
– R: Synchronous Reset
– E: Clock Enable
Counters
• Libraries support a wide variety of fast and efficient
counters
– Counters offer trade-offs between speed, density, and
complexity
– Example: LogiBlox counter styles
• Binary: predictable outputs, uses carry logic
• Johnson: fastest practical counter, but uses more flip-flops; glitch free
decoding
• LFSR: fast & dense, but pseudo-random outputs
• One-Hot: useful for generating series of enables
• Carry Chain: High speed and density
– The LogiBlox synthesizer will automatically pick the best
implementation based on your design, or you can force an
implementation with the STYLE parameter (schematic).
16 Bit Counter Examples
• The following are implemented in XC4000XL-3
Macro           CLBs      Clock
CB16CLE/D       18 - 20   23 - 24 ns
CC16CLED        19        19 ns
CC16CLE         9         16 ns
X-BLOX: LFSR    9         7 ns
• Simpler functions are faster and smaller
• Carry Logic Counters are generally faster (depends on size)
Global Clock Buffers
• Clock Buffers are low-skew, high drive buffers
– Also known as Global Buffers
– Drive low-skew, high-speed long line resources
– Drive all Flip-Flops and Latches in FPGA
– Can also be used for high-fanout signals
• Additional clocks and high fanout signals can be routed on long lines
• Instantiation: if the BUFG component is instantiated, software will select one of these buffers based on the design
• Synthesis: Clocks are identified by different means depending on Vendor
– Example: Synopsys FPGA compiler connects clock buffers to all fan-in of clock pins
• Control clock buffer insertion with separate commands
• Consult Synthesis interface guide or vendor
Global Buffer Types
• BUFG: Global Clock (Architecture independent)
– Applications: M1 converts BUFG to most appropriate global buffer
• BUFFCLK: Global Fast Clock
– Applications: Fastest way to bring clock on chip
– Limitations: 4 per chip, 4KX only; slower for CLBs
• BUFGE: Global Low Early Clock
– Applications: Faster than BUFGLS; fast IO interface
– Limitations: 8 per chip, 4KX only; drives only 1 quadrant
• BUFGLS: Global Low Skew Clock
– Applications: Can access any CLB or IOB, best for CLBs
– Limitations: 8 per chip, 4KX only
• BUFGP: Primary Global Buffer
– Applications: Drives Clocks or Longlines
– Limitations: 4 per chip
• BUFGS: Secondary Global Buffer
– Applications: Drives Clocks or Longlines
– Limitations: 4 per chip
BUFGLS is used by default in the Xilinx software if a BUFG component is specified in the design
Generating Clock On-Chip
• Internal configuration clock available after configuration
– Use OSC4 primitive
– Nominal values (approximately): 8 MHz, 500 kHz, 16 kHz, 490 Hz, 15 Hz
– Very limited accuracy (+/- 50%)
[Figure: OSC4 symbol with outputs F8M, F500k, F16k, F490, and F15, one of them driving a BUFGS]
Global Reset
• All flip-flops are initialized during power up via Global Set/Reset network
• You can access Global Set/Reset network by instantiating the STARTUP primitive
– Assert GSR for global set or reset
– GSR is automatically connected to all CLB flip-flops using dedicated routing resources
– Saves general use routing resources for your design
– DO NOT CONNECT GSR to set/reset inputs on Flip-Flops
• Any signal can source the global set/reset, but the source must be defined in the design
• Use Global Reset as much as possible
– Limit the number of flip-flops with an asynchronous reset
– Each asynchronous reset uses extra routing resources
[Figure: STARTUP symbol with pins labeled GR/GSR, GTS, CLK, Q1, Q2, Q3, Q4, and DoneIn]
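Avoid Gated-Clock or Asynch. Reset
• Move gating from clock pin to prevent glitch from affecting logic.
Poor Design:
[Figure: Binary Counter outputs Q0, Q1, Q2 are decoded into TC, which drives the clock pin (CK) of a D flip-flop]
– TC and Q may glitch during the transition of Q<0:2> from 011 to 100
Improved Designs:
[Figure: Binary Counter outputs Q0, Q1, Q2 are decoded one count early (Carry-1) and registered; the result drives the CE pin of a D flip-flop that shares the counter clock CK]
– TC will not glitch during the transition of Q<0:2> from 011 to 100
– Or use MUXed data when using only 1-2 logic inputs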
Shift Registers are Fast & Dense
• The CLB can handle two bits of a shift register
• Fast and dense independent of size
– Fast connections between adjacent lookup tables
[Figure: one CLB holding bits Qi and Qi+1 of a Left/Right shift register, with neighbors Qi-1 and Qi+2 and a shared EC input]
Prescale Non-Loadable Counters
• Counter speed is determined by the carry delay from LSB to MSB
• Non-loadable counters can use prescaling
– Pre-scaling restricts load timing
[Figure: the TC output of a Fast Small Counter drives the CE input of a Large Dense Counter with Slower Carry]
Use One-Hot Encoding for State Machines
• Shift register is always fast and dense
– “One-hot” uses one flip-flop for each count
– Useful for state machine encoding in FPGAs
[Figure: chain of five D flip-flops]
• Another alternative is a Johnson Counter
– Inverted output of last stage drives input of first stage
– Doubles the number of states versus one-hot
• Binary encoding is best for CPLDs
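The state counts quoted above can be checked with a small simulation. This is a sketch under simple assumptions: both counters are modeled as plain Python lists of bits, and the function names are invented for illustration.

```python
def one_hot_states(n):
    """Ring counter: a single 1 circulates through n flip-flops."""
    state = [1] + [0] * (n - 1)
    seen = []
    while tuple(state) not in seen:
        seen.append(tuple(state))
        state = [state[-1]] + state[:-1]      # last stage feeds first stage
    return seen

def johnson_states(n):
    """Johnson counter: the inverted last stage feeds the first stage."""
    state = [0] * n
    seen = []
    while tuple(state) not in seen:
        seen.append(tuple(state))
        state = [1 - state[-1]] + state[:-1]  # inverted feedback
    return seen

def bits_changed(a, b):
    """Number of flip-flops that differ between two states."""
    return sum(x != y for x, y in zip(a, b))

if __name__ == "__main__":
    print(len(one_hot_states(5)), len(johnson_states(5)))
```

With five flip-flops the ring counter visits 5 states and the Johnson counter visits 10. Each Johnson transition changes a single flip-flop, which is what makes its decoded outputs glitch free.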
State Machine Design Tips
• Split complex states
• Need to minimize number of inputs, not number of flip-flops, in FPGAs
– Use one-hot encoding for medium-size state machines (~8-16 states)
• Complex states may be improved by breaking up into additional simpler states
[Figure: State A moving to State B on cond1, redrawn as State A1 and State A2 moving to State B on cond1]
Use binary sequence only if necessary
• CLB can generate any sequence desired at same speed
• Use Pre-Scaling on non-loadable counters to increase speed
– LSBs toggle quickly
– See Application Notes XAPP001 and XAPP014
[Figure: the Fast TC output of a Small Counter drives the CE input of a Large Dense Counter with Slower Carry]
• Use Gray code counters if decoding outputs
– One bit changes per transition
• Consider Linear Feedback Shift Register for speed when terminal count is all that is needed
– Or when any regular sequence is acceptable (e.g., FIFO)
[Figure: 10-bit shift register (10-bit SR) with outputs Q0, Q6, and Q9 labeled]
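A linear feedback shift register can be simulated to see how many states it visits before repeating. The sketch below assumes the labeled outputs Q6 and Q9 are the feedback taps and that they are combined with XOR into Q0; both points are assumptions read from the figure labels, and real designs often use XNOR so that the all-zeros state is legal. The function name is hypothetical.

```python
def lfsr_period(width=10, taps=(6, 9), seed=1):
    """Count the shifts until a Fibonacci LFSR returns to its seed state.

    Bit i of `state` models flip-flop Qi. On each clock every bit moves up
    one position and Q0 receives the XOR of the tapped bits.
    """
    mask = (1 << width) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("an XOR-feedback LFSR never leaves the all-zeros state")
    start = state
    steps = 0
    while True:
        feedback = 0
        for tap in taps:
            feedback ^= (state >> tap) & 1
        state = ((state << 1) | feedback) & mask
        steps += 1
        if state == start:
            return steps

if __name__ == "__main__":
    print(lfsr_period())
```

With these taps the register cycles through 1023 of the 1024 possible states using a single XOR gate of feedback logic, which is why an LFSR is attractive when only a terminal count is needed.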
Pipeline for Speed
• Register-rich FPGAs encourage pipelining
• Pipelining improves speed
– Consider wherever latency is not an issue
– Use for terminal counts, carry lookahead, etc.
• How to estimate the clock period
– 2 x (number of combinatorial levels) x (speed grade)
– XC4000XL-3: 3 levels x 2 x 3ns = 18 ns clock period
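The rule of thumb above is easy to turn into a helper for comparing pipelined and unpipelined paths. This is a sketch of the slide's estimate only, not a timing analysis; the function name is an assumption.

```python
def estimated_clock_period_ns(combinatorial_levels, speed_grade_ns):
    """Rule-of-thumb period: 2 x (combinatorial levels) x (speed grade)."""
    return 2 * combinatorial_levels * speed_grade_ns

if __name__ == "__main__":
    unpipelined = estimated_clock_period_ns(3, 3)  # slide example, XC4000XL-3
    pipelined = estimated_clock_period_ns(1, 3)    # one level per pipeline stage
    print(unpipelined, pipelined)
```

Splitting the three-level path into three single-level stages cuts the estimated period from 18 ns to 6 ns, at the cost of two extra clock cycles of latency.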
Memory Design
(RAM and ROM)
ROM is Equivalent to Logic
• When using ROM, it is simply defining logic
functions in a look-up table format
– Memory might be an easier way to define logic
– Xilinx provides ROM library cells
• FPGA lookup tables are essentially blocks of RAM
– Data is written during configuration
– Data is read after configuration
• Effectively operate as a ROM
[Figure: the same function implemented two ways. As gates: O = I1*I2. As ROM addressed by A1 and A0: DATA(0)=0, DATA(1)=0, DATA(2)=0, DATA(3)=1, read on DOUT]
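The ROM view of the AND function can be written behaviorally as a look-up table. This is a minimal sketch; the module and port names are hypothetical, and whether a given synthesis tool maps it to a ROM cell or to plain logic is tool dependent.

```verilog
// Illustrative 4x1 ROM holding the AND truth table; names are hypothetical.
module and_as_rom (a, dout);
  input  [1:0] a;      // a[1] = A1, a[0] = A0
  output       dout;
  reg          dout;

  always @(a) begin
    case (a)
      2'b00:   dout = 1'b0;  // DATA(0)
      2'b01:   dout = 1'b0;  // DATA(1)
      2'b10:   dout = 1'b0;  // DATA(2)
      default: dout = 1'b1;  // DATA(3)
    endcase
  end
endmodule
```

Changing the function then means changing only the table contents, not the structure.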
RAM Provides 16X the Storage of
Flip-Flops
• 32 bits versus 2 bits of storage
– Two 16x1 RAMS or One 32X1 Single Port Ram fit in one CLB
– One 16x1 Dual Port RAM fits in one CLB
[Figure: one CLB used as RAM (inputs D1, A0 to A4, WE, CLK; output O1) stores 32 bits; one CLB used as two flip-flops (D1/Q1, D2/Q2) stores 2 bits]
• 32x8 shift register with RAM = 11 CLBs
– Using flip-flops, takes 128 CLBs for data alone
– Address decoders not included
RAM Types
• Synchronous RAM (SYNC_RAM)
– Synchronous Write Operation
• Synchronous Dual-Port RAM (DP_RAM)
– Can read & write to different addresses simultaneously
[Figure: SYNC_RAM with Data, Write Enable, Write Clock, and Address inputs and one Output; DP_RAM with Data, Write Enable, Write Clock, Write Address/Single-Port Read Address, and Dual-Port Read Address inputs and SP and DP Outputs]
RAM Guidelines
• Less than 32 words is best
– 32x1 or 16x2 per RAM requires only one CLB
• Delays are short, (one level of logic)
– Data and output MUXes are required to expand depth
• Less than 256 words recommended per RAM
– Use external memory for 256 words or more
• Width easily expanded
– Connect the address lines to multiple blocks
• Recommendation: Use less than 1/2 of max memory
resources
– Maximum memory uses all logic resources of CLBs
Memory Use
• Most synthesis tools can synthesize ROM from behavioral HDL code, but RAMs must be instantiated
• Use library primitives and macros for standard size memory
– RAM/ROM16X1S to 32X8S
– Use S suffix for Synchronous RAM
– Use D suffix for Dual-Port RAM
• Use LogiBLOX to generate arbitrary size memories
[Figure: RAM32X1S symbol with inputs D, WE, and A0 to A4 and output O]
How to Generate Memory
• Use LogiBlox utility to create arbitrary size
RAM or ROM
– Select type: ROM, Synchronous, Asynchronous, or Dual Port
RAM
– Specify Depth: number of words must be a multiple of 16,
ranging from 16 to 256 words
– Specify Width: word size ranges from 1 to 64 bits
– Specify initialization values with attribute file
• LogiBLOX also creates RAM interface
– Entity and component declaration - cut and paste into the design
(VHDL designs)
– Module declaration (Verilog designs)
– Symbol Graphic (schematic entry designs)
Memory Generator Dialog
Specify memory type, size, name and function in the LogiBLOX GUI
Instance Name
LogiBLOX function
example
Memory Function
Data file for
initialization
Section III
Advanced Hardware Design Techniques
Input / Output Design
XC4000X IOB Block Diagram
Shaded areas are not included in XC4000E family.
How to specify IO blocks - Schematic
• User explicitly defines what resources in the
IOB are to be used
• I/Os are defined with
– 1 pad primitive
– At least 1 function primitive:
• Buffer, F/F ,or Latch
• 1 input element, 1 output element or both
– Inverters may also be pulled into IOBs
• IOBs are named by net between pad and
function primitives
[Figure: IOB IN1_PAD is an IPAD driving an IBUF through net IN1_PAD; IOB IN2_PAD is an IPAD driving an ILD through net IN2_PAD]
Primary and Secondary Global
Buffers
• Eight global buffers
per FPGA
– Four primary (BUFGP), Four secondary (BUFGS)
• Primary buffers must be driven by a semidedicated IOB
• Secondary buffers can be driven by a semidedicated IOB or internal logic and have more
routing flexibility
– Use BUFGS if extra 1-2ns of delay is acceptable
• Use generic BUFG primitive in your design
– Allows software to choose best type of buffer
– Allows easy migration across families
D
IPAD
BUFG
I/O Logic
• 4000E families have no boolean logic other than
inverters in the IOBs
• XC4000EX adds optional output logic
– Can be used as a generic two-input function generator or MUX
– One input can be driven by IOB output clock signal
• Driving from FastCLK buffer provides less than 6 ns pin-to-pin delay
– Requires library components beginning with “O”
BUFFCLK
IPAD
OAND2
F
FROM INTERNAL LOGIC
OPAD
FAST
Use Pull-ups/Pull-downs to Prevent
Floating
• Unused IOBs:
– Outputs of unused IOBs are automatically disabled
– Pull-ups are automatically connected on unused IOBs
• Used IOBs:
– A PULLUP or PULLDOWN primitive can be connected to
used IOBs
– Inputs should not be left floating
• Add a pull-up to design inputs that may be left floating to
reduce power and noise
Output Three-State Control
• Output enable may be inverted
– Use OBUFE macro for active-high enable
– Use OBUFT primitive for active-low enable
• Three-state control also via a dedicated global net
– Controlled by same STARTUP primitive
• All I/O disabled during configuration
[Figure: OBUFE with OE input; OBUFT with T input; STARTUP primitive with GTS input]
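In HDL flows, a three-state output is usually described with a conditional assignment of 'Z'. The sketch below assumes an active-high enable; the module and port names are hypothetical, and whether the tool maps it onto an OBUFE- or OBUFT-style buffer is tool dependent.

```verilog
// Illustrative three-state output with active-high enable; names are hypothetical.
module tristate_out (data, oe, pad);
  input  data, oe;
  output pad;

  // Drive the pad when oe is high, otherwise release it to high impedance.
  assign pad = oe ? data : 1'bz;
endmodule
```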
Fast Capture Latch
• Additional latch on input driven by output’s clock signal
• Allows capture of input by very fast clock
– Followed by standard I/O storage element for synchronization to internal logic
– Very fast setup (6.8 ns for 4000EX-3), 0 ns hold
– Available on 4000X, not 4000E family
• Example
– ILDFFDX macro includes Fast Capture Latch and IFDX
– Connect BUFGE to fast capture latch
– Opposite edge of same clock via BUFGLS drives IFDX
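[Figure: ILDFFDX macro. The Data IPAD feeds the fast capture latch, gated through BUFGE; the latch is followed by the IFDX flip-flop (with CE), clocked from the Clock IPAD through BUFGLS, which drives internal logic]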
Decrease Setup Time with NODELAY
• NODELAY attribute
– Removes the delay element ahead of the IFD or ILD
– Decreases setup time, and creates hold time
– Available on IFD/ILD macros in XC5200 and
XC4000E/X families
IOB
External
Clock
Pad
External
Delay
Pad
Q D
Routing
Delay
Delay
Input
Buffer
Output MUX
• OMUX2
– Fast output signal (from output
clock pin) MUXes IOB output or
clock enable pins to pad
– Effectively doubles the number
of device outputs without
requiring a larger, more
expensive package
– Pin-to-pin delay is less than 6 ns
OMUX2
D0
O
D1
S0
OPAD
Slew Rate Control
• Slew rate controls output speed
• Two slew rates
– Default slow slew rate reduces noise
– Use fast slew rate wherever speed is important
– FAST Slew rates are approximately 2x faster than SLOW slew
rates
• Slew rate specification
– Instantiation: in the user constraint file:
• INST $1I87/obuf SLOW;
– Synthesis: vendor dependent
• Output drive varies by family
– 4KEX/XL families have 12 mA drive
FAST
OPAD
OBUF
Choose TTL or CMOS Thresholds
• Threshold is selected during configuration
• Default is TTL
– Global selection on inputs or outputs
– Change to CMOS in Configuration Template
– 3V devices need TTL threshold when interfacing to 5V devices
Section IV
Advanced Software Design
with Xilinx M1-Based Software
Section IV Agenda
• Design Entry Tips
• Library Types
• FPGA Express for VHDL & Verilog
• M1-Based Software Flow
• Implementation Options
• Design Verification
• PLD Configuration Settings
• Design Constraints
Section IV
Advanced Software Design
with Xilinx M1-Based Software
Design Entry Tips
Design Entry Tip - Label Nets
• Label as many nets as possible
– Net names are passed to report files
– Eases debugging
• Names may change due to hierarchy or optimization
• An IOB is named by the net between the pad and I/O
function primitives
• A CLB is named by the net on the output
– Flip-flops are always outputs
CLB Q2
IOB IN1
IN1
D Q
Q2
Use Legal and Readable Names
• Allowable characters
– Alphanumeric: A - Z, a - z, 0 - 9
– Underline _, Dash -
• Reserved characters
– Angle brackets <> for buses
– Slash / for hierarchy
– Dollar sign $ for reference designators
• Names must contain at least one non-digit
• Avoid using names that correspond to device
resources
– CLB row/column locations: AA, AB, etc.
– IOB pin locations: P1, P2, etc.
Component Naming Conventions
• Common component names, pin names and functions
for all families
• Basic format is <function><width><control_inputs>
– CB4CLE = Counter, Binary, 4 bits, Clear, Load, Enable
– FD16RE = Flip-flops, D-type, 16 bits, Reset, Enable
• Control inputs are referenced by a single letter
– C = asynchronous Clear, R = synchronous Reset
– Listed in order of precedence
Use Hierarchy in Design
• Adds structure to design
• Eases debug
• Users can build libraries of common functions
• Allows each design portion to be entered by most efficient method
• Facilitates incremental design and
floorplanning
• Supports team design
Section IV
Advanced Software Design
with Xilinx M1-Based Software
Library Types
Xilinx Libraries Overview
• Libraries contain descriptions of each
component with pin names, functionality,
timing, etc.
• There are two libraries:
– The Unified Library contains “ready made” components with
non-variable function and size
– The LogiBLOX Library contains templates which can be
customized for function and size
• Both libraries allow easy design migration
across Xilinx devices and families
LogiBLOX templates and GUI
• LogiBLOX is composed of two parts:
– LogiBLOX Library containing templates of VARIABLE SIZE
• Templates are expanded or customized (Counters, Adders, Registers,
RAM, ROM)
• Templates have many implementations (e.g. Binary, Johnson, LFSR
counters)
– LogiBLOX GUI and Synthesizer to create
• A design file for implementation
• Symbol for schematic capture tool
• HDL code for instantiation in your design
• Functional simulation model
Generic LogiBLOX Functions
• One generic model per function type (e.g., counter)
– Attributes can be specified, e.g., bus width, load, clock enable, etc.
• Arithmetic: COUNTER, ADDER, SUBTRACTOR, ACCUMULATOR
• Storage: SHIFT, DATA_REG, PROM, SRAM, DRAM
• Logic: ANDBUS, ORBUS, MUXBUS, DECODE, TRISTATE, COMPARATOR
• I/O: INPUTS, OUTPUTS, BIDIR_IO
• DSP and other complex functions are also available
through CORE Generator
LogiBLOX Module Selector
• Simple Combinatorial Logic
– Bus size from 2 to 32 bits
– Supports AND, Invert, NAND,
NOR, OR, XNOR, XOR
– Any of the inputs or output can be
inverted independently
• Use Decode or MASK function
• Three-State Drivers
– Bus size from 2 to 32 bits
– Optional pull-up resistors
• Constants
– Allows signals to be tied high or
low
How to use LogiBLOX in HDL code
• If a LogiBLOX function is inferred, there is
nothing more to do!
– Check with the synthesis vendor. Most synthesis
tools infer simple LogiBlox components
automatically
– Example: Synthesis tools will infer an adder for
X <= A +B;
• To instantiate a LogiBlox function, or if the
synthesis tool does not infer LogiBLOX
automatically
– Use LogiBLOX GUI from command-line in
“stand-alone” mode: %lbgui -vendor
* Creates a LogiBLOX module for simulation
* Creates an entity or module declaration
Section IV
Advanced Software Design
with Xilinx M1-Based Software
FPGA Express
for VHDL & Verilog Design
Section Agenda
• Overview
• Design Flow
• Instantiation Guidelines
• Coding Style Guidelines
Overview
• Xilinx leads in FPGAs - 55% market share
• Synopsys leads in VHDL/Verilog synthesis - 80% market share
• One result of long term technology partnership is
FPGA Express
– Xilinx is only silicon supplier with right to distribute FPGA
Express technology
– Integration into Foundation Series
Express Input and Output
• Input files may be VHDL or Verilog format
– Mixed Verilog/VHDL modules are accepted
– Schematics may also be used, but should not be input into Express
– Schematic files in XNF or EDIF format will be merged into the design in Xilinx Design Manager
• Output netlists are in XNF format
• Timing Specifications may be specified in Express
– Timing Specifications are not used during Synthesis
– Timing Specifications can be included in the output
[Figure: VHDL, Verilog, and Timing Requirements feed Express, which produces Reports and an .XNF netlist]
Analyze the Design (1)
• “Analyze” checks the HDL code for syntax errors
– Also creates internal files
• Files are automatically
analyzed when
selected for a
project
• Do not select XNF
or EDIF files
– Will be merged
into the design by
Design Manager
Synthesis -> Identify Sources
Analyze the Design (2)
• As the design blocks are analyzed, status is displayed:
– No Errors or Warnings
– Out of Date
– Warnings
– Errors
• In this example, all blocks were analyzed
successfully
Main Window
Implement the Design
• Express Implementation maps the HDL code
to standard logic, creating a generic netlist.
• At this stage, the design has not been
optimized
• To implement a design, select only the top level block, and then select the Implement icon
Main Window
Check for Errors and Warnings
• After implementation is complete, the chip symbol
plus status is displayed
• View errors,
warnings,
and messages
• Right click inside
window to save
information to
Constraint Entry
• Constraints are NOT applied to Synthesis
– Constraints are written to the output netlist (XNF) file for use
by Design Manager (Xilinx Implementation Tools)
• Timing constraints control path delay
• Specify paths with timing groups, or groups of
IO or sequential elements
– The INPUT Group includes all input ports at the top level of
the design
– The OUTPUT Group includes all output ports at the top
level of the design
– All flip-flops clocked by the same edge of a common clock
belong to a group
– To define constraints: select Synthesis -> Edit Constraints
forms
Define Clock Period
• Enter Period, Rise, and Fall Time
– Select Clock entry -> Define
Synthesis -> Edit Constraints -> Clocks
Synthesis -> Edit Constraints -> Clocks ->
Define
Define Global Synchronous Delays
• The clock period creates 3 types of global
constraints with the same default value:
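(1) All input ports to sequential elements
– Setup of flip-flop or latch is included
(2) Sequential element to all output ports
– Flip-flop clock-to-Q delay is included
(3) Sequential element to sequential element
[Figure: input port, logic, D flip-flop, logic, D flip-flop, logic, output port; path 1 runs from the input port to the first flip-flop, path 3 between the flip-flops, path 2 from the last flip-flop to the output port, each constrained to one clock period]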
Synthesis -> Edit Constraints -> Paths form
Define Individual Synchronous Delays
• Default delay from Clock specification is used in the
Paths form
• Individual, or path specific delays can be defined on the
Ports form
– Port delays over-write the global delays from the Paths form
• Input delay, shown here, arrives 20 ns before the rising
edge of the clock.
Synthesis -> Edit Constraints -> Ports
Define Key Port Features
• Global Buffer defines the type of Clock Distribution
network - Use BUFG for most applications(default)
• Resistance specifies use of pullup or pulldown
resistor on unused pads
– Reduces power consumption and noise
• Use IO Reg allows use of sequential elements within
IO Blocks to minimize Input or Output delay (default)
– Dependent on device type
• Pad Location is used to specify pin number of the IO
pad
Synthesis -> Edit Constraints -> Ports
Control the Hierarchy
• Eliminate (default) or save hierarchical
boundaries
• Flat designs yield best results because more
merging and sharing of boolean logic occurs
• However, small blocks are easier to debug
– Easier to match source HDL code to synthesized design
• Synthesis goals (Speed or Area) and Effort level
can be defined for each module
Synthesis -> Edit Constraints -> Modules (implemented design)
Optimize the Design
• Optimization minimizes the design for
speed or area
• Select the implementation, and then
select the Optimize icon
Main
Window
• After Optimization, check for errors and
warnings again
View Results
• Select File -> Project Report to
generate a report
• Report file contains:
– Files and libraries used
– Settings for Synthesis
– Chip type and speed grade
– Estimated Timing
• Warning: Circuit timing estimates tend to be optimistic. Run timing analysis after routing for the most accurate results.
Report.txt file
Verify Results (1)
• After Optimization, open Synthesis -> Edit
Constraints to verify that correct constraints were
specified
• Results are based on estimated routing delays
Synthesis -> Edit Constraints -> Paths (for an optimized design)
Verify Results (2)
• Review size of the design
• Resource use is displayed for each hierarchical
block
– Resources used per hierarchical block
– Black Box instantiations cannot be analyzed by
Express
Synthesis -> Edit Constraints -> Modules (Optimized Design)
Export Netlist
• Create the output netlist for use with the
Xilinx Design Manager (Xilinx
Implementation Tools)
– Output File format is XNF
• Select the optimized design, then select
Synthesis -> Export Netlist to create the file
– XNF file format
is used
• Enable Export
Timing
Specifications
to include
constraints
Synthesis -> Export Netlist
Simulation
• Not covered in this workshop
• Free VHDL / Verilog simulators
– See http://www.xilinx.com/xup/express/express1.htm
– Active VHDL Simulator, by Aldec (Most Recommended)
– VHDL Tools from RASSP
– Accolade Design Automation demo VHDL Simulator
– SimuCAD Silos III (Recommended for Verilog)
– Wellspring Verilog Simulator
• Model Technology Inc. (MTI) and major
CAD vendors sell other HDL simulators
Instantiation and Hierarchy
• Hierarchy is created when one design is instantiated
into another design
• All components in the Unified and LogiBLOX
Libraries may be instantiated
– Unified library components are described in the Libraries
Guide
– LogiBLOX components are described in the LogiBLOX
Reference/User Guide
• Cells that must be instantiated with Express
Synthesis
RAM/ROM
Bscan
Readback
WOR
OSC
WAND
Black Box Instantiation
• What is a black box? Any element not analyzed by
Express. Examples:
– Existing Design Modules or Elements (XNF, EDIF, .ngo)
– LogiBLOX Components
– Pre Optimized Netlists (PCI Cores or LOGICOREs)
• Procedure for using a black box:
– Create a place holder in the HDL code
– Synthesize the design without the XNF, EDIF, or NGO files
– The Xilinx Implementation Tools will resolve (link in) all black
box references
• Limitations
– Express cannot check timing constraints through a black box.
– Express cannot include black box resources in its reports.
– GSR nets are not automatically inferred within Black Boxes
LogiBLOX & CORE Generator
Functions
• For HDL designs, LogiBLOX and CORE Gen
generate:
– Behavioral VHDL or Verilog model - for simulation only
– VHDL/Verilog Template - for component instantiation
– NGO file - for Xilinx implementation
• Most LogiBLOX functions can be inferred.
Exceptions include READBACK and RAM
blocks.
• Instantiation may provide better control of
design implementation
How to Use LogiBLOX
1. Invoke LogiBLOX from
Foundation
2. Select Setup
a. Specify VHDL or Verilog
Template in the LogiBLOX Setup
form
b. Other setup options may also
be required*
3. Specify component features
4. Select OK to create component
5. (VHDL/Verilog) Use template file (.vhi / .vei) to easily instantiate the component
Verilog - Add empty interface file to define buses.
6. Compile as usual
*To access Verilog options, invoke LogiBLOX directly from Start -> Programs ->
Xilinx Foundation Series -> LogiBLOX
RAM Example
• Code is shown in the following slides:
• VHDL instantiation:
– Component and entity declarations were copied into top level design file from LogiBLOX VHI file
• Verilog instantiation:
– Module declaration is copied into top level design
file from LogiBLOX VEI file
– Additional empty file is required to specify pin type
(input or output)
• Do not try to Analyze the VHD or VEI file
from LogiBLOX, but DO Analyze the top level
design file
– Verilog users will synthesize the additional empty
RAM Instantiation (VHDL)
Top-level entity and RAM component declaration (copied from the VHI file):
Library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.STD_LOGIC_UNSIGNED.all;
entity top is
  port (NOTCLR, CLKEN, NOTLD, UPCNT: in STD_LOGIC;
        CNT_DI, RAM_DI: in STD_LOGIC_VECTOR (7 downto 0);
        QO_LO: out STD_LOGIC_VECTOR (7 downto 0));
end top;
. . .
component ram256x8
  PORT(
    A: IN std_logic_vector(7 DOWNTO 0);
    DI: IN std_logic_vector(7 DOWNTO 0);
. . .
RAM Instantiation (VHDL) (2)
Last part of top architecture (the component declaration is copied from the VHI file, and the instance name is entered):
begin
  U1: OSC4
    port map (OSC_CK);
  U2: BUFG
    port map (OSC_CK, CLK);
  U3: CB8CLED
    port map (CLK, NOTCLR, CLKEN, NOTLD,
              UPCNT, CNT_DI, ADDR);
  xram : ram256x8 port map
    (A => ADDR ,
     DI => RAM_DI,
     WR_EN => CLKEN,
     WR_CLK => CLK ,
. . .
Coding for Performance
• FPGAs require better coding styles and more
effective design methodologies
– Pipelining techniques allow FPGAs to reach gate array
system speeds
• Gate Arrays can tolerate poor coding styles and
design practices
– 66 MHz is easy for a Gate Array
• Designs coded for a Gate Array tend to perform 3x
slower when converted to an FPGA
– Not uncommon to see up to 30 layers of logic and 10-20
MHz FPGA designs
– 6-8 FPGA Logic Levels = 50 MHz
Case vs If-Then-Else (Verilog)
module mux (in0, in1, in2, in3, sel, mux_out);
  input in0, in1, in2, in3;
  input [1:0] sel;
  output mux_out;
  reg mux_out;
  always @(in0 or in1 or in2 or in3 or sel) begin
    case (sel)
      2'b00: mux_out = in0;
      2'b01: mux_out = in1;
      2'b10: mux_out = in2;
      default: mux_out = in3;
    endcase
  end
endmodule
module p_encoder (in0, in1, in2, in3, sel, p_encoder_out);
  input in0, in1, in2, in3;
  input [1:0] sel;
  output p_encoder_out;
  reg p_encoder_out;
  always @(in0 or in1 or in2 or in3 or sel) begin
    if (sel == 2'b00)
      p_encoder_out = in0;
    else if (sel == 2'b01)
      p_encoder_out = in1;
    else if (sel == 2'b10)
      p_encoder_out = in2;
    else p_encoder_out = in3;
  end
endmodule
[Figure: the case statement yields a single 4-to-1 multiplexer (in0 to in3, sel, mux_out); the if-then-else chain yields a priority encoder, a cascade of selections on sel=10, sel=01, and sel=00 producing p_encoder_out]
Reduce Logical Levels of Critical Path
(Verilog)
module critical_bad (in0, in1, in2, in3, critical, out);
  input in0, in1, in2, in3, critical;
  output out;
  assign out = (((in0&in1) & ~critical) | ~in2) & ~in3;
endmodule
module critical_good (in0, in1, in2, in3, critical, out);
  input in0, in1, in2, in3, critical;
  output out;
  assign out = ((in0&in1) | ~in2) & ~in3 & ~critical;
endmodule
[Figure: in critical_bad the critical signal enters at the first logic level; in critical_good it enters at the last level before out]
Resource Sharing (Verilog)
module poor_resource_sharing (a0, a1, b0, b1, sel, sum);
  input a0, a1, b0, b1, sel;
  output sum;
  reg sum;
  always @(a0 or a1 or b0 or b1 or sel) begin
    if (sel)
      sum = a1 + b1;
    else
      sum = a0 + b0;
  end
endmodule
module good_resource_sharing (a0, a1, b0, b1, sel, sum);
  input a0, a1, b0, b1, sel;
  output sum;
  reg sum;
  reg a_temp, b_temp;
  always @(a0 or a1 or b0 or b1 or sel) begin
    if (sel) begin
      a_temp = a1;
      b_temp = b1;
    end
    else begin
      a_temp = a0;
      b_temp = b0;
    end
    sum = a_temp + b_temp;
  end
endmodule
[Figure: poor_resource_sharing builds two adders (a0 + b0 and a1 + b1) and selects between the sums with sel; good_resource_sharing selects the operands with sel and uses a single adder]
Register Duplication to Reduce Fan-Out
(Verilog)
module high_fanout(in, en, clk, out);
  input [23:0] in;
  input en, clk;
  output [23:0] out;
  reg [23:0] out;
  reg tri_en;
  always @(posedge clk) tri_en = en;
  always @(tri_en or in) begin
    if (tri_en) out = in;
    else out = 24'bZ;
  end
endmodule
module low_fanout(in, en, clk, out);
  input [23:0] in;
  input en, clk;
  output [23:0] out;
  reg [23:0] out;
  reg tri_en1, tri_en2;
  always @(posedge clk) begin
    tri_en1 = en; tri_en2 = en;
  end
  always @(tri_en1 or in) begin
    if (tri_en1) out[23:12] = in[23:12];
    else out[23:12] = 12'bZ;
  end
  always @(tri_en2 or in) begin
    if (tri_en2) out[11:0] = in[11:0];
    else out[11:0] = 12'bZ;
  end
endmodule
[Figure: in high_fanout, one tri_en register drives 24 loads; in low_fanout, duplicated registers tri_en1 and tri_en2 each drive 12 loads]
Design Partition - Reg at Boundary
(Verilog)
module reg_in_module(a0, a1, clk, sum);
  input a0, a1, clk;
  output sum;
  reg sum;
  reg a0_temp, a1_temp;
  always @(posedge clk) begin
    a0_temp = a0;
    a1_temp = a1;
  end
  always @(a0_temp or a1_temp) begin
    sum = a0_temp + a1_temp;
  end
endmodule
module reg_at_boundary (a0, a1, clk, sum);
  input a0, a1, clk;
  output sum;
  reg sum;
  always @(posedge clk) begin
    sum = a0 + a1;
  end
endmodule
[Figure: reg_in_module registers a0 and a1 ahead of the adder; reg_at_boundary registers sum at the adder output]
Managing FPGA Speed Booster
Pipeline (Verilog)
module no_pipeline (a, b, c, clk, out);
  input a, b, c, clk;
  output out;
  reg out;
  reg a_temp, b_temp, c_temp;
  always @(posedge clk) begin
    out = (a_temp * b_temp) + c_temp;
    a_temp = a; b_temp = b; c_temp = c;
  end
endmodule
module pipeline (a, b, c, clk, out);
  input a, b, c, clk;
  output out;
  reg out;
  reg a_temp, b_temp, c_temp, mult_temp;
  always @(posedge clk) begin
    mult_temp = a_temp * b_temp;
    a_temp = a; b_temp = b;
  end
  always @(posedge clk) begin
    out = mult_temp + c_temp;
    c_temp = c;
  end
endmodule
[Figure: 1 cycle: the multiply (a * b) and the add (+ c) sit between the same pair of registers; 2 cycle: a register separates the multiplier from the adder]
When to Use Tri-state Buffers (BUFTs)
• BUFTs can be used to implement:
– Internal Tri-state busses
– Muxes greater than 4-to-1 or Multiplexed Buses
• BUFTs can be inferred:
– Tri-states are inferred when a ‘Z’ can be assigned to
a signal
• BUFTs can be instantiated:
– BUFT components
– LogiBLOX Tri-State Buffers
– Within a wide MUX: LogiBLOX Wired-AND MUX
4-to-1 Tri-State MUX Before
(VHDL)
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
entity TST is
  port( DATA: in std_logic_vector(3 downto 0);
        SEL: in integer;
        SIG: out std_logic );
end TST;
architecture BEH of TST is
begin
  LOOP1: for I in 0 to 3 generate
    SIG <= DATA(I) when (SEL = I) else 'Z';
  end generate ;
end BEH;
[Figure: four tri-state buffers, DATA(0) to DATA(3) enabled by SEL(0) to SEL(3), driving SIG]
• Is there a problem with this example?
4-to-1 Tri-State MUX After (VHDL)
          CLBs   IOBs   TBUFs
Before       8     37       4
After        4      7       4
• How can this code be improved?
– Default integer is 32 bits
– Define a limit
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
entity TST is
port( DATA: in
std_logic_vector(3 downto 0);
SELECTOR: in integer range 0 to 3;
SELECTION: out std_logic );
end TST;
. . .
Flip-Flop Examples (VHDL)
• Flip-Flop inference driven by 'event in VHDL
-- D flip-flop (produces registered output)
FF: process (CLOCK) begin
  if (CLOCK'event and CLOCK='1') then
    A_Q_OUT <= D_IN;
  end if;
end process; -- End FF
-- Flip-flop with asynchronous preset and clock enable
FF_CLOCK_ENABLE: process (ENABLE, PRESET, CLOCK)
begin
  if (PRESET = '1') then                     -- generates async preset
    D_Q_OUT <= "11111111";
  elsif (CLOCK'event and CLOCK='1') then
    if (ENABLE='1') then                     -- generates clock enable
. . .
Flip-Flops Vs. Latches
• Latch inference does not include an edge ('event or posedge)
• Latches are generated when:
– A signal is assigned in one branch of an if statement or case statement, but not in all branches
– An if or case statement does not define all possible conditions
• Does not apply to case statements in VHDL
• Use Synopsys parallel_case and full_case directives for Verilog to avoid latches
• Or, assign a default value before the if statement
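The latch rule above can be illustrated with two small Verilog modules. This is a minimal sketch with hypothetical module and signal names: the first leaves q unassigned when en is low, so a latch is inferred; the second assigns a default first, so the logic stays purely combinational.

```verilog
// Illustrative only; names are hypothetical.
module latch_inferred (en, d, q);
  input  en, d;
  output q;
  reg    q;

  always @(en or d) begin
    if (en) q = d;      // no else branch: q must hold its value, so a latch results
  end
endmodule

module no_latch (en, d, q);
  input  en, d;
  output q;
  reg    q;

  always @(en or d) begin
    q = 1'b0;           // default assignment covers every other condition
    if (en) q = d;
  end
endmodule
```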
Global SET/RESET
• All Xilinx FPGAs have a built-in global asynchronous set/reset facility
• Global SET/RESET sets or resets every
sequential element in the FPGA
– GSR signal is accessed by instantiating the
STARTUP block.
– GSR will be inferred when the design has a net
that sets / resets all sequential elements in the
design
– Additionally, sequential elements may be set or
reset individually
• These global nets exist outside of the general-purpose routing within the device.
How to access Global
SET/RESET
• The Global Set/Reset (GSR) signal is
accessed by instantiating the STARTUP
block.
– Polarity may be inferred
• GSR will be inferred when the design
has a net that sets / resets all sequential
elements in the design
State Machine Encoding
• For FPGAs, use one-hot encoding for complex state machines
– Works well in Xilinx’ register-rich FPGAs
– Uses fewer wide-input functions
– Generally produces fast state machines
• For CPLDs, use Binary encoding
• One-hot and binary encoding can be selected
in Express at Synthesis -> Options -> Project
– Other types of encoding such as BCD or Gray
may be specified in the HDL code
• It's best to break up large state machines into smaller ones
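A one-hot encoded state machine can also be written directly in the HDL source. The sketch below is illustrative only: the three states, the inputs start and finish, and the output busy are hypothetical, and the synthesis tool's own encoding option may still override the hand-written codes.

```verilog
// Illustrative one-hot state machine with three states; names are hypothetical.
module onehot_fsm (clk, rst, start, finish, busy);
  input  clk, rst, start, finish;
  output busy;

  // One flip-flop per state: exactly one bit is set at any time.
  parameter IDLE = 3'b001,
            RUN  = 3'b010,
            DONE = 3'b100;

  reg [2:0] state;

  always @(posedge clk or posedge rst) begin
    if (rst)
      state <= IDLE;
    else
      case (state)
        IDLE:    if (start)  state <= RUN;
        RUN:     if (finish) state <= DONE;
        DONE:    state <= IDLE;
        default: state <= IDLE;   // recover from any illegal code
      endcase
  end

  assign busy = state[1];          // output decoded from a single state bit
endmodule
```

Because each output can be taken from a single flip-flop, the decoding logic needs few inputs, which matches the earlier advice to minimize inputs rather than flip-flops.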
Address Range Identification
• For the inequality operators, synthesis will infer
two 12-bit comparators
• VHDL Example:
if ADDRESS(31 downto 20) <= "000000000110" and
   ADDRESS(31 downto 20) >= "000000000001" then
• More address ranges are synthesized to more
comparators
• Better solution: look for patterns in address bits
that can eliminate need for comparators
if (ADDRESS(31 downto 23) = "000000000") and
   (ADDRESS(22 downto 20) /= "111") and
. . .
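For comparison, the same address-range check can be sketched in Verilog in both styles. The module and signal names are hypothetical, and the final term of hit_pattern (excluding "000") is an assumption about the part of the VHDL example that is cut off in the source; with it, both outputs are true for exactly the values 1 through 6 in bits 31:20.

```verilog
// Illustrative address decode in two styles; names are hypothetical.
module addr_decode (address, hit_compare, hit_pattern);
  input  [31:0] address;
  output        hit_compare, hit_pattern;

  // Inequality operators: infers two 12-bit comparators.
  assign hit_compare = (address[31:20] <= 12'b000000000110) &&
                       (address[31:20] >= 12'b000000000001);

  // Pattern matching: upper 9 bits all zero, lower 3 bits neither 111 nor 000.
  assign hit_pattern = (address[31:23] == 9'b000000000) &&
                       (address[22:20] != 3'b111) &&
                       (address[22:20] != 3'b000);   // assumed final term
endmodule
```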
Arithmetic and Comparison Operators
• Use arithmetic and comparison operators
whenever possible. Example:
if (Y > Z) then X <= A + B;
• Arithmetic and comparison operators give Express the most flexibility to optimize
– Multiplier
– Adder, Subtracter, and Adder/Subtracter
– Incrementer, decrementer, and incrementer/decrementer
– Comparator
– Multiplexer (select operator)
• Operators can be instantiated, but generally you
will get the best performance with operator
Expressions
• Expressions
– Use parentheses to indicate precedence.
– Replace repetitive expressions with function
calls or continuous assignments
Last but not least….
• VHDL generate statements can cause long
compile times unfolding the logic - Use
wisely
– Be careful with generate statements nested in
loops or within generate statements
– Generate example
-- Generate 3 instances of ALU2
GEN1: for N in 0 to 2 generate
ALU2_X3: ALU2 port map ( CTL(2+ N*3 downto N*3),
Resources
• Support Resources
– www.xilinx.com ( Answers Search)
– Express Expert Journal
http://www.xilinx.com/support/techsup/journals/fpga_exp/index.htm
– Synthesis Design Guides
http://www.xilinx.com/apps/hdl.htm
• On-Line Documentation
START -> Programs -> Xilinx Foundation Series ->
VHDL Reference Manual
START -> Programs -> Xilinx Foundation Series ->
Verilog Reference Manual
START -> Programs -> Xilinx Foundation Series ->
Section IV
Advanced Software Design
with Xilinx M1-Based Software
M1-Based Software Flow
Logical Design Files
• Logical Design Files describe your design, and are
composed of logical components
– Typically a netlist, generated by Schematic Capture or Synthesis
– Composed of Boolean Gates, FIFOs, RAMs
• Netlist input to XACT-Step M1 is in EDIF format
– XNF files are also accepted
• EDIF format files are translated to (Native Generic
Design) NGD format
– NGD files have varying extensions
— Ex: NGD, NGM, NGA, NGO
• NGD files can be translated to other formats for
simulation
Physical Design Files
• Physical design files are composed of components
found in a Xilinx FPGA such as look-up tables and
flip-flops
– Physical design files have .ncd extension
– Map creates an NCD file from an NGD file
– NCD files contain varying pieces of information
• Mapping, placement, and routing tools each concatenate data
to the bottom of the NCD file
M1-Based Design Flow
.XNF or EDIF netlist + UCF (User Constraint File)
-> NGDBUILD: Flatten Hierarchical Design -> .NGD
-> MAP: Logical to Physical translation; Groups LUTs and FFs Into CLBs -> .NCD, .PCF
-> TRCE: Static Timing Estimates
-> PAR: Layout of Physical Design; Routes Physical Design -> .NCD
-> TRCE: Static Timing Analysis
-> BITGEN: Generates configuration file -> .BIT
*Design entry tool flows to M1 are shown in the Appendix.
Design Flow Programs (1)
• NGDBUILD
– Merges hierarchical EDIF or XNF files into one hierarchical file
– Creates internal netlist .ngd(Native Generic Design) files
– Contains logical components: combinatorial gates, RAMS, flip-flops,
etc.
• MAP
– Maps logical components to physical components found in Xilinx
FPGA: look up tables, Flip-Flops, three state buffers, etc.
– Packs physical components into COMPS
– Creates internal .ncd (Native Circuit Design) file
Translate
Map
Place & Route
Configure
Design Flow Programs (2)
• TRCE
– Analyzes Timing
• Use before PAR to analyze constraints
• PAR
– Places COMPS on FPGA
– Routes the FPGA
• TRCE
– Analyzes Timing
• Use after PAR to check delays
• NGDANNO
– Back-annotate timing delays for Simulation
• BITGEN
– Create file to configure FPGA
Key M1 Browser Reports
• Map Report
– Displays result of DRC (Design Rule Check)
– Indicates if the design will fit into the specified part
– Identifies ways to improve the design
– Reports nets with no source or load
• Logic Level Timing Report provides delay estimates
– Reports longest paths in the design
– Created before placement
– Based on block delays and minimum net delays
Key Report Files
• Placement and Routing Report includes resource summary
– Indicates the percentage of utilization
– The number of I/O and flip-flops is specified
– Reports if the design routed
– Gives an overall timing score
• Score of zero indicates all timing specifications were met
• Post Layout Timing Report
– Based on block delays and net delays after routing
– Used for detailed delay analysis after implementation
• Pad report
– Cross reference of Input/Output components and package pins
BEL and Comp Terminology
• XACTstep M1 uses two new terms for FPGA resources: “Comps” and “Bels”
– A comp may refer to a CLB, IOB, TBUF, or Decoder
– A BEL may refer to the contents of a comp, such as F_LUT, H_LUT, FFX, FFY, RAM, or PAD
• The Graphic Design Editor (EPIC) and TRCE timing reports will refer to BELs
[Figure: 4000X CLB containing F_LUT, G_LUT, H_LUT, FFX, and FFY]
• The COMP shown here is a CLB, which contains BELs: F_LUT, G_LUT, H_LUT, FFX, and FFY
Section IV
Advanced Software Design
with Xilinx M1-Based Software
Implementation Options
Main Implementation Menu Options
• Guide Option
–Use a previous
implementation as
template for current
implementation
–Specify constraint file
(optional)
• MAP, PAR, and
configuration options
–Implementation has
four sub-menus:
Optimize and Map,
Place and Route,
Timing, and Interface
Optimization and Map Options (1)
• Map optimizes your design before it is partitioned into LUTs, Flip-Flops, etc. The GUI includes these options:
• Trim Unconnected Signals
(default is On)
– Trims all fan-out/fan-in
from unconnected pins
– Turn off to implement
hierarchical blocks
separately
• Replicate Logic (default is
on)
– Duplicates logic with high
fan-out
– Increases utilization,
decreases delay
Optimization and Map Options (2)
• Optimization Strategy (default is Off)
– Minimizes logic to optimize logic for speed, area, or both
– Synthesized designs have been optimized already
• Packing Strategy (default is minimum density)
– Informs Map of how to pack COMPS with logic
– Minimum Density - Map only puts related logic into the same
COMP
– Fit Device - packs components more tightly into COMPS
– Can adversely affect timing and routability
• Generate 5-I/P Functions
– Reduces block levels but increases area
Place and Route Options (1)
• Runtime (default is 2)
– Trades off placement effort versus CPU time
• Router Passes (default is
Auto)
–The Router will run until no
improvement is made to meet
timing constraints.
– Specify a number to avoid
very long run times for difficult
designs.
– Start with 3 passes
Utilities -> Template Manager -> Edit
Implementation Template -> Place and
Route
Place and Route Options (2)
• Workstation users may run PAR LOOP on multiple
workstations simultaneously
– Create a list of available workstations
• One name per line, no comments
– Include the file name in the Nodelist field
• Many other options for advanced users, not shown
here
Implementation Options for Fast
Runtime versus PAR Effort
Select fast placement option,
1-2 routing passes, 0 clean-up
passes, and deselect
“Use Timing Constraints”
1
Deselect these 3 checkboxes
Other hints:
- 4KX and 9500 families give fastest runtimes.
- Save this as an implementation template
Timing Report Options
• Enable the creation of the
Timing Report
– Logic Level Timing Report is
created before PAR
• Has minimal net delays
• Used to predict realistic
constraints
– Post Layout Timing Report is
created after PAR
• Verify that the design meets
constraints
Timing Report Options (2)
• These options limit the information placed in the report file
• All options list paths in order of delay length; longest paths are
listed first
Design Performance Summary (Default)
– Displays longest clock-to-setup, pad-to-setup, and setup-to-pad delays for each clock in the design
Default Timing Constraints
– Lists longest Flip-Flop-to-Flip-Flop, Pad-to-Flip-Flop, and Flip-Flop-to-Pad paths
User Timing Constraints
– Report longest paths for each constraint
Design -> Implement -> Options ->
Edit Template -> Timing
Controlling the Back Annotation
Netlist Format
Format options:
- VHDL
- Verilog
- XNF
- EDIF
EDIF formats:
- Standard (2.0.0)
- Viewlogic
- Mentor EDIF
- LogicModelling
How to Start and Stop the Flow
Engine
• Select Flow Engine -> Setup -> Advanced to select the starting state
• Select Flow Engine -> Setup -> Stop After to set the stopping point
Create a Script from the GUI
• M1 can create a script file from the GUI session
– Available from the Flow Engine or Design Manager
– Select Utilities -> Command History -> Command Line
– Select Utilities -> Project Notes
• Copy, paste, and save text from Command History Window
The Guide Option
• Allows use of a previously placed and routed
design to guide a new placement
– Can be useful if there are few design changes
• Guide is used for Map, Place, and Route
– Map may take much longer to execute, but PAR will be
faster
• Recommended alternative is to use location
constraints in design
Previous
Design
New Design
Place & Route
Guide
Effective use of Guide
• Guide uses signal and component names to
determine edited parts of the design
• Name all nets
– Do not change names
• Minimize changes to the design
– Any new hierarchy changes all names below
– Avoid any changes to synthesized logic
• Synthesis users: freeze the design with “set_dont_touch” or a similar command
– Otherwise, the guide option may not be useful
Section IV
Advanced Software Design
with Xilinx M1-Based Software
Design Verification
Recommended Verification Flow
Netlist
FUNCTIONAL SIMULATION
Implement
TIMING SIMULATION
Timing Analysis
Bitgen
Prom File Formatter
Download
IN-CIRCUIT VERIFICATION
Timing Analyzer
• Analyze delays before and after implementation
Timing Analyzer Benefits
• Combines block delays from data book with net
delays from implementation files
• Quickly identifies critical paths and timing
hazards
• Report shows all elements in path, each
element's delay, and cumulative delay
– Can determine if slow paths are due to block delays
(design) or net delays (implementation)
[Figure: path from a pad through IOB (pin I), net, CLB1 (F1 to X), net, and CLB2 (F3 to flip-flop D/Q), alternating block, net, block, net, block]
Element              Delay   Total
PAD to IOB.I         2.2     2.2 block
IOB.I to CLB1.F1     1.1     3.3 net
CLB1.F1 to CLB1.X    2.7     6.0 block
CLB1.X to CLB2.F3    1.2     7.2 net
CLB2.F3 to Clock     2.1     9.3 block
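The running total in the path report above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, not the Timing Analyzer's actual algorithm; the function names are made up for illustration, and the delays are the ones listed in the example path.

```python
from itertools import accumulate

# (element, kind, delay in ns) copied from the example path above
PATH = [
    ("PAD to IOB.I", "block", 2.2),
    ("IOB.I to CLB1.F1", "net", 1.1),
    ("CLB1.F1 to CLB1.X", "block", 2.7),
    ("CLB1.X to CLB2.F3", "net", 1.2),
    ("CLB2.F3 to Clock", "block", 2.1),
]


def cumulative_delays(path):
    """Running total of the delays along the path, rounded to 0.1 ns."""
    return [round(total, 1) for total in accumulate(d for _, _, d in path)]


def delay_by_kind(path):
    """Split the total delay into block (design) and net (implementation) parts."""
    totals = {}
    for _, kind, delay in path:
        totals[kind] = round(totals.get(kind, 0.0) + delay, 1)
    return totals


if __name__ == "__main__":
    for (element, kind, delay), total in zip(PATH, cumulative_delays(PATH)):
        print(f"{element:<20} {delay:>4} {total:>5} {kind}")
    print(delay_by_kind(PATH))
```

Splitting the total by kind shows at a glance whether a slow path is dominated by block delays (a design issue) or by net delays (an implementation issue).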
Output files for Simulation
ngdanno
ngd2xxx
VHDL
/ SDF
Verilog
/ SDF
EDIF
XNF
• Before implementation, output netlist has unit delays, no
back-annotation (use for functional simulation)
• After implementation, post-route delays are back-annotated
– EDIF or XNF output files include back-annotated delays
– SDF files are created in addition to Verilog & VHDL netlists
• VHDL and Verilog output netlists do not contain delays
M1 HDL Simulation Flow
VHDL & Verilog Simulation Libraries
• UNISIM
– New for A1.4 allowing RTL and post-synthesis simulation
• SIMPRIM
– Family/architecture independent models
– Used for Post-M1 simulation including full timing
– VHDL and Verilog
• Standard Delay Format (SDF) files
– Separate file used to specify design timing (delays) to VHDL
and Verilog simulators
– Xilinx software version 1.4 supports SDF version 2.1
Hardware Configuration Readback
• Can occur while FPGA runs
• Requires XChecker cable
• Readback Trigger input starts serial readback
• XC3000 controlled via Bitstream Generator
– Default is enabled
– Data and trigger connected to Mode pins
• XC4/5000 controlled via schematic and Bitstream
Generator
– Include Readback symbol in schematic
• Connect TRIG and DATA to I/O pins
• Can use MD0 and MD1
• See Appendix for more information
CLK
XChecker
RT
IPAD
(MD0)
TRIG
IBUF
DATA
READBACK RIP
OPAD
OBUF
(MD1)
XChecker
RD
Section IV
Advanced Software Design
with Xilinx M1-Based Software
PLD Configuration Settings
Bitstream Generator Options - Configuration
• Controlled via
Configuration Template
• Increase Configuration
Rate if not concerned
about compatibility with
earlier families
• Add Pull-Up or Pull-Down to avoid having to connect external resistors
• All configuration
controls are set in
template.
Bitstream Generator Options - Startup
• The “Start-Up Clock” switch enables the designer to synchronize startup with the FPGA’s own configuration clock or an external clock signal.
• Start-up can also begin when the “Done” pin goes high.
• To program the “Output Events”, refer to the Implementation Options of the “Design Manager User Guide” included with the Documentation CD.
Bitstream Generator Options - Readback
• The Hardware Debugger can verify the downloaded configuration and probe the internal states of the device by using the Readback feature.
• To use this feature, check the “Enable Bitstream Verification” box, connect the XChecker Cable to your device, and insert the “Readback” symbol into your design.
• For more information, refer to the Xilinx Data Book and the Hardware Debugger Reference Guide on the Documentation CD.
Choose a Configuration Method
Configuration Mode       Data         Characteristics
Master Parallel          Byte-Wide    FPGA loads itself from external byte-wide PROM
Master Serial            Bit-Serial   FPGA loads itself from external serial PROM
Peripheral               Byte-Wide    FPGA loaded under microprocessor control
Synchronous Peripheral   Byte-Wide    FPGA loaded by user’s configuration clock
Express                  Byte-Wide    Fastest configuration mode; 4000EX devices only
Slave                    Bit-Serial   FPGA loaded by microprocessor or DMA controller; used by XChecker Download Cable
Daisy Chain              Bit-Serial   FPGAs load themselves from PROM; PROM Formatter creates bitstream
M[2:0] pins control configuration mode setting.
Section IV
Advanced Software Design
with Xilinx M1-Based Software
Design Constraints
Section Agenda
• Overview
• Location and implementation
constraints
• General timing constraints
• Specific timing constraints
– Path and block specific constraints
– Path and block grouping
– Advanced constraint commands
– Priority
Constraint Entry Overview
• All constraints can be entered in User Constraint
File (UCF)
– Maximum allowable delay
– Placement of package pins
– Implementation Options
– Bitstream Generation / Prom Configuration
• Timing constraints may also be defined in
schematic
– Advantage: Easy entry for hierarchical blocks
• UCF files must have hierarchical net and component names
– Disadvantage: Not all constraints are supported
– See Libraries guide for schematic syntax and
availability
UCF Syntax
• Use uppercase letters for keywords
– Keywords include names used in constraints, such as:
AFTER, BEFORE, IN, OUT, LOC, NET, OFFSET, PERIOD
• Use quotes around names with non-alphanumeric characters
• Two types of wildcards may be used:
– “?” is a wildcard for a single character
– “*” is a wildcard for any number of characters
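The two wildcards behave like shell-style patterns, so their effect can be previewed with Python's standard `fnmatch` module. This is a sketch for intuition only: the instance names are hypothetical, and `fnmatch` also gives special meaning to square brackets, which the UCF rules above do not mention.

```python
from fnmatch import fnmatchcase

# Hypothetical instance names, used only to illustrate the wildcards
NAMES = ["SLOWFF1", "SLOWFF2", "FASTFF1", "FASTFF12", "REG1"]


def match_instances(pattern, names):
    """Return the names matched by a pattern using '?' and '*' wildcards."""
    return [name for name in names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    # '*' matches any number of characters
    print(match_instances("SLOWFF*", NAMES))
    # '?' matches exactly one character, so FASTFF12 is not selected
    print(match_instances("FASTFF?", NAMES))
```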
Pin Location, Implementation
Constraints
• Pads can be assigned to a package pin
– Ex: Assign a bus signal to pin 32
INST “QOUT<3>” LOC = P32;
• Physical Implementation may be
controlled in the UCF file, such as:
– FAST: Set fast I/O slew rate
Example: INST “$1I87/OBUF” FAST;
– PART: Define part type to be used
Example: CONFIG PART=4005E-
Simple Combinatorial Path
• Consider the following path:
[Figure: inputs A and B<9:0> pass through 2 levels of logic to OUT2; the pad-to-pad path is marked 27 NS]
• Assume system requirements dictate a delay of 27 ns for all input to output pins
• The TIMESPEC constraint communicates this requirement to software:
TIMESPEC TS01 = FROM PADS TO PADS 27 NS;
Synchronous I/O Constraints
• Timing requirements for the design are
described by defining system delays
• System delays are defined by answering these questions:
– What is the clock period?
– When do inputs arrive at IC2?
– When must outputs be stable to meet setup at IC3?
[Figure: IC1 drives IC2 (FPGA under development), which drives IC3; all are clocked by CLOCK]
Input Arrival Calculation
• Inputs are constrained by their input arrival.
• Example: When does data arrive at pin D1?
– After the clock trigger, data delay is Tcko + Tnet + Tpad + Tc1
– Delay Tc1 covers net delays or other combinatorial elements on the board
– Delay Tcd is the delay through the clock distribution network
[Figure: IC1 drives pin D1 of IC2, the FPGA under development; labels Tcko, C1, Tc1, Tnet, Tpad, C2, and Tcd. Waveform: CLK from 0 to 50 with Tarrival marked after the clock edge.]
Tarrival = Tcko + Tnet + Tpad + Tc1
Output Stability Calculation
• When does output data need to be stable?
– Data must be stable in order to meet the setup
requirement for IC3
– How long must the data be stable before data is
latched in IC3?
• Tstable = Tc3 + Tpad + Tnet + Tc4 + Tsetup
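• Tcd is the delay through the clock distribution network
[Figure: IC2, the device under development, drives pin O1 toward a flip-flop in IC3; labels C2, C3, Tc3, Tpad, Tnet, C4, Tc4, and Tsetup. Waveform: CLK from 0 to 50 with Tstable marked before the clock edge.]
Tstable = Tc3 + Tpad + Tnet + Tc4 + Tsetup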
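The two sums above are simple enough to check by hand, but a short Python sketch makes the bookkeeping explicit. All delay values below are hypothetical board numbers chosen for illustration; they do not come from the slides or from any data book.

```python
def t_arrival(tcko, tnet, tpad, tc1):
    """Input arrival time: Tarrival = Tcko + Tnet + Tpad + Tc1 (all in ns)."""
    return tcko + tnet + tpad + tc1


def t_stable(tc3, tpad, tnet, tc4, tsetup):
    """Output stability time: Tstable = Tc3 + Tpad + Tnet + Tc4 + Tsetup (ns)."""
    return tc3 + tpad + tnet + tc4 + tsetup


if __name__ == "__main__":
    # Hypothetical delays in ns
    print("Tarrival =", t_arrival(tcko=5, tnet=2, tpad=3, tc1=4), "ns")
    print("Tstable  =", t_stable(tc3=3, tpad=3, tnet=2, tc4=2, tsetup=2), "ns")
```

The two results are the numbers that go into the OFFSET IN ... AFTER and OFFSET OUT ... BEFORE constraints, respectively.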
Period and Offset Constraints
• Two commands are used to describe synchronous
delays
– Period defines the clock
– Offset constraints define input arrival time and output
stability time relative to the clock
• Xilinx software determines internal FPGA delays
from Period and Offset constraints
• Syntax:
NET clock_name PERIOD = some_delay time_unit;
NET input_name OFFSET = IN Tarrival time AFTER clock_name;
NET output_name OFFSET = OUT Tstable BEFORE clock_name;
Clock Constraint Example
• Use the Period Command to define the
clock
• Given that the clock frequency is 20 MHz
for the example:
NET “CLK” PERIOD = 50 ns;
[Figure: example waveform for CLK, with ticks at 0, 50, and 100]
Synchronous Constraint Example
• OFFSET defines the delay of a signal external to the chip, relative to a clock. Internal clock delays are determined by software.
NET “CLK” PERIOD = 40;
NET “ADD0_IN” OFFSET = IN 14 AFTER CLK;
NET “ADD0_OUT” OFFSET = OUT 12 BEFORE CLK;
[Figure: ADD0_IN reaches FF1 with Tarrival = 14ns; FF2 drives OUT1 with Tstable = 12ns; the clock period is 40ns. The delays inside the FPGA are marked “Determined by Software”. Waveform ticks for CLK: 0, 14, 20, 28, 40.]
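One way to read the example above is that PERIOD and OFFSET together leave a time budget for the paths inside the FPGA. The sketch below shows that arithmetic under this reading; the function and key names are made up for illustration, and the real software applies its own, more detailed analysis.

```python
def io_budgets(period_ns, offset_in_after_ns, offset_out_before_ns):
    """Time left for internal FPGA paths, given PERIOD and OFFSET values.

    - Input data arrives offset_in_after_ns after the clock edge, so the
      pad-to-register path must fit in the rest of the period.
    - Output data must be stable offset_out_before_ns before the next clock
      edge, so the register-to-pad path must finish by that point.
    """
    if offset_in_after_ns >= period_ns or offset_out_before_ns >= period_ns:
        raise ValueError("offsets must be smaller than the clock period")
    return {
        "input_path_budget_ns": period_ns - offset_in_after_ns,
        "output_valid_by_ns": period_ns - offset_out_before_ns,
    }


if __name__ == "__main__":
    # Values from the example: PERIOD = 40, OFFSET IN 14 AFTER, OFFSET OUT 12 BEFORE
    print(io_budgets(40, 14, 12))
```

With the example values the output deadline lands at 28 ns, which matches the tick shown on the CLK waveform.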
Constraint Recommendations
• Use a given TIMESPEC name for only one
path
• Keep constraints in one source
– Either UCF file or in schematics, but not both
• Avoid OVER-constraining the design
– Design Performance suffers
• Critical timing paths get the best placement and fastest routing
options
• As the number of critical paths increases, routability decreases
– Run times increase
• More information in the On-Line Docs:
– Libraries Guide
– Development Systems Reference Guide, Using Timing
Constraints, UCF sections
Question
• Given the following:
Clock Frequency = 20 MHz
Tarrival = 31 ns = delay from CLK to Input pin D1 of IC2
Tstable = 27 ns = Delay (including setup) from O1 to D pin of FF3 (IC3)
[Figure: IC1, IC2 (device under development), and IC3 in series, with combinatorial elements C1 to C4, input pin D1, output pin O1, and a common CLK]
Fill in the constraints below :
NET _____ PERIOD = _____ NS;
NET _____ OFFSET = IN _____ AFTER CLK;
NET _____ OFFSET = OUT _____ BEFORE CLK;
Answers:
NET CLK PERIOD = 50 NS;
NET D1 OFFSET = IN 31 ns AFTER CLK;
NET O1 OFFSET = OUT 27 ns BEFORE CLK;
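The fill-in exercise can also be worked mechanically: convert the clock frequency to a period, then drop the given times into the PERIOD and OFFSET syntax shown earlier. The helper below is a hypothetical sketch, not a Xilinx tool; it only formats text, and it assumes the input net is named D1 as in the question.

```python
def period_ns(freq_mhz):
    """Clock period in ns for a frequency in MHz."""
    return 1000.0 / freq_mhz


def ucf_constraints(clock, freq_mhz, input_net, t_arrival_ns, output_net, t_stable_ns):
    """Format PERIOD and OFFSET constraint lines for one clock, input, and output."""
    return [
        f'NET "{clock}" PERIOD = {period_ns(freq_mhz):g} ns;',
        f'NET "{input_net}" OFFSET = IN {t_arrival_ns:g} ns AFTER {clock};',
        f'NET "{output_net}" OFFSET = OUT {t_stable_ns:g} ns BEFORE {clock};',
    ]


if __name__ == "__main__":
    # Values from the question: 20 MHz clock, Tarrival = 31 ns, Tstable = 27 ns
    for line in ucf_constraints("CLK", 20, "D1", 31, "O1", 27):
        print(line)
```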
Path and Block Specific Constraints
• Why use path or block specific
constraints?
– To decrease speed requirements wherever
possible
– To increase routability and overall speed of the design
– To decrease software run-time
• General Methodology
– Use PERIOD and OFFSET to constrain the
design globally
“FROM-TO” Constraint Example
• Consider the example shown below with
TIMESPEC:
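TIMESPEC TS01 = FROM PADS TO PADS 21;
• TS01 is applied to both Y - OUT1 and Z - OUT2.
[Figure: input Y passes through 1 level of logic to OUT1 and inputs Z<0:31> pass through 2 levels of logic to OUT2; both paths are marked 21 ns. The schematic also shows X, FF1, FF2, and CLK.]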
“FROM-TO” Constraints
• The two paths could be constrained with two
commands:
TIMESPEC TS01 = FROM PADS(Y) TO PADS(OUT1) 21;
TIMESPEC TS02 = FROM PADS(Z) TO PADS(OUT2) 28;
• “FROM:TO” constraints can start and stop at Flip-Flops (use “FFS”), LATCHES, PADS, or RAMS
• Examples:
– Constrain all inputs to all Flip-Flops in block NEWFIE:
TIMESPEC TS03 = FROM PADS TO FFS(NEWFIE) 18 ns;
– Constrain all Flip-Flop to Flip-Flop paths in the design:
Creating Groups with TNM
• The TNM constraint creates a group of individual components
• Example: divide Flip-Flops into two groups based on instance
name
INST SLOWFF* TNM = SLO;
INST FASTFF* TNM = FST;
• TIMESPECS are assigned to the new groups:
TIMESPEC TS14 = FROM FFS TO SLO 40 NS;
TIMESPEC TS15 = FROM FFS TO FST 20 NS;
• Greater flexibility in routing is achieved by creating a different timing requirement for these two groups
[Figure: schematic with REG1, REG2, COMB3, SLOWFF1, SLOWFF2, FASTFF1, and FASTFF2]
Pre-Scaled Counter Example
PRE2
Q0 Q1
TC
CE
COUNT12
Q2 Q3 Q4 Q5 Q6 Q7 Q8
Q9 Q10 Q11 Q12 Q13
• Highest speed is required in the pre-scaled block
– Constrain the two counter blocks separately to avoid
over-constraining COUNT12
• Define two groups for use in TIMESPEC. Example
UCF file:
INST FFS(PRE2) TNM = PRE;
INST COUNT12 TNM = UPPER;
TIMESPEC TS_PRE = FROM PRE TO PRE 60 MHZ;
TIMESPEC TS_TC2CE = FROM PRE TO UPPER 60 MHZ;
Creating Groups with TIMEGRP
• Another way to constrain this design is by creating
smaller groups of endpoints:
• The TIMEGRP constraint is used to create new
groups from other groups.
• FFS, LATCHES, RAMS, and PADS are
predefined groups
• Example: ALL_FFS group contains all Flip-Flops
whose instance name begins with SLOWFF or
FASTFF:
INST SLOWFF* TNM = SLO;
Select One Path From Many Paths
• Use to constrain one path among several parallel
paths
• First identify the path to be constrained with
TPTHRU, then use THRU in Timespec constraint
• Example: constrain the path through component ABC
NET RED TPTHRU = ABC;
TIMESPEC TS_FIFOS = FROM RAMS(FIFORAM) THRU ABC TO FFS(MY_REG*) 25;
[Figure: fiforam drives my_reg00, my_reg01, my_reg02, and my_reg03 over parallel paths; the path through ABC carries net RED, tagged TPTHRU=ABC]
Forward Tracing
• Forward tracing occurs when a constraint is assigned
to a net
• Constraint is applied to all global endpoints driven
by the net
• Example: constrain nets driven by DATA0 to Flip-Flops in block CNT25:
NET “DATA0” TNM = MYBUS;
TIMESPEC TS_REGCNT = FROM MYBUS TO FFS(CNT25) 30 NS;
[Figure: DATA0 drives blocks BONE, CHEW, BARK, ..., and CNT25; TS_REGCNT is marked on the path into CNT25]
Ignoring Paths with TIG and NET
• Timespec Ignore, “TIG”, attribute ignores a
TIMESPEC for a specific path or net
• Ex: Assume that net DOG_SLOW was constrained by 2
constraints, TS01 and TS02. The following
specification ignores TS01. TS02 only is applied to
DOG_SLOW.
NET “DOG_SLOW” TIG = TS01;
• Example to ignore a slow path between registers:
INST REGA* TNM = REGA;
INST REGB* TNM = REGB;
TIMESPEC TS_TIG01 = FROM FFS(REGA) TO FFS(REGB) TIG;
• TIG improves software run-time and routability
Other Constraint Constructs
• Use “Except” to filter a group of endpoints.
INST FASTFF* TNM = FST;
TIMEGRP SLO = FFS EXCEPT FST;
• TPSYNC allows definition of end points that are not FFS, RAMS, PADS or LATCHES.
NET “BLUE” TPSYNC = BLUE_S;
TIMESPEC TS_1A = FROM FFS TO BLUE_S 15 NS;
• Signal skew for logic driven by clocks can be
constrained using MAXSKEW constraint
NET “$1I3245/$SIG_6” MAXSKEW = 3;
 Specifies a 3 ns difference between the arrival times at
Constraint Priority
• All constraints are not created equal
– Highest Priority: Timing ignores (TIG)
– FROM:THRU:TO specs
– FROM:TO specs
– Lowest Priority: PERIOD specs
• “FROM:TO” constraints are further prioritized:
– Highest:
FROM PATH-SPECIFIC TO PATH-SPECIFIC
FROM PATH-SPECIFIC TO GLOBAL
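The top-level priority order (TIG, then FROM:THRU:TO, then FROM:TO, then PERIOD) can be modelled as a simple ranking. The sketch below is a deliberately simplified illustration: the constraint names are hypothetical, and it ignores the finer ordering among FROM:TO specs that the slide begins to list.

```python
# Priority order from the slide, highest first
PRIORITY = ["TIG", "FROM:THRU:TO", "FROM:TO", "PERIOD"]


def governing_constraint(constraints):
    """Pick the constraint that wins for a path covered by several of them.

    `constraints` is a list of (name, kind) pairs, where kind is one of the
    entries in PRIORITY. The first listed constraint wins a tie.
    """
    return min(constraints, key=lambda c: PRIORITY.index(c[1]))


if __name__ == "__main__":
    # Hypothetical constraints that all cover the same path
    covering = [("TS_CLK", "PERIOD"), ("TS01", "FROM:TO"), ("TS_FIFOS", "FROM:THRU:TO")]
    print(governing_constraint(covering))
```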
Section V
Special Topics
Section V Agenda
• DSP Design with FPGAs
• New Developments in Programmable Logic
• Virtex, XC6200 and Reconfigurable Logic
• FPGA versus ASIC costs
• Xilinx Student Edition
• Xilinx University Program participation
Section V
Special Topics
DSP Design with FPGAs
FPGAs Provide Outstanding DSP
Performance
DSP Processor
[Figure: a single multiplier and adder handle operations 1, 2, 3, 4, ... one after another]
• Sequential processing
• Fixed architecture
• Complex real time software
FPGA
[Figure: multipliers 1 through N operate side by side and feed an adder]
• Parallel processing
• Configurable to specific needs
• No software programming
FPGAs Lower the Cost of
High Performance DSP
[Figure: cost in $ (100 to 500) versus relative performance (5 to 20) for µP/PDSP and FPGA-Based DSP]
Customer Successes
• TIM40 Module using FPGAs (XC4010)
– 3 times the price at 175 times the TI TMS320C40 performance
• DNA Matching (XC4010)
– Similar performance at 1/20th price
• 128-Track Audio Recording Studio (XC3190)
– 3 times the functionality at 1/10th the price
FIR Filter Example
[Figure: sample data X0, X1, X2, ... (N bits wide) is multiplied by coefficients C0, C1, C2, ... (K coefficients, K taps long); the products are summed (K sums) to form the output data]
Sum of Products Equation
• K Multiplies
• K Sums
• CLOCK = Multiply Time
• Sample Rate = Clock Rate
IMPLEMENTATION ???
Traditional FIR Filter Implementation
• General-Purpose DSP
– PERFORMANCE = 1 / (MAC cycle time X Number of Taps)
– TMS320: MAC cycle time = one clock cycle
10-bit, 20-tap filter with 50 MHz TMS320 = 2.5 MHz
Additional filter taps slow performance
– Pentium: MAC cycle time = 11 clock cycles
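The performance formula for a sequential multiply-accumulate (MAC) filter can be checked with a one-line function. This is a back-of-the-envelope sketch; the 200 MHz clock used for the 11-cycle case is a hypothetical figure chosen for illustration, not a number from the slides.

```python
def fir_sample_rate_mhz(clock_mhz, taps, mac_cycles=1):
    """Sample rate of a sequential MAC filter: clock / (cycles per MAC x taps)."""
    return clock_mhz / (mac_cycles * taps)


if __name__ == "__main__":
    # 20-tap filter on a 50 MHz DSP with a one-cycle MAC
    print(fir_sample_rate_mhz(50, taps=20), "MHz")
    # Same filter on a hypothetical 200 MHz processor needing 11 cycles per MAC
    print(round(fir_sample_rate_mhz(200, taps=20, mac_cycles=11), 2), "MHz")
```

Doubling the number of taps halves the sample rate, which is the sense in which additional filter taps slow performance.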
Distributed Arithmetic (DA) Filter Design
[Figure: sample data enters a parallel-in, serial-out shift register; the serial bits form the address (ADRS) of an 8 word x N bit look-up table; the table data feeds an adder (inputs A and B) and a register with a 2^-1 scaler, producing the filtered data out]
Look-up table contents:
Address   Data
000       0
001       C0
010       C1
011       C1 + C0
100       C2
101       C2 + C0
110       C2 + C1
111       C2 + C1 + C0
PERFORMANCE = Clock Frequency / Number of Bits in Sample
10-bit, 20-tap filter using XC4000 at 50 MHz = 5 MHz
Distributed Arithmetic - 3 bit Example
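Data x Coefficient:
D2 x C2 = 100 x 110
D1 x C1 = 111 x 101
D0 x C0 = 011 x 100
[Figure: the three multiplications are written out bit by bit, once as Data x Coefficient and once as Coefficient x Data, so that one bit of each data word is processed at a time]
Taking one bit from each data word (D2, D1, D0) gives the look-up table address. For example, the least significant bits are:
0 1 1 = LUT Address ==> (C1 + C0) from previous slide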
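The 3-bit example can be replayed in software to confirm that the look-up table approach gives the same answer as three ordinary multiplications. The sketch below handles unsigned values only and accumulates with left shifts, whereas the hardware on the previous slide uses a 2^-1 scaler; the function names are illustrative.

```python
def build_da_lut(coeffs):
    """LUT[address] = sum of coeffs[k] for every bit k that is set in address."""
    return [
        sum(c for k, c in enumerate(coeffs) if (address >> k) & 1)
        for address in range(1 << len(coeffs))
    ]


def da_sum_of_products(samples, coeffs, nbits):
    """Bit-serial distributed arithmetic for unsigned samples of nbits bits."""
    lut = build_da_lut(coeffs)
    acc = 0
    for bit in range(nbits):  # least significant bit first
        address = 0
        for k, sample in enumerate(samples):
            address |= ((sample >> bit) & 1) << k  # one bit from each sample
        acc += lut[address] << bit  # weight the table output by the bit position
    return acc


if __name__ == "__main__":
    coeffs = [0b100, 0b101, 0b110]   # C0, C1, C2 from the example
    samples = [0b011, 0b111, 0b100]  # D0, D1, D2 from the example
    print(build_da_lut(coeffs))
    print(da_sum_of_products(samples, coeffs, nbits=3))
    print(sum(d * c for d, c in zip(samples, coeffs)))  # direct sum of products
```

One table look-up and one addition are needed per data bit, regardless of the number of taps folded into the table, which is where the "clock frequency / number of bits" performance figure comes from.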
Resource Tradeoffs for Higher Performance
[Figure: CLBs (100 to 400) versus number of filter taps (16 to 80) for serial sequential, bit-serial distributed arithmetic, double-rate distributed arithmetic, and fully-parallel distributed arithmetic implementations. Sample-rate labels on the plot: 100 Hz to 100 kHz, 8.1 MHz, 16.2 MHz, and 66 MHz.]
XC4085XL 10 Times Faster Than TMS320C6x
[Figure: 16 bit FIR filter benchmark, billions of MACs (Multiply ACcumulates) per second on a 0 to 8 scale, for the TMS320C6x (0.25 µ, 200 MHz) and the 4005XL, 4013XL, 4036XL, 4062XL, and 4085XL (XC4000XL using 80 MHz clock rate)]
FPGA DSP is Lower Cost
[Figure: price per million MACs per second, 16-bit word, on a $0.05 to $0.25 scale, for the TMS320C6x (25,000 pcs) and a Xilinx FPGA (25,000 pcs)]
Where FPGA-Based DSP is Used
• High Data Rates
– 1 to 70 M samples/sec
• High Complexity
– 10’s to 100’s of MACs in a single chip
• Fixed-Point Data
• Audio, Video, Radio & Voiceband Modems, HDTV
[Figure: data rate (1k to 1G samples per second) versus algorithm complexity (less complex to more complex), with regions for MPU/MCU, Single-Chip DSP, Multiple DSP Cores or Chips, FPGA-Based DSP, and ASIC]
DSP / FPGA Design Methodology
THIRD-PARTY
Coefficients
DSP
SOFTWARE
CORE
Generator
Xilinx CORE
Generator 1.4
available now!
Instantiate
into
schematic or HDL
PLACE
AND ROUTE
BIT STREAM FOR
DOWNLOAD CABLE, OR EPROM
POST ROUTE
SIMULATION
XC4000 Resource Cross Reference Chart (Bit-Serial Implementation)
Number of XC4000 CLBs, by number of taps (rows) and word size in bits (columns)
TAPS     6    8   10   12   14   16   18   20   22   24
8       17   20   23   26   29   36   39   42   45   48
16      37   44   51   58   65   80   87   94  101  108
24      57   68   79   90   91  124  135  146  157  168
32      77   92  107  122  137  168  183  198  213  228
40      97  116  135  154  173  212  231  250  269  288
48     117  140  163  186  209  256  275  302  325  348
56     137  164  191  218  245  300  327  354  367  408
M samples/sec @50MHz
       8.3  6.3  5.0  4.2  3.6  3.1  2.8  2.5  2.3  2.1
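The sample-rate row of the chart follows from the bit-serial performance formula (clock frequency divided by word size). The sketch below recomputes it at 50 MHz and compares it with the listed values; the names are illustrative, and the listed figures are simply copied from the chart.

```python
WORD_SIZES = [6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
LISTED_MSPS = [8.3, 6.3, 5.0, 4.2, 3.6, 3.1, 2.8, 2.5, 2.3, 2.1]  # from the chart


def da_sample_rate_mhz(clock_mhz, word_bits):
    """Bit-serial distributed arithmetic: one output sample every word_bits clocks."""
    return clock_mhz / word_bits


if __name__ == "__main__":
    for bits, listed in zip(WORD_SIZES, LISTED_MSPS):
        computed = da_sample_rate_mhz(50, bits)
        print(f"{bits:>2} bits: computed {computed:.2f} M samples/sec, chart {listed}")
```

Note that the rate depends only on the word size, not on the number of taps; more taps cost CLBs rather than speed.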
Section V
Team-Based
Modular
Special Topics
10 Million
300Mhz
Cores
HDL
500k
Schematic
100K
25K
50 Mhz
100Mhz
1 Million
150Mhz
133Mhz
0.25u
0.15u
0.18u
0.35u
0.5u
The Road Ahead
New Developments in
Programmable Logic
Process Technology and Supply Voltage
• Lower cost
• Faster speed
• Higher density
• Lower power
[Figure: feature size in µ (0 to 1.2) versus year (1990 to 2002), with supply voltage steps of 5 V, 3.3 V, 2.5 V, 1.8 V, and 1.3 V; “Today” is marked on the curve]
Xilinx leads PLD industry in fab technology.
Fab partners use FPGAs to drive their process.
Advanced Process Technology
0.5u Process
- locos isolation
- birds beak
- no planarization
- only contact plug
0.25u UMC Process
- shallow trench isolation
- 0.9u metal pitch
- CMP
- plug for all vias
Process & Density Leadership
10M Gates
In 2002
Density (system gates)
10M
Virtex II
2M
Virtex
75+M Transistors
XC40250XV
1M
500k
XC40125XV
Industry’s 1st 0.25u PLD, 25M Transistors, 5LM
250k
180k
XC4085XL
1997
1998
1999
2000
10 Million System Gates in 2002!
2001
2002
Architecture Innovation &
 Reconfigurable Logic
AD/DA
Leadership  On-Chip
Embedded Functions
Features
 1GHz Diff. Interface
 Built-in Logic Analyzer
Distributed Dual Port RAM
IO Registers
Internal Bussing
5V Tolerant I/O
3.3V and 5V PCI
1998
 Block Dual Port RAM
 Multiple Standard I/O
 Vector Based Interconnect
 Phase Locked Loops
 66 MHz 64-Bit PCI
1999
2000
2001
2002
Performance Leadership
MHz
300
280
260
 233 MHz UP
 300 MHz RAM I/F
 133 MHz PCI
240
220
System Clock Rate* (MHz)
200
180
160
 133 MHz SDRAM I/F
 155 MHz SONET
 66 MHz PCI
140
120
100
80
 100 MHz SDRAM I/F
 100 MHz DSP for
Wireless Base Station
 33 MHz PCI
60
40
20
0
1995
*
1996
1997
1/(Tsetup+Tclock-to-out)
1998
1999
2000
2001
2002
Packaging Leadership
Pins
Flip Chip
Technology
1000
Chip Scale
Fine Pitch BGA
700
SBGA
BGA
500
HQFP
<0.8mm
1.0mm
1.27mm
PQFP
300
PGA
PLCC
100
1998
2000
2002
Compile Time Leadership
[Figure: compile time in minutes* (0 to 250) by software release: 1.3, 1.4, 1.5, 2.1, 2.2]
* 100k System gate designs (200MHz Pentium)
• With Faster CPUs
• Faster Compile Times
• Modular Compile
1999 Goal: 1 Million Gates in 45 minutes!
F1.5 Features
• Tight integration
– FPGA Express inside Foundation Project Manager
– Single Project Management / Flow Engine environment
• Improved ease of use
– Complete pushbutton
• New Virtex, XC9500XL support
• Improved FPGA Express synthesis runtimes &
performance
• Improved PAR runtimes and performance
Xilinx Smart-IP Delivers...
Architectures
tailored to cores
Intelligent Software
Implementation
Flexible Core
Technology
Xilinx Smart-IP
Technology
High Predictability
High Flexibility
High Performance
Performance + Time to Market
Leader in Core Solutions
Standard Bus
Interfaces
DSP
Communication
Functions & Networking
Xilinx and Partners’ COREs
•82xx, UARTs, DMA,
•66 MHz DRAM/SDRAM I/F
•Memory (RAM, ROM, FIFO)
•Micro Sequencer (2901)
•Proprietary RISC Processors
•Microprocessor I/Fs
•8051/8031
•IEEE 1284
•MIPS
•133+ MHz SDRAM I/F
• Advanced
processors
•ATM Cell Assembly/Delineation
•CRC-16/32
•T1 Framer
•HDLC
•Reed-Solomon, Viterbi
•UTOPIA, 25/33/50 MHz
•10/100 Ethernet
•1Gb Ethernet
•ADSL, HDSL, XDSL
•ATM/IP Over SONET
•SONET OC3/12
• Modems
• SONET OC48
• Emerging Telecom
and Networking
Standards
•Add, Subtract, Integrate
•Correlators
•Filters: FIR, Comb
•Multipliers
•Transforms: FFT, DFT
•Sin/Cos
•
•
•
•
•
•
• DSP Processor I/Fs
• DSP Functions >
200 MSPS
• Programmable DSP
Engines
• QAM
•CAN Bus
•ISA PnP
•I2C
•PCI 32bit
•CardBus
•FireWire(100-400 Mbps)
•PCI 64bit/66MHz
•PC104
•VME
•PCMCIA
•USB
DCT
Cordic
DES
Divider
JPEG
NCO
1998
By 2002: Virtually All Functions Available as Cores
• Satellite
decoders
• Speech
Recognition
• Emerging HighSpeed Standard
Interfaces
1999
2000
Architecture Tailored to Cores
Segmented Architecture
Segmented Routing
Non-Segmented Routing
Core1
Core2
• Efficient Routing
• Predictable Timing
• Low Power Consumption
• Wasted Routing
• Unpredictable Timing
• High Power Consumption
Architecture Tailored to Cores
Distributed RAM
RAM Available
Locally
To The Core
• Portable RAM Based Cores
• Improves Logic Efficiency by 16X
• High Performance Cores
Intelligent Software
Pre-defined Placement & Routing
Fixed Placement &
Pre-defined Routing
Fixed Placement
I/Os
Relative Placement
Guarantees
Performance
Guarantees I/O &
Logic Predictability
Other Logic Has No
Effect on the Core
Enhances Performance & Predictability
Smart-IP Delivers Performance
12x12 Multiplier
Speed(MHz)
80
Xilinx
Segmented
70
60
Non-Xilinx
Non-Segmented
50
1
2
4
8
Number of Cores
Smart-IP Performance Is Independent of
Number of Cores in a Design
Smart-IP Delivers Portability
80 MHZ
80 MHZ
80 MHZ
80 MHZ
Smart-IP Performance Is Independent of
a Core’s Placement in the Device
Smart-IP Delivers Transportability
80 MHZ
80 MHZ
80 MHZ
Non-Segmented Architecture
May Experience 30%
Performance Degradation
Smart-IP Performance is Independent of Device Size
Xilinx Architecture for Fastest
Performance
Xilinx Segmented Interconnect
Non-segmented Interconnect
Across Chip
Across Chip
Logic
Block
1
Logic
Block
2
1x
Logic
Block
3
...
Logic
Block
n
Logic
Block
1
1x
...
Logic
Block
2
Logic
Block
n
4x
4x
3x
1x
4x
6x
Logic
Block
Logic
Block
(next row)
(next row)
Segmented Interconnect Structure Provides
Faster Logic Cell Connections
High Value Cores with Spartan
Core Function                             XCS30XL Price*   Percentage of Device Used   Effective Function Cost
UART                                      $6.95            17%                         $1.20
16-bit RISC Processor                     $6.95            36%                         $2.50
16-bit, 16-tap Symmetrical FIR Filter     $6.95            27%                         $1.90
Reed-Solomon Encoder                      $6.95            6%                          $0.40
PCI Interface (w/ faster speed grade)     $12.00           45%                         $5.40
*100,000 units, mid-1999 projection
Section V
Special Topics
DS P
0
DD
A
XC 6
2
TRADITIONAL
THINKING
0
R
It’s About Time!
Virtex, XC6200 and
Reconfigurable Logic
Virtex Enables
System on a Programmable Chip
VHDL Design
Environment
Verilog Design
Environment
New
Designer Designer
Modules
#2
#1
CoreGen
DSP
FIFO
IP Modules
AllianceCore
133Mhz
SDRAM
Virtex
Design
Reuse
CPU
Gbit
Ethernet
LogiCore
66Mhz
PCI
160 MHz I/O Performance
133 MHz Memory Performance
1 Million System Gates
Virtex Series Overview
• New FPGA architecture, similar to XC4000
• 0.25 and 0.18 micron 5LM process
• Segmented routing
• SelectRAM+ offers 3 types of RAM
– Distributed SelectRAM
– Block SelectRAM (new)
– High-speed access to external memory (new)
• Traditional and Low Voltage support
– CMOS, TTL
– LVTTL, LVCMOS, GTL+, and SSTL3
• 250K - 1M system gates in 1998
• Some XC6200-like features
– Ideal for Reconfigurable Logic
– Dynamic & Partial reconfiguration
Virtex Functional Block Diagram
CLB
Phase Locked Loop (PLL)
Segmented routing
66 MHz PCI
SSTL3
SelectI/O
Pins
Vector Based
Interconnect
delay=f(vector)
Block
SelectRAM
Memory
Distributed
SelectRAM
Memory
Xilinx 0.25 , 5 Volt-Compatible FPGAs
5V
3.3 V
2.5 V
I/O
Supply
Accepts
5 V levels
Any
5V
device
(XC4000E)
5V
3.3 V
Logic
Supply
Virtex
&
XC4000XV
2.5 V logic
3.3 V I/O
3.3 V
3.3 V
Any
3.3 V
device
(XC4000XL)
Meets TTL
Levels
• 4KXL / 4KXV Family migration possible if you
plan for:
– Additional power/ground pins
– Dedicated clock and configuration pins
• Voltage migration guide to help users
Virtex FPGA Performance
• 100+ MHz internal speeds
– 155 MHz SONET data stream processing
– 100+ MHz Pipelined Multipliers
– 66 MHz PCI
• 100+ MHz system interface speeds
                        without PLL   with PLL
Tco (output register)   6 ns          3.5 ns
Tsu (input register)    3 ns          3 ns
Th (input register)     0 ns          0 ns
Max I/O performance     110 MHz       160 MHz
8ns across
250,000 system
gates
• Predictable for early
design analysis
• Optimized for five
layer metal process
SWITCH
MATRIX
CARRY
CLB
2 LCs
2 LCs
CARRY
–
3-STATE BUSSES
CARRY
• Fast local routing
within CLBs
• General purpose
routing between
CLBs
• Fast Interconnect
CARRY
Segmented Routing Interconnect
Virtex Configurable Logic Block
 Polarity of all
control signals
selectable
 Fast arithmetic
and multiplier
circuitry
CO
I3
I2 4 Input O
I1 LUT
I0 WI DI
 Optimized for
synthesis
Carry
and
Control
PR Q
D
CERegister
CLK RS
CI
CO
CLB
2 LCs
2 LCs
I3
I2 4 Input O
I1 LUT
I0 WI DI
Carry
and
Control
CI
PR Q
D
CE Register
CLK RS
SelectRAM+ Memory Features
• Distributed SelectRAM Memory
– Pioneered in XC4000 family
– 16x1 synchronous SRAM implemented in LUT
– Ideal for DSP applications
– Access over 100 Billion bytes/sec
• Block SelectRAM Memory
– Up to 32 4,096-bit blocks of dual port synchronous SRAM
– Configurable widths of 1, 2, 4, 8, and 16
– Ideal for data buffers and FIFOs
– Up to 17 gigabytes/sec access
• Fast Access to External RAM
– Direct interface to SSTL3, 3.3V synchronous DRAM standard
– 133 MHz
Block RAM
• Configure as: 4096 bits with variable aspect ratio
• 8-32 blocks across family devices
• True dual-port, fully synchronous operation
– Cycle time <10 ns
• Flexible block RAM configuration
– 5 blocks: 2K x 10 video line buffer
– 1 block: 512 x 8 ATM buffer (9 frames)
– 4 blocks: 2K x 8 FIFO
– 9 blocks: 4K x 9 FIFO with parity
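The block counts in the list above are consistent with dividing the total number of memory bits by the 4096 bits in one block. The sketch below shows that estimate; it is a simplification that ignores how the allowed block widths (1, 2, 4, 8, and 16) are actually combined, so a real mapping could need more blocks.

```python
import math

BLOCK_BITS = 4096  # bits per block RAM, from the slide


def blocks_needed(depth, width):
    """Estimate the number of 4096-bit blocks for a depth x width memory."""
    return math.ceil(depth * width / BLOCK_BITS)


if __name__ == "__main__":
    examples = {
        "2K x 10 video line buffer": (2048, 10),
        "512 x 8 ATM buffer": (512, 8),
        "2K x 8 FIFO": (2048, 8),
        "4K x 9 FIFO with parity": (4096, 9),
    }
    for name, (depth, width) in examples.items():
        print(f"{name}: {blocks_needed(depth, width)} block(s)")
```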
RAMB4
WEA
ENA
CLKA
ADDRA
DINA
WEB
ENB
CLKB
ADDRB
DINB
DOA
DOB
XC6200 Reconfigurable Processing Unit
1000x improvement in reconfiguration
time from external memory
CPU
FastMAPtm assures
high speed access to
all internal registers
Memory
XC6200
RPU
All registers accessed via
built-in low-skew
FastMAPtm busses
Ultrafast Partial
Reconfiguration
fully supported
I/O
Microprocessor interface
built-in
High capacity distributed memory
permits allocation of chip
resources to logic or memory
I/O
XC6264 - Up to 100,000 gates
XC6200 Architecture
16x16 Tile
4x4 Block
User I/Os
Address
Data
Control
 







 
User I/Os
*Number of tiles varies between devices in family
User I/Os
FastMAPtm
Interface
User I/Os

Function Cell
How Dynamic Reconfiguration Helps
Example: DSP
3D Graphics
DSP Algorithms
- Texture
- Shadow
- Reflections
- Perspective
- Edge
PDSP
One
function at
a time
FPGA
Two or
more
functions
at a time
Reconfiguration-Optimized FPGAs
All
functions
done in
time
Some functions
run while others
are loading
Reconfiguration Advantages:
Lower cost by reusing silicon for multiple functions over time
OR
10-500x performance increase in hardware versus software
implementation
Performance
Reconfigurable Logic Research vs. Component $
Reconfigurable Logic research has
typically focussed on reconfigurable
computing1. But there are really two
potential markets: high-end embedded
computing2 and the low-cost
embedded market3.
Zillions of
Component
Dollars(3)
(2)?
Zillions of
Research
Dollars(1)
Computer
Embedded
Microprocessor
[Graph courtesy of Nick Tredennick.]
Problem Size
XC6200 Dynamic & Partial
Reconfiguration
Design Swapping
200us
XC4013
250ms
Block Swapping
XC6216
Circuit Updates
Rewiring
40ns
ns
us
ms
s
•
Directions in Reconfigurable
Logic
XC6200 was first Xilinx product to XC6200 chips &
XACT6000 software are available, but no further
product development
– Divergent architecture and incomplete tools support
– XUP support for Research only, not classes:
Adaptive or Reconfigurable Logic, Place & Route algorithms
•
Key XC6200 features brought into mainstream
families (Virtex)!
– Dynamic & Partial reconfiguration
– Full industry and software support
– Easier to design to
– New Rec.Logic curriculum should use Virtex
• Virtex-ready PCI board available from Virtual
Computer Corp.
• Further info: http://www.xilinx.com/xup/6200rc.htm
Section V
Special Topics
FPGA versus ASIC Costs
Pad-Limited Die Size
Mid-high density (core-limited): Gate count determines die size
Low density (pad-limited): I/O count determines die size
[Figure: two die outlines, each a core ringed by I/O pads]
As Processes Migrate
FPGA Cost = Gate Array Cost
FPGA Price Leadership
Without Compromises
• Pricing competitive with ASICs
• High Performance
• On-chip SelectRAMTM
Spartan
$395
Price
0.5 3LM
SpartanXL
5 Volt
More Features
$295
0.35 5LM
Spartan
Spartan-II
3.3 Volt
<
Next Generation
$200
< $150
0.25 5LM
0.18
2.5 Volt
1998
1999
*Prices are for 5K system gates, 100K units, -3 speed, Lowest Cost Package
1.8 Volt
2000
2002
CPLD Price Leadership
Without Compromises
• Flexible ISP
• Highest Performance
• Pin-Locking
• Full JTAG
[Chart: Price versus year, 1998 to 2002. Price labels: $15, $9, $1.80, $0.80]
* Prices are based on 100Ku+, slowest speed grade, lowest cost package
Priced for High-Volume
Leadership
New Applications
Density
(System
Gates)
• Set Top Box
• DVD
• Digital Camera
• PC Peripherals
• Consumer Electronics
100K
200K
$20
100K
$10
60K
60K
40K
$20
$10
25K
15K
1997
100K unit volume price projections
10K gates/$ in 2002!
1998
1999
2000
2001
2002
The Real Cost of Ownership
• Even in mid & high density, FPGAs often have cost advantage
• FPGA vs ASIC goes far beyond obvious unit costs calculations
• Real Comparison includes Real factors
Programmable FPGA                 Gate Array (Application Specific Integrated Circuit)
(-) Higher unit cost              Lower unit cost
(+) Standard Product              Custom Product
(+) Off the shelf delivery        Months to manufacture
(+) Fast Time to Market           Slow Time to Market
(+) No Non-Recurring Eng. Fee     NRE+
(+) No inventory risk             Customer specific
(+) Fully factory tested          User Test Development
(+) Simulation helpful            Simulation Critical
(+) In-Circuit verification       No In-Circuit verification
Cost Calculations - Basic Model
• Breakeven - Solve for X (units)
ASIC Cost
$25K NRE + $79K
Engineering& Tools + X * $10
=
=
FPGA Cost
$0 NRE + $25K
Engineering&Tools + X * $30
X
=
54K / 20
X
=
2,700 units
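The breakeven arithmetic divides a $54K gap in up-front cost by the $20 gap in unit price. The Python sketch below takes that $54K figure as given by treating $79K as the ASIC's total up-front cost. That reading is an assumption: adding the $25K NRE on top of $79K would widen the gap. The function name `breakeven_units` is illustrative.

```python
def breakeven_units(fixed_a, unit_a, fixed_b, unit_b):
    """Volume X at which fixed_a + X*unit_a equals fixed_b + X*unit_b."""
    return (fixed_a - fixed_b) / (unit_b - unit_a)

ASIC_FIXED = 79_000   # assumption: total ASIC up-front cost, NRE included
FPGA_FIXED = 25_000   # FPGA engineering and tools, no NRE
ASIC_UNIT = 10        # dollars per ASIC
FPGA_UNIT = 30        # dollars per FPGA

basic_breakeven = breakeven_units(ASIC_FIXED, ASIC_UNIT, FPGA_FIXED, FPGA_UNIT)
```

Below this volume the FPGA is the cheaper choice under the basic model; above it the ASIC's lower unit price wins.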
Cost Calculations - Market Model
Being late to market costs Real $$
Maximum Available Revenue
Total ASIC Development = 32 weeks
Total FPGA Development = 11 weeks
% of Lost Revenue = (Delay * (3W - Delay) / (2W)^2) * 100
= (5.25 * (3*18 - 5.25) / 36^2) * 100
= 19.75%
Maximum Revenue
from delayed entry
Net Profit
= Volume *
(System Price - System Cost )
= ($2K - $1.1K) * (1K + 12K + 5K)
= $16,200,000
W
W
Product Life = 2W
= $3.2M
ASIC Cost = $25K NRE + $79K Engineering + .1975*$16.2M Lost Profit + X*$10
FPGA Cost = $25K Engineering + X*$30
Breakeven, X = 162,700 units
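The market-model numbers can be checked with a few lines of Python. The worked figures on the slide divide by 36^2, that is (2W)^2, so the sketch does the same. Two readings are assumptions: the 21-week schedule gap (32 minus 11 weeks) is converted to 5.25 months at four weeks per month, and the $54K up-front gap is carried over from the basic model's arithmetic.

```python
def lost_revenue_fraction(delay, w):
    """Fraction of revenue lost to a late entry; delay and w in the same units.
    Uses the (2W)^2 denominator that matches the slide's worked numbers."""
    return delay * (3 * w - delay) / (2 * w) ** 2

delay_months = (32 - 11) / 4   # assumption: 4 weeks per month
w_months = 18                  # product life = 2W = 36 months

net_profit = (2_000 - 1_100) * (1_000 + 12_000 + 5_000)  # $16.2M
lost_fraction = lost_revenue_fraction(delay_months, w_months)
lost_profit = lost_fraction * net_profit                  # about $3.2M

fixed_gap = 54_000 + lost_profit       # assumption: $54K gap from the basic model
market_breakeven = fixed_gap / (30 - 10)
```

The result lands within rounding of the 162,700 units quoted, which uses the rounded $3.2M lost-profit figure.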
Hardwire Technology Model
• ASIC Re-spin delay & expense risk 30%
• PLD price reductions 25% vs. 5% per year
• Hardwire Technology lowers FPGA cost 40-60%
– No additional design work or test vectors
– Preserves nets, placement, routing
– All FPGA characteristics maintained
Total ASIC Cost = $25K + $79K + $5.3M + $22.8K + 18.7K + X * $10
FPGA/HWire Cost = $25K Engineering + 1K*$30
+ $18K NRE + (X-1K) Units * $18
Breakeven, X = 674,000 units !!!
Download the Xilinx ASIC Estimator program at
http://www.xilinx.com/products/hardwire/hardwire.htm
to compare costs or learn more.
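As a rough check of the Hardwire breakeven, the Python sketch below solves the two cost lines for X using the terms exactly as listed. What each of the ASIC terms beyond NRE and engineering represents is not spelled out on the slide, so the variable names are only descriptive guesses.

```python
# ASIC side: fixed terms as listed, plus $10 per unit.
asic_fixed = 25_000 + 79_000 + 5_300_000 + 22_800 + 18_700
asic_unit = 10

# FPGA/Hardwire side: $25K engineering, first 1K units at $30 each,
# $18K Hardwire NRE, then (X - 1K) units at $18 each.
# Folding the "- 1K * $18" part into the fixed term leaves X * $18.
hardwire_fixed = 25_000 + 1_000 * 30 + 18_000 - 1_000 * 18
hardwire_unit = 18

hardwire_breakeven = (asic_fixed - hardwire_fixed) / (hardwire_unit - asic_unit)
```

This evaluates to roughly 673,800 units, which rounds to the 674,000 quoted.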
Total Cost of Ownership - ASIC vs. FPGA
[Chart: Total Cost ($), 0 to 14,000,000, versus Units, 0 to 800,000. Curves: FPGA (B, M), FPGA (H), ASIC (B), ASIC (M), ASIC (H)]
B = Basic analysis
M = Market model
H = Hardwire model
Section V
Special Topics
Xilinx Student Edition
The Xilinx Student Edition
• Prentice Hall’s most requested new engineering product
in Q1 ‘98 !
– Complete, affordable, and practical digital design course environment for all
students
– Predeveloped and tested lab-based course
• Includes
– Foundation Series 1.3 for students’ computers
– Practical Xilinx Designer lab tutorial book
– Coupon for XS40-005XL and XS95-108 boards ($129)
• Sold through bookstores by Prentice Hall and
www.Amazon.com, listed at $79 (ISBN 0136716296)
• Integrated tutorial projects cover:
TTL, Boolean Logic, State Machines, Memories, Flip Flops, Timing, 4-bit
and 8-bit processors
• Upgradeable for free to F1.4 Express with VHDL &
Verilog, 40K gates, VHDL labs on the web Aug.1
The Practical Xilinx Designer
• The Digital Design Process
- Basic concepts and TTL logic
• Programmable Logic Design Techniques
- Programmable logic introduction and Foundation tutorial
• Programmable Logic Architectures
- XC9500 CPLD and XC4000 FPGA
• Combinatorial Logic Design
- LED decoder circuit with both CPLDs and FPGAs.
• Modular Designs and Hierarchy
- step-wise refinement using Foundation
• Electrical Characteristics of Programmable Logic
- I/O drivers, timing/delay models, and power consumption
• Flip-Flops
- introduces sequential logic
• State Machine Design
- design examples for counters, drink machine, etc.
• Memories
- how to build memory with flip-flops, logic gates.
• The GNOME Microcomputer
- construction and improvements of simple, 8-bit microcomputer.
Xilinx Student Edition
Development Boards
Section V
Special Topics
Xilinx University Program
Participation
Section Agenda
• Course recommendations
• How to learn more
• Contacts & Support
• Why use Xilinx?
• Products & Ordering
– Software
– Hardware
Course Recommendations
• See http://www.xilinx.com/programs/univ.htm
Trends in Teaching with PLDs
• Increasing density and Cores enable System-level
design and test on an FPGA
– LogiCOREs available to all universities
– PCI, DSP, math, other complex functions
• VHDL or Verilog design is commonplace
• PLDs in many subjects beyond Digital Design and
Computer Engineering
– System Level Design and Test
– Dynamically Reconfigurable Logic
– Digital Signal or Video Processing
– Network Design
• Prevalent usage in required EE, CS, CE courses
• Students use their own computers
How To Learn More (1)
• AppLinx CD / Xilinx data book
• On-line books, On-line Help
• Excellent on-line tutorials in Foundation & Express
• Xilinx Web Site
– Application notes
– Latest technical information and status
– Fast Technical Help
– Whatever it is, it’s probably there!
• Subscribe to XCELL Journal
• Xilinx Student Edition is great practical guide
XUP Contacts & Support
• XUP Staff:
– Jason Feinsmith, XUP Manager ([email protected], USA 408-8794961)
– Anna Acevedo, XUP Coordinator ([email protected], USA 408-8795338)
– Chris Grundy, XUP European Liaison ([email protected], UK +44-1932-333-523)
– XUP Website: http://www.xilinx.com/programs/univ.htm
• Xilinx commercial or university distributors
– Channel for product distribution, updates
– http://www.xilinx.com for listing of commercial distributors
– Europractice, Chip Implementation Center (Taiwan ROC), IDEC (S.Korea),
Canadian MicroElectronics Corp.
• Technical Support
– Answers Database http://www.xilinx.com/support/searchtd.htm
– For Instructors: [email protected], USA 800-255-7778
Xilinx Donation Policy
“If a new or expanded course with lab or a research project is
being added and funding is not adequate to purchase the
required products at the University Program discounts,
Xilinx encourages any university or college to submit a
donation request.”
To Purchase or To Request a Donation: What's Practical for You?
If you have sufficient budget to purchase Xilinx software, development boards, and/or chips,
then we encourage you to do so. We offer significant discounts for Xilinx software and Xilinx
development boards. However, we recognize that very often, schools simply do not have
the funding even for the discounted products. In some cases, a school might have some
funding, but not enough to obtain everything that is needed for the lab. We encourage you to
make the choice that you feel is right for your situation. Most importantly, if money is any
barrier to your immediate use of Xilinx products, you should request a donation for
what you need.
Why Xilinx?
• Xilinx is world’s leading Programmable Logic innovator with 55%
commercial FPGA marketshare
• Xilinx is nearly twice as popular in the academic market as its nearest
competitor
• Best PLD Software: Foundation; Alliance; & Synopsys partnership
• Best PLD hardware architectures
– Xilinx FPGAs and CPLDs all Reprogrammable In-System.
– Tri-state and dual port RAMs in FPGAs are best for computer structures, DSP, research,
etc.
– Only vendor with dynamically & partially reconfigurable RPU’s
Complexity
Functionality /
Course Level
• Prentice Hall / Xilinx Student Edition includes best tools on the market
with fully integrated hardware environment
• If you don’t have the budget, request a donation.
Exciting Research areas:
FPGA
3K, 5K, 4K
CPLD
9500
Speed
• Reconfigurable Computing
Virtex, XC6200
• Digital Signal Processing
XC4000X
• Networking, PCI, Computer
Architectures, Neural Nets, etc.
Computer Lab Requirements
• Win ‘95, Win NT, HP, Sun, Solaris, use Xilinx software
version 1.3, available now
– Foundation Series Express recommended for all PC users
– Other design entry tools OK too, especially on workstation
• v1.4
              Processor       RAM      Hard Drive
Minimum       486DX2          32MB     200MB
Much Better   Pentium 120+    32+MB    500MB
Typical Lab Setup
• Primary and Additional licenses *
Qty   Part Number      Description
1     US-FND-EXP-PC    Primary Foundation package
9     UA-FND-EXP-PC    Additional FND licenses
10    XS40-010XL       XC4010XL FPGA board & cable
2     XS95-108         XS9500 CPLD board & cable
• Cables vs. PROM Programmers
• Foundation Series Express package
recommended for lab
– Software updates
– Full range of devices supported
– Additional license scheme
* Workstation users: use Ux-ALI-STD-WS, and substitute these for the 10 XS40-010XL’s:
10    UW-FPGABOARD     3K/4K Development boards
10    UW-XCHCBL-PC     XChecker cables
CPLD or FPGA?
CPLD
• Non-volatile
• JTAG Testing
• Wide fan-in
• Fast counters, state machines
• Combinational Logic
• Small student projects, lower level courses
FPGA
• More common in schools
• Great for first year to graduate work
• Excellent for computer architecture, DSP, registered designs
• ASIC like design flow
• SRAM reconfiguration
• PROM required for nonvolatile operation
Since the software is integrated, you can teach with both !
Hardware Boards for PCs
XSTEND
- Plug-in extension for XS40 & XS95’s
- Purchase from XESS Corp.
XS40 & XS95 Boards
- Purchase from XESS or donation from Xilinx
Access to I/O Pins for easy prototyping
Hardware Boards (2)
• H.O.T. II PCI Board (1)
• UW-FPGABOARD (2)
Battery not included!
Access to I/O Pins for easy prototyping
(1) Purchase HOT II from VCC
(2) Most popular board for the workstation. Purchase or donation from Xilinx
Summary
• Enhance Your Lab Curriculum with Xilinx
• Students get better job offers
• Great products for your lab
– Leading, industry standard software
– IEEE Standard VHDL & Verilog
– Innovative hardware solutions
• Ideal from intro to graduate courses
• Great publications from Prentice Hall
• Areas of strength for research
– DSP, Reconfigurable Logic
Xilinx = Long term Programmable Logic Solutions Leader
Appendix A:
Xilinx Configurable Logic Blocks
XC4000 CLB
C2
C1
C3
C4
H1 DIN S/R EC
S/R
Control
G4
G3
G2
G1
DIN
G
Func.
Gen.
SD
F'
Q
D
G'
YQ
H'
EC
RD
1
F4
F3
F2
F1
H
Func
.Gen.
F
Func.
Gen.
G'
Y
H'
S/R
Control
DIN
SD
F'
Q
D
G'
XQ
H'
EC
RD
1
H'
F'
K
X
XC4000X I/O Block
Diagram
Shaded areas are not included in XC4000E family.
XC9500
CPLDs
3
JTAG
Controller
JTAG Port
In-System
Programming Controller
Function
Block 1
I/O
I/O
I/O
I/O
Blocks
I/O
Global
Clocks
Global
Set/Reset
Function
Block 2
FastCONNECT
Switch Matrix
Function
Block 3
3
1
Function
Block 4
Global
Tri-States
2 or 4
XC9500 Function Block
Global
Clocks
AND
Array
3
Global
Tri-State
2 or 4
Macrocell 1
I/O
Macrocell 18
I/O
ProductTerm
Allocator
36
From
FastCONNECT
To
FastCONNECT
XC9500 Function Block
(2nd View)
36
Inputs
FastCONNECT
Switch Matrix
Fixed
Output
Pin
D/T Q
Function Block
Logic
Appendix B:
FPGA Family Comparisons
Xilinx Spartan
Series
5 Volt ->         XCS05     XCS10     XCS20     XCS30     XCS40
3.3 Volt ->       XCS05XL   XCS10XL   XCS20XL   XCS30XL   XCS40XL
System Gates      2K-5K     3K-10K    7K-20K    10K-30K   13K-40K
Logic Cells       238       466       950       1368      1862
Max Logic Gates   3,000     5,000     10,000    13,000    20,000
Flip-Flops        360       616       1120      1536      2016
Max RAM bits      3,200     6,272     12,800    18,432    25,088
Max I/O           80        112       160       192       224
Performance       80MHz     80MHz     80MHz     80MHz     80MHz
XC4000E 5V
FPGA Family
Device                 4003E   4005E   4006E   4008E   4010E   4013E   4020E   4025E
Logic Cells            238     466     608     770     950     1,368   1,862   2,432
Max Logic Gates        3K      5K      6K      8K      10K     13K     20K     25K
Typ Gate Range*        2-5K    3-9K    4-12K   6-15K   7-20K   10-30K  13-40K  15-45K
(Logic + Select-RAM)
Max I/O                80      112     128     144     160     192     224     256
* 20-25% of CLBs as RAM
Packages: 100% Footprint Compatible
PC84 PC84 PC84 PC84 PC84
TQ100
PQ100 PQ100
TQ144 TQ144
PQ160 PQ160 PQ160 PQ160
PQ208 PQ208 PQ208 PQ208 PQ208 HQ208
HQ240 HQ240 HQ240
HQ304
PG120 PG156 PG156 PG191 PG191 PG223 PG223 PG223
BG225 BG225
PG299
Spartan/XC4000E/XC5200 Density
                      Spartan/XL        XC4000E          XC5200
Logic Cells           238 - 1,862       238 - 2,432      256 - 1,936
Typ Gate Range        2,000 - 40,000    2,000 - 45,000   2,000 - 23,000
(Logic + SelectRAM)
I/O                   77 - 205          80 - 256         84 - 244
Number of Devices     5                 8                5
Power Supply          5V / 3.3V         5V               5V
I/O Interface         5V / 3.3V         5V               5V
XC4000X Series Density
                      XC4000EX          XC4000XL          XC4000XV
Logic Cells           2,432 - 3,078     152 - 7,448       10,982 - 20,102
Typ Gate Range        18,000 - 65,000   1,000 - 180,000   80,000 - 500,000
(Logic + SelectRAM)
I/O                   256 - 288         64 - 448          448
Number of Devices     2                 11                4
Power Supply          5V                3.3V              3.3V + 2.5V
I/O Interface         5V                5V / 3.3V         5V / 3.3V / 2.5V
Common Features
                               Spartan   XC4000   XC5200
Function Generators/CLB        3         3        4
Flip-flops/CLB                 2         2        4
Global Nets                    8         8        4
Global Three-State Control     Yes       Yes      Yes
Carry Logic                    Yes       Yes      Yes
Internal Three-State Buffers   Yes       Yes      Yes
Boundary Scan Logic            Yes       Yes      Yes
Output Drive (Sink)            12 mA     12 mA    8 mA
Differentiating Features
            Spartan   XC4000         XC5200
LCs/CLB     2.375     2.375          4
RAM         Sync.     Sync./Async.   None
PCI         Yes       Yes            No
Decode      No        Yes            No
Wired-AND   No        Yes            No
I/O FFs     Yes       Yes            No
Config      Ser       Par/Ser        Par/Ser
Packages    6         16             18
• Complete pinout compatibility within Spartan Series
• Not directly pinout-compatible with XC4000/XC5200
- Spartan has only one MODE pin
- Mode pin cannot be used as I/O
Xilinx XC4000-based
Architecture Comparison
                       Spartan/XL   XC4000X   XC4000E
Extended Routing       No           Yes       No
Fast Capture Latch     No           Yes       No
Global Early Buffers   No           Yes       No
Output Mux             No           Yes       No
CLB Latches            No           Yes       No
Asynchronous RAM       No           Yes       Yes
Edge Decoders          No           Yes       Yes
Wired-AND Function     No           Yes       Yes
Density
Comparison
Xilinx Device
Max
I/O
Device
XC4000 Series
448
XC4085XL
384
XC4062XL
XC4052XL
XC4044XL
XC4036EX
XC4028EX
XC4025E
XC4020E
XC4013E
352
320
288
256
256
224
192
Max
RAM
Bits
100K
74K
62K
51K
42K
33K
33K
25K
18K
Logic
Cells
7,448
5,472
4,992
4,598
3,800
3,774
3,078
2,880
2,432
2,432
2,304
1,862
1,728
1,368
1,152
Competing Product
Max
Max
RAM
Device
I/O
Bits
Altera FLEX 10K
25K
406
EPF10K100
18K
358
EPF10K70
20K
310
EPF10K50
16K
278
EPF10K40
12K
246
EPF10K30
12K
198
EPF10K20
Xilinx University Workshops
Appendix C
Design Tool Flows
Xilinx-Express Design Flow
DSP COREGen
& LogiBLOX
Module Generator
XNF
.NGO
VHDL
Verilog
Behavioral Simulation Models
.VEI
.VHI
HDL Editor
VHDL
Verilog
State Diagram
Editor
.V
.VHD
Schematic
Capture
EDIF
XNF
Gate Level
Simulator
VHDL
Verilog
Timing
Requirements
Express
EDIF/XNF
.UCF
Reports
.XNF
Foundation Design Entry Tools
Xilinx Implementation Tools
Reports
EDIF
BIT
JEDEC
SDF
VHDL
Verilog
HDL Simulation
Xilinx Design Manager Flow 1.4: FPGA Implementation
Design Manager Flow 1.4: CPLD Implementation
Design Flow
Design Entry
Concept
Functional Simulation
Mixed-Level
Schematic/HDL
Verilog XL,
Leapfrog
Simulation
Libraries
Netlist
Information
Design Synthesis &
Retargetability
Synthesis
Libraries
Synergy HDL/VHDL
Synthesis
Functional Simulation /
Verification
OpenSIM BackPlane
Design Optimization/
Partitioning for PLDs
PLD Designer
Design Optimization
for FPGAs
FPGA Designer
Netlist Creation
*EDIF,
XNF
VHDL, VERILOG
VerilogLink/VHDLLink
Place & Route
Timing
Simulation
Implementation Tools
Verilog-XL,
Leapfrog
Post Implementation
Netlist & SDF
Schematic Redraw
Verilog, VHDL
**SDF, *EDIF
PLD & FPGA Designer
Timing Backannotation
Device Programming Files
*Standard
Interface Netlist Format
Delay Format
** Standard
Simulation
Libraries
Design Flow
ABEL HDL
VHDL Entry & Compile
Viewlogic
ViewSyn
LogiBlox
VHDL Synthesis
Viewlogic
ViewSyn
Behavioral Simulation
Viewlogic
Speedwave
Structural Simulation /
Functional Simulation
Schematic Entry /
View Schematic
LogiCores
Viewlogic
ViewSim
Viewlogic
ViewDraw
Optional
Waveform Analysis
Netlist
(XNF or *EDIF)
Viewlogic
ViewTrace
Netlist Launcher
NGDBUILD
Place & Route
Implementation Tools
PAR (Place & Route)
VHDL,
*XNF
Timing
Simulation
**SDF
Viewlogic
ViewSim
Timing Annotated
EDIF Netlist
*Standard
Interface Netlist Format
Delay Format
** Standard
Device Programming Files
Design Flow
Schematic Design Flow
HDL Design Flow
Mentor Design Manager
Mentor Design Manager
LogiBlox
VHDL / Verilog HDL
ABEL HDL
Notepad / QuickHDL
LogiCores
Functional Simulation
Design Entry
Design Architect
LogiBlox
QuickHDL
Simulation Preparation
Design View Editor
Optional
Synthesis & Optimization
Functional Simulation
LogiCores
Autologic II
QuickSim II
Optional
*EDIF
*EDIF
Place & Route
Place & Route
Implementation Tools
Implementation Tools
VHDL or VERILOG
*SDF
Device Programming Files
*EDIF w/ Timing
*SDF
Timing Simulation
Timing Simulation
QuickHDL
*Standard
Interface Netlist Format
Delay Format
** Standard
QuickSim II
Device Programming Files
Synopsys Design Compiler Design Flow
Xilinx Unified Libraries
VHDL/VERILOG Models
Functional Simulation
HDL Source File
(VHDL or
Verilog HDL)
Synthesis
LogiBlox
Synthesis
Library
Synopsys
FPGA Compiler or
Design Compiler
Synopsys
VHDL System Simulator
or
3rd Party
VHDL/VERILOG Simulator
Simulation
Library
LogiCores
Optional
Netlist
(XNF or *EDIF)
Post-layout Verification
Static Timing
Verification
Constraints
File
Netlist Launcher
Static Timing Report
Synopsys
VSS Simulator
NGDBUILD
Place & Route
Implementation Tools
VHDL,
VERILOG,
*SDF
PAR (Place & Route)
Device Programming Files
*Standard
Interface Netlist Format
Delay Format
** Standard
Timing Simulation
Synopsys
VSS Simulator
or
3rd Party
VHDL/VERILOG Simulator
Design Flow
XNF modules
(Created by HDL
Synthesis tools)
ABEL HDL
Schematic Entry
Functional Simulation
OrCAD/ESP
Design Environment
OrCAD Simulate
XSimMake
LogiBlox
Netlist
(XNF or *EDIF)
LogiCores
Optional
Netlist Launcher
NGDBUILD
Place & Route
Implementation Tools
VHDL,
*XNF
*SDF
PAR (Place & Route)
Device Programming Files
*Standard
Interface Netlist Format
Delay Format
** Standard
Synplicity Design Flow
VHDL
Verilog
DSP COREGen
& LogiBLOX
Behavioral Simulation Models
Module Generator
.NGO
XNF
.VEI
.VHI
Verilog & VHDL
Instantiation
cross
probing
RTL View
SDF
Technology View
.NGO = Xilinx binary netlist
.SDC
Place & Route
Constraints
.NCF
EDIF
.VM
.VHM
XNF
EDIF
-route
-improve
Xilinx Implementation Tools
User
Constraints
File
Reports
BIT
JEDEC
EDIF
3rd Party
Simulation
HDL
TestBench
VHDL
Verilog
Compile & Map
Engine
Structured
Verilog and
VHDL netlists
Unisim
VITAL & Verilog
Functional Simulation Flow
Timing Simulation Flow
HDL Analyst
Gate
XNF, VM,
VHM, EDIF
VHDL
Verilog
VHDL
Verilog
HDL Editor
Timing &
Design
Constraints
Unified
VHDL
Verilog
SDF
VHDL
Verilog
simprim
VITAL, Verilog, Gate
Command File
or
Test Vectors
Xilinx University Workshop
Appendix D
XChecker Cable
and Configuration
*Note: Although differences are very minimal, this information has not been updated to reflect M1 information.
Use XChecker Cable to Simplify Verification
• Downloading allows quick verification of
design in circuit
– Bitstream downloaded via computer’s serial
port directly into FPGA
– No PROM programming required
– Design changes and verifications made quickly
• Readback sends configuration data and flip-flop values back out of chip
– Verifies correct configuration
Enabling Configuration Readback
• Readback Trigger input starts serial
readback
• XC3000 controlled via Bitstream
Generator
– Default is enabled
– Data and trigger connected to Mode pins
• XC4/5000 controlled via schematic and Bitstream Generator
– Include Readback symbol in schematic
– Connect TRIG and DATA to I/O pins
[Figure: READBACK symbol with CLK, TRIG, DATA, and RIP pins. Labels: XChecker RT, IPAD, IBUF, (MD0); XChecker RD, OBUF, OPAD, (MD1)]
Available Readback Data
Data includes all storage elements in device
– XC4000/XC5000 readback data includes all
outputs of CLBs and IOBs
• XC4000/XC5000 data is captured when
readback is triggered
• XC3000 data is captured as readback
progresses
– May want to stop system clock for logic
verification
Control Panel Defines Debug Session
(XACT™step v6)
• Opens automatically for Debug
• Allows direct control of:
– System clock source definition and application
– Readback trigger source definition and
application
– Number of readbacks
– Display options
How to Use Programmable
Logic to Build Fast and
Efficient DSP Functions
XUP Workshop
Appendix E
Originally created by: Greg Goslin
Xilinx, Corporate Applications
Constraint Driven Design
Methodology
• Constraints
– System Requirements
Constraint Driven Design methodologies
– Hardware Limitations
• Data Rate
– Inputs
– Outputs
– Multi-Channel I/O
• Quality
– Number of Bits/Taps
– Number of Operations
Data Rate
Quality
Processor Power
Clock Rate
Options
Performance
Efficiency
Building Fast and Efficient Filters in FPGAs
• Efficient Filter Algorithms for FPGAs
– Distributed Arithmetic:
• Bit-Serial
• n-Bit Parallel
• Using Distributed Arithmetic for Filter
Designs
– Serial FIR Filter Example
– Two-Bit Parallel FIR Example
– Full Parallel FIR Example
FIR FILTER EXAMPLE
N BITS WIDE
Sum of Products Equation
SAMPLE DATA
X0
•
X1
•
X2
PRODUCT
SUM
X
• K Multiplies
• K Sums
• CLOCK = Multiply Time
C0
K
X
• Sample Rate = Clock Rate
C1
OUTPUT DATA
•
•
•
•
X
0
C2
K COEFFICIENTS
K TAPS LONG
K SUMS
•
•
•
IMPLEMENTATION ???
2’s Complement Math
• The 2’s Complement of a number:
Invert (1’s Complement) then Add 1.
11111010 (-6) the 2’s Comp. is
(Invert) 00000101, (Add 1) Equals: 00000110
(+6)
• Leading 1’s and 0’s are only place holders:
(Sign extending a 2’s Comp. number doesn’t
change its value)
XMSB ... X2 X1 X0 equals XMSB XMSB XMSB ... X2 X1 X0
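The two rules above (invert then add 1, and sign extension leaves the value unchanged) can be tried out in a short Python sketch. The helper names are illustrative; words are held as non-negative integers of a stated bit width.

```python
def twos_complement(word, bits):
    """Negate a two's complement word: invert every bit, then add 1."""
    mask = (1 << bits) - 1
    return ((word ^ mask) + 1) & mask

def to_signed(word, bits):
    """Interpret a bits-wide word as a signed two's complement value."""
    return word - (1 << bits) if word & (1 << (bits - 1)) else word

def sign_extend(word, from_bits, to_bits):
    """Copy the MSB of a from_bits-wide word into the new upper bits."""
    if word & (1 << (from_bits - 1)):
        upper = ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
        word |= upper
    return word
```

With these, `twos_complement(0b11111010, 8)` gives `0b00000110`, matching the -6 to +6 example, and `sign_extend(0b11111010, 8, 16)` still reads as -6.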
The following 2’s Complement pairs are the
8-Bit X 8-Bit Signed Multiply
S
B7B6B5B4B3B2B1B0
SIGN EXTEND
X
A7A6A5A4A3A2A1A0
A0(B7B6B5B4B3B2B1B0)
A1(B7B6B5B4B3B2B1B0)
A2(B7B6B5B4B3B2B1B0)
A3(B7 B6B5B4B3B2B1B0)
A4(B7 B6 B5B4B3B2B1B0)
A5(B7 B6 B5 B4B3B2B1B0)
A6(B7 B6 B5 B4 B3B2B1B0)
+ A7(B7 B6 B5 B4 B3 B2B1B0)
S15S14S13S12S11S10S9S8S7S6S5S4S3S2S1S0
8-Bit X 8-Bit Signed Multiply
S
B7B6B5B4B3B2B1B0
SIGN EXTEND
X
A7A6A5A4A3A2A1A0
SE(B7B6B5B4B3B2B1B0)*A7 * 2^7
SE(B7B6B5B4B3B2B1B0)*A6 * 2^6
SE(B7B6B5B4B3B2B1B0)*A5 * 2^5
SE(B7B6B5B4B3B2B1B0)*A4 * 2^4
SE(B7B6B5B4B3B2B1B0)*A3 * 2^3
SE(B7B6B5B4B3B2B1B0)*A2 * 2^2
SE(B7B6B5B4B3B2B1B0)*A1 * 2^1
+ SE(B7B6B5B4B3B2B1B0)*A0 * 2^0
S15S14S13S12S11S10S9S8S7S6S5S4S3S2S1S0
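A shift-and-add model of this multiply is sketched below in Python. B is sign-extended to 16 bits and one weighted row is accumulated per bit of A. One detail is an assumption rather than something the slide states: the A7 row is subtracted instead of added, because the MSB of a two's complement A carries weight -2^7. The function name is illustrative.

```python
def signed_multiply_8x8(a, b):
    """Multiply two signed 8-bit values using sign-extended partial products.
    Returns the 16-bit two's complement product as a signed integer."""
    mask16 = 0xFFFF
    a_word = a & 0xFF        # A as an 8-bit two's complement word
    b_ext = b & mask16       # B sign-extended to 16 bits
    acc = 0
    for k in range(8):
        if (a_word >> k) & 1:
            row = (b_ext << k) & mask16      # SE(B) * 2^k, kept to 16 bits
            if k == 7:
                acc = (acc - row) & mask16   # assumption: sign bit of A subtracts
            else:
                acc = (acc + row) & mask16
    return acc - 0x10000 if acc & 0x8000 else acc
```

Checking every pair of operands against ordinary integer multiplication is cheap at this size and is a reasonable sanity test for the sketch.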
4-Bit
Signed
Tree
Multiplier
[Figure: 4-bit signed tree multiplier, block diagram. Gated, sign-extended copies of B(3:0) form B * A3, B * A2, B * A1, and B * A0; two first-level signed adders with registers feed a final signed adder with register. Recoverable labels follow.]
Upper pair: { 1/2 B*A2 - B*A3 }
-A3 *{ B3B3B2B1B0 }
+A2 *{ B3B3B3B2B1B0 }
{ P7P6P5P4P3P2 }
Lower pair: { 1/2 B*A0 - B*A1 }
+A1 *{ B3B3B2B1B0 }
+A0 *{ B3B3B3B2B1B0 }
{ P5P4P3P2P1P0 }
Full sum:
-A3 *{ B3B3B2B1B0 }
+A2 *{ B3B3B3B2B1B0 }
+A1 *{ B3B3B3B3B2B1B0 }
+A0 *{ B3B3B3B3B3B2B1B0 }
{ P7P6P5P4P3P2P1P0 }
Resources:
16 Gated Bits and Reg = 8 CLBs
5-bit Signed Adder & Reg = 3 CLBs (two instances)
6-bit Signed Adder & Reg = 4 CLBs
Total = 18 CLBs
D.A. ONE TAP FIR FILTER = D0 C0
REDUCES TO MULTIPLYING A VARIABLE TIMES A CONSTANT
[Figure: N-bit-wide sample data (Xn ... X3 X2 X1 X0) drives the address input (ADRS, A0) of the look-up table; the table DATA output feeds input B of a scaling accumulator (add/subtract, register, 2^-1 scaling on input A), which produces the filtered data out]
2 WORD X N BIT LOOK UP TABLE
A[0]   DATA
0      ...000000
1      C0
Partial products accumulated per sample bit:
X0(B7B6B5B4B3B2B1B0)
+X1(B7B6B5B4B3B2B1B0)
S9S8S7S6S5S4S3S2S1S0
+X2(B7B6B5B4B3B2B1B0)
S10S9S8S7S6S5S4S3S2S1S0
+X3(B7B6B5B4B3B2B1B0)
S11S10S9S8S7S6S5S4S3S2S1S0
+X4(B7B6B5B4B3B2B1B0)
S12S11S10S9S8S7S6S5S4S3S2S1S0
+X5(B7B6B5B4B3B2B1B0)
S13S12S11S10S9S8S7S6S5S4S3S2S1S0
+X6(B7B6B5B4B3B2B1B0)
S14S13S12S11S10S9S8S7S6S5S4S3S2S1S0
+X7(B7B6B5B4B3B2B1B0)
S15S14S13S12S11S10S9S8S7S6S5S4S3S2S1S0
D.A. TWO TAP FIR FILTER = D0 C0 + D1 C1
[Figure: N-bit-wide sample data is shifted through two N-bit registers, D0 and D1 (bits XN ... X2 X1 X0 each); one bit from each register forms the table address (ADRS: A0 from D0, A1 from D1); the table DATA output feeds input B of a scaling accumulator (add/subtract, register, 2^-1 scaling on input A), which produces the filtered data out]
4 WORD X N BIT LOOK UP TABLE
A[10]   DATA
00      ...000000
01      C0
10      C1
11      C0 + C1
Partial products accumulated per sample bit:
(X0,0,X1,0)(B7B6B5B4B3B2B1B0)
+(X0,1,X1,1)(B7B6B5B4B3B2B1B0)
S9S8S7S6S5S4S3S2S1S0
+(X0,2,X1,2)(B7B6B5B4B3B2B1B0)
S10S9S8S7S6S5S4S3S2S1S0
+(X0,3,X1,3)(B7B6B5B4B3B2B1B0)
S11S10S9S8S7S6S5S4S3S2S1S0
+(X0,4,X1,4)(B7B6B5B4B3B2B1B0)
S12S11S10S9S8S7S6S5S4S3S2S1S0
+(X0,5,X1,5)(B7B6B5B4B3B2B1B0)
S13S12S11S10S9S8S7S6S5S4S3S2S1S0
+(X0,6,X1,6)(B7B6B5B4B3B2B1B0)
S14S13S12S11S10S9S8S7S6S5S4S3S2S1S0
+(X0,7,X1,7)(B7B6B5B4B3B2B1B0)
S15S14S13S12S11S10S9S8S7S6S5S4S3S2S1S0
D.A. THREE TAP FIR FILTER
[Figure: N-bit-wide sample data is shifted through three N-bit registers, D0, D1, and D2 (bits XN ... X2 X1 X0 each); one bit from each register forms the table address (ADRS: A0 from D0, A1 from D1, A2 from D2); the table DATA output feeds input B of a scaling accumulator (add/subtract, register, 2^-1 scaling on input A), which produces the filtered data out]
8 WORD X N BIT LOOK UP TABLE
A[210]   DATA
000      ...000000
001      C0
010      C1
011      C1 + C0
100      C2
101      C2 + C0
110      C2 + C1
111      C2 + C1 + C0
Partial products accumulated per sample bit:
(X0,0,X1,0,X2,0)(B7B6B5B4B3B2B1B0)
+(X0,1,X1,1,X2,1)(B7B6B5B4B3B2B1B0)
S9S8S7S6S5S4S3S2S1S0
+(X0,2,X1,2,X2,2)(B7B6B5B4B3B2B1B0)
S10S9S8S7S6S5S4S3S2S1S0
+(X0,N,X1,N,X2,N)(B7B6B5B4B3B2B1B0)
S(N+M) ... S13S12S11S10S9S8S7S6S5S4S3S2S1S0
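The distributed arithmetic idea behind these one-, two-, and three-tap slides can be modeled in a few lines of Python. The table holds every sum of coefficients, one sample bit from each tap forms the address, and the table outputs are accumulated with binary weights. This is a behavioral sketch, not the hardware: it weights each table output by a left shift instead of halving the running sum, and it assumes two's complement samples, so the sign-bit term is subtracted. All names are illustrative.

```python
def build_da_lut(coeffs):
    """Table of 2^K words: entry `addr` is the sum of the coefficients whose
    address bit is 1 (bit i of the address selects coeffs[i])."""
    k = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << k)]

def da_fir_output(samples, coeffs, n_bits):
    """One FIR output, sum(coeffs[i] * samples[i]), computed bit-serially.
    samples are signed integers that fit in n_bits of two's complement."""
    lut = build_da_lut(coeffs)
    mask = (1 << n_bits) - 1
    words = [s & mask for s in samples]      # two's complement encoding
    acc = 0
    for bit in range(n_bits):                # one table look-up per sample bit
        addr = 0
        for tap, w in enumerate(words):
            addr |= ((w >> bit) & 1) << tap
        partial = lut[addr] << bit           # weight 2^bit
        if bit == n_bits - 1:
            acc -= partial                   # sign bit has negative weight
        else:
            acc += partial
    return acc
```

For three taps, `build_da_lut([c0, c1, c2])` reproduces the 8-word table: address 011 holds C1 + C0 and address 111 holds C2 + C1 + C0. No multiplier appears anywhere in the loop, which is the point of the technique.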
The Development of a
Distributed Arithmetic FIR Filter
10-Bit 10-Tap - XC4000 Family Example
10 BIT 10 TAP SYMMETRICAL FIR FILTER
100 BIT
SHIFT
REGISTER
SAMPLE
DATA
10
PARALLEL IN
SERIAL OUT
Look Up Table is only 32 words by 10 bits
SUM(10,1)
10 BIT
SHIFT
REGISTER
A10
S10 A9
S9 A8
32 X 10 MEMORY
D0
D9
D1
D8
ADD
A0
LOOK UP
TABLE
ADD
A1
DATA
D1
SHIFT
D2
D7
D3
D6
D9
D4
D5
10
R
E
G
S1 A0
Scaling I
Accum. S
T
SIGN EXT
B10
E
B(9:0)
XOR
R
B
10
A
C_I LD
ADD
A2
COMPLEMENT ON
LAST BIT & ADD 1
Serial
Adders
DIN
A3
OPTIONAL
DOUBLE
PRECISION
ADD
A4
320 BITS
11
FILTERED
DATA OUT
Most
Significant
BYTE
SUM(0)
LOAD ON
FIRST BIT
ADD
10
Shift
Reg.
Least Significant
BYTE
10
N • K BIT
SHIFT
REGISTER
SAMPLE
DATA
N
PARALLEL IN
SERIAL OUT
SERIAL TIME SKEW
BUFFER
N BIT
SHIFT
REGISTER
N • K BIT
SHIFT
REGISTER
SAMPLE
DATA
N
N BIT
SHIFT
REGISTER
PARALLEL IN
D_0
D_0
SAMPLE DATA WORD SIZE = N BITS
NUMBER OF TAPS = K
RAM16X1R
DATA_I
A3
A2
A1
A0
• One N Bit Shift Register Per Tap
D_1
SHIFT
• Use 4000 RAM to build Shift Register
DATA_O
D_1
WR
CLK
• One 16 Bit Shift Register Per 1/2 CLB
RAM16X1R
SHIFT REGISTER
IMPLEMENTED IN RAM
D_k-1
# OUTPUTS = # TAPS
10 BIT 10 TAP = 50 CLBs
DATA_I
A3
A2
A1
A0
DATA_O
D_k-1
WR
CLK
10 BIT 10 TAP = 10 CLBs
Serial Adder
D0
D9
ADD
A+B
SUM
D
FF
D1
D8
D2
D7
Clk
ADD
A
B
ADD
Carry In
A+B+Carry
Carry
D
FF
D3
D6
ADD
D4
D5
ADD
Clk
CLR
CNT=10
Serial
Adders
1 CLB Per 2 Taps
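A bit-serial adder of this kind needs only a full adder and a carry flip-flop. The Python sketch below models it one clock at a time. It assumes the operands arrive least significant bit first and that the carry flip-flop is cleared before each new word; the function name is illustrative.

```python
def serial_add(a, b, bits):
    """Add two bits-wide words one bit per clock, LSB first.
    The carry flip-flop starts cleared; the final carry out is dropped."""
    carry = 0
    result = 0
    for i in range(bits):
        abit = (a >> i) & 1
        bbit = (b >> i) & 1
        sum_bit = abit ^ bbit ^ carry                    # A + B + Carry, low bit
        carry = (abit & bbit) | (carry & (abit ^ bbit))  # next state of carry FF
        result |= sum_bit << i
    return result
```

Because the result is kept to the word width, the same routine adds two's complement operands correctly as long as the true sum fits in `bits` bits.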
DISTRIBUTED ARITHMETIC
LOOK-UP TABLE
32 X 10 MEMORY
A0
LOOK UP
TABLE
A1
• HOLDS ALL PARTIAL PRODUCTS
DATA
• LUT IS AS WIDE AS COEFF
A2
A3
A4
320 BITS
• CAN USE MEMGEN TO BUILD LUT
1’s COMPLEMENTER
INVERT
D0
D
Q
• INVERTS DATA ON LAST CYCLE
• 2 BITS PER CLB
D1
D
Q
SCALING ACCUMULATOR
A10
S10 A9
S9 A8
R
E
G
S1 A0
Scaling I
Accum. S
T
SIGN EXT
B10
DATA
E
B(9:0)
R
B
10
A
C_I LD
FORCE CARRY-IN ON
LAST BIT
10
• ADDS DATA TO (1/2) *(SUMOUT)
11
SUM OUT
Most
Significant
BYTE
• 2 BITS PER CLB
• NEED N+1 BITS
• DOUBLE PRECISION WITH SR
SUM(0)
LOAD ON
FIRST BIT
OPTIONAL
DOUBLE
PRECISION
• CAN USE XBLOX FOR RPM
DIN
Shift
Reg.
Least Significant
BYTE
10
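The scaling accumulator's behavior (add the new table output to half of the running sum, and complement on the last bit) can be sketched in Python with exact fractions. The sketch assumes the table outputs arrive least significant sample bit first and that the last one belongs to the sign bit. It ignores register widths and truncation, so it models the arithmetic only. The function name is illustrative.

```python
from fractions import Fraction

def scaling_accumulate(partials):
    """Accumulate look-up table outputs, one per sample bit, LSB first.
    Each step halves the running sum and adds the new output; the final
    (sign-bit) output is subtracted. For N outputs the result equals the
    full-precision sum divided by 2^(N-1)."""
    acc = Fraction(0)
    last = len(partials) - 1
    for i, p in enumerate(partials):
        acc = acc / 2                    # the (1/2) * (SUMOUT) feedback path
        acc = acc - p if i == last else acc + p
    return acc
```

For a single tap with coefficient 7 and the 8-bit sample -6 (bits 11111010), the table outputs are `[0, 7, 0, 7, 7, 7, 7, 7]`, and the result times 2^7 is 7 times -6.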
10 BIT 10 TAP SYMMETRICAL FIR FILTER
100 BIT
SHIFT
REGISTER
SAMPLE
DATA
10
PARALLEL IN
SERIAL OUT
SUM(10,1)
10 BIT
SHIFT
REGISTER
A10
S10 A9
S9 A8
32 X 10 MEMORY
D0
D9
ADD
LOOK UP
TABLE
(RAM)
D1
D8
A0
ADD
A1
DATA
D1
SHIFT
D2
D7
D3
D6
D9
D4
D5
10
R
E
G
S1 A0
Scaling I
Accum. S
T
SIGN EXT
B10
E
B(9:0)
XOR
R
B
10
A
C_I LD
ADD
A2
COMPLEMENT ON
LAST BIT & ADD 1
Serial
Adders
DIN
A3
OPTIONAL
DOUBLE
PRECISION
ADD
A4
320 BITS
11
FILTERED
DATA OUT
Most
Significant
BYTE
SUM(0)
LOAD ON
FIRST BIT
ADD
10
Shift
Reg.
Least Significant
BYTE
10
10
RAM BASED
SHIFT REGISTER
SAMPLE DATA
TIMING AND
CONTROL
CNTEQ10
CNTEQ9
50 MHz CLK
5
10 CLBs
5 CLBs
SERIAL TIME SKEW
BUFFER
2 TO 1 REDUCTION
DUE TO SYMMETRY
A3
A2
A1
A0
7 CLBs
10
RAM OR ROM
LOOK UP TABLE
COMPLEMENT
ON LAST
CYCLE
10
FIR FILTER COEFFICIENTS
AND MULTIPLY LOOK UP
A
ADDER
9
B
5 CLBs
DATA
10 CLBs
10
XOR
10
32 X 10
ADRS
A3
A2
A1
A0
CLK
FIVE 2 BIT
ADDERS
7 CLBs
R
E
G
I
S
T
E
R
10
FILTER OUT
9 Most Significant Bits
1’S COMPLEMENT
SCALING ACCUMULATOR
• TOTAL OF 44 CLBS: FITS IN A 4002A (WITH 20 CLBS EXTRA FOR SYSTEM DESIGN)
• ABOUT 1300 EQUIVALENT GATES - LITTLE INTERCONNECT BETWEEN BLOCKS
NUMBER OF 10 BIT 10 TAP SYMMETRICAL FIR FILTERS PER XC4000 DEVICE
XC4000 PART   NUMBER OF INSTANCES
4002A         1
4003A         2
4004A         3
4005A         5
4006          6
4008          8
4010          10
4013          15
4025          23
10 BIT 10 TAP
FIR FILTER
PERFORMANCE
• FIR10B10T MACRO CAN BE CLOCKED AT 66 MHZ @XC4000E-3
• 10 BIT WORD REQUIRES 11 CLOCKS
• 10 BIT SAMPLE WORD RATE IS 6 MHZ
FIR Filter Macro
Relatively Placed Macro
• 8 BIT WORD REQUIRES 9 CLOCKS, ETC
DATA IN
DIN_
DOUT_
DATA OUT
• 8 BIT SAMPLE WORD RATE IS 8 MHZ
BIT_CLK
10X_CLK
CLK_OUT
FIR10B10T
WORD SIZE (BITS)   SAMPLE RATE (MSPS)
6                  11.1
8                  7.4
10                 6.1
12                 5.1
14                 4.4
16                 3.9
WORD_CLK
Double-Rate DA FIR Filters
Two Bit Parallel Distributed Arithmetic FIR Filter
16 WORD X N BIT LOOK UP TABLE
0000 ...000000
A[3210]
SAMPLE DATA
N
D0
N BITS WIDE
XN
X2
X1
X0
A1
A0
2 -2
LOOK
UP
TABLE
XN
ADRS
D1
X2
X1
X0
A3
A2
A
Scaling
Accum.
+
DATA
B
-
R
E
G
I
S
T
E
R
FILTERED
DATA OUT
• Process 2 Bits per
Clock
• # of Clocks = (N/2)
0001   C0
0010   2C0
0011   3C0
0100   C1
0101   C1 + C0
0110   C1 + 2C0
0111   C1 + 3C0
1000   2C1
1001   2C1 + C0
1010   2C1 + 2C0
1011   2C1 + 3C0
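A behavioral Python sketch of the two-bit parallel scheme for two taps follows. Address bits A1A0 take two bits of D0 and A3A2 take two bits of D1, so each table word is (A1A0)*C0 + (A3A2)*C1, and an N-bit sample is consumed in N/2 look-ups. How the hardware handles the sign on the final pair of bits is not shown on the slide. The correction used here, subtracting twice the table word addressed by the upper bit of each pair, is an assumption that gives the right two's complement result. Names are illustrative.

```python
def build_2bit_lut(c0, c1):
    """16-word table for address A[3210]: A1A0 scales C0, A3A2 scales C1."""
    return [(addr & 0b11) * c0 + (addr >> 2) * c1 for addr in range(16)]

def da_2bit_two_tap(d0, d1, c0, c1, n_bits):
    """Compute c0*d0 + c1*d1 two sample bits per step (n_bits must be even).
    d0 and d1 are signed integers that fit in n_bits of two's complement."""
    assert n_bits % 2 == 0
    lut = build_2bit_lut(c0, c1)
    mask = (1 << n_bits) - 1
    w0, w1 = d0 & mask, d1 & mask
    steps = n_bits // 2
    acc = 0
    for step in range(steps):
        shift = 2 * step
        addr = ((w0 >> shift) & 0b11) | (((w1 >> shift) & 0b11) << 2)
        partial = lut[addr]
        if step == steps - 1:
            # Assumed sign handling: the top bit of each sample weighs -2, not +2,
            # within the last pair, so remove 4x that bit's contribution.
            partial -= 2 * lut[addr & 0b1010]
        acc += partial << shift
    return acc
```

With 8-bit samples the loop runs four times instead of eight, which is where the doubled sample rate comes from.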
Double Sample Rate D.A. FIR Filter
• Twice the I/O Data Sample Rate
• Two Taps Requires 4 Input LUT without
Symmetry
• Four Taps Requires 4 Input LUT with
Symmetrical FIR
• Time Skew Buffer uses Twice as many CLBs
• LUTs are the same, if equal bit weights are used to address the LUTs.
WORD SIZE (BITS)   SAMPLE RATE (MSPS)
6                  22.2
8                  14.8
10                 12.2
12                 10.2
14                 8.8
16                 7.8
(Double Precision)
Full Parallel D.A. FIR Filters
• One 8-Bit Tap Requires two 4 Input LUTs
and an ADDER with an offset for bit
weighting.
• Time Skew Buffer must use REGs
• Maximum I/O Data Sample Rate
• Full PDA Performance, in a XC4000E-3/2, 50-70 MHz.
– Pipelining can further increase sample rate
WORD SIZE (BITS)   SAMPLE RATE (MSPS)
6                  70
8                  70
10                 70
12                 70
14                 66
16                 66
(Double Precision)
• LUTs are the same, if equal bit weights are
used to address the 4-Coefficients in the
•
FPGA-Based DSP Coprocessor
Design Implementation
[Diagram: 24-bit datapath on the I/O Bus with Old_1, New_1, Diff_1, Old_2, New_2, Diff_2, a Prestate Buffer, adders and subtractors, an INC block, MUXes, REGs, and MSB taps; 10 Bit and 24-bit widths are marked]
Performance
– Programmable DSP (DSP56300)
• 24 clock cycles
• 360 nsec @ 66 MHz
– FPGA-Based Coprocessor
• 9 clock cycles
• 135 nsec @ 66 MHz
• Results:
– 37.5% of original processing time
– 2.67X Increase in throughput
Relative Performance
[Chart: relative performance on a 0 to 3 scale. Two 66 MHz DSPs with Six 15 ns RAMs: 360 ns. 66 MHz DSP+FPGA with Three 15 ns RAMs: 135 ns. 2.67 times better performance with FPGA-assisted DSP.]
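The figures on this slide follow from the cycle counts alone. As a quick check, the sketch below assumes a 15 ns clock period (the nominal period of the "66 MHz" parts, and an assumption rather than something the slide states) and recomputes the processing times, the 37.5% ratio, and the 2.67X speedup. The names are illustrative.

```python
CLOCK_PERIOD_NS = 15  # assumed nominal period for a 66 MHz clock


def processing_time_ns(clock_cycles, period_ns=CLOCK_PERIOD_NS):
    """Time for one operation given its cycle count."""
    return clock_cycles * period_ns


dsp_ns = processing_time_ns(24)    # programmable DSP
fpga_ns = processing_time_ns(9)    # FPGA-based coprocessor

fraction_of_original = fpga_ns / dsp_ns
throughput_gain = dsp_ns / fpga_ns

if __name__ == "__main__":
    print(dsp_ns, fpga_ns)
    print(f"{fraction_of_original:.1%} of original time")
    print(f"{throughput_gain:.2f}X throughput")
```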
8 Bit Word FIR Filter Structures
[Chart: # CLBs (100 to 300) versus Number of TAPS (16, 32, 48, 64, 80) for four structures: Serial Sequential Distributed Arithmetic, Serial Distributed Arithmetic, Two-Bit Parallel Distributed Arithmetic, and Parallel Distributed Arithmetic. Rate annotations on the curves: 16 MHz, 55 MHz, 8 MHz, and 1000 to 50 KHz.]
FIR Filter Implementation Options
8 Bit Word Example
            Serial*                Serial* Distributed    Parallel* Distributed
            Sequential             Arithmetic             Arithmetic
8 Taps      36 CLBs, 1.08 MHz      44 CLBs, 8.1 MHz       250 CLBs, 60 MHz
16 Taps     36 CLBs, 0.46 MHz      70 CLBs, 8.1 MHz       400 CLBs, 55 MHz
32 Taps     44 CLBs, 0.23 MHz      122 CLBs, 8.1 MHz
48 Taps     62 CLBs, 0.15 MHz      178 CLBs, 8.1 MHz
64 Taps     70 CLBs, 0.11 MHz      228 CLBs, 8.1 MHz
* Note: These designs are NOT Pipelined
Lower Sample Rate Applications:
Efficient CLB Counts
Large Number of TAPs
Moderate Sample Rates
Non Symmetrical FIR OK
Serial Sequential Architecture
Serial Sequential - FIR Filter
32 Tap 8 Bit Example
[Diagram: Sample Data into the SAMPLE DATA BUFFER; Coefficient Select into the Coefficient Table; SDB Out and the selected coefficient into SERIAL MULTIPLY; ACC REG and an output REG give Filtered Data Out]
• Clk: 50 Mhz
• Coefficient Select: 5-BIT CNTR, 3 CLBs
• Coefficient Table: 32 x 8 LUT (32 - 8 Bit Coefficients), 8 CLBs
• PSR (Parallel to Serial Converter): 4 CLBs
• Serial Multiplier: 24 CLBs Total
• ACC (2^-1 Scale, ADD, REGISTER): 5 CLBs
64-TAP Serial Sequential FIR Filter
[Diagram: two serial sequential sections, each with Sample Data, SAMPLE DATA BUFFER, Coefficient Select, SERIAL MULTIPLY, and ACC REG; the two results pass through an ADD and a REGISTER]
Serial Sequential - FIR Filter
[Diagram: Sample Data, SAMPLE DATA BUFFER, Coefficient Select, SERIAL MULTIPLY, ACC REG, REG, Filtered Data Out]
Number CLBs vs. Taps / Word Size
            8 Bit    10 Bit    12 Bit    14 Bit    16 Bit
8 Tap       36       43        50        57        64
16 Tap      36       43        50        57        64
32 Tap      44       53        62        71        80
48 Tap      62       77        92        107       122
64 Tap      70       85        100       115       130
80 Tap      97       115       133       151       169
96 Tap      97       115       133       151       169
128 Tap     112      137       162       187       212
• 4002 = 64 CLBs
• 4005 = 196 CLBs
• 4013 = 576 CLBs
• 4025 = 1024 CLBs
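The device capacities listed with this table make it easy to check which part a given filter fits in. The sketch below is only an illustration, using just the four capacities quoted on the slide; the function name is hypothetical, and real designs need headroom for routing and control logic that this ignores.

```python
# CLB capacities as quoted on the slide.
DEVICE_CLBS = {"4002": 64, "4005": 196, "4013": 576, "4025": 1024}


def smallest_fit(clbs_needed):
    """Return (part, spare CLBs) for the smallest listed device that fits,
    or None if the design is larger than every listed device."""
    for part, clbs in sorted(DEVICE_CLBS.items(), key=lambda kv: kv[1]):
        if clbs >= clbs_needed:
            return part, clbs - clbs_needed
    return None


if __name__ == "__main__":
    print(smallest_fit(44))    # 32 tap, 8 bit serial sequential filter
    print(smallest_fit(212))   # 128 tap, 16 bit serial sequential filter
```

For a 44 CLB filter this reports the 4002 with 20 CLBs spare, which agrees with the earlier remark that such a design leaves 20 CLBs for the rest of the system.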
Serial Sequential - FIR Filter
[Diagram: Sample Data, SAMPLE DATA BUFFER, Coefficient Select, SERIAL MULTIPLY, ACC REG, REG, Filtered Data Out]
Maximum Sample Rate / Word Size
TAPS        8 Bit      10 Bit     16 Bit
8 Tap       781 kHz    625 kHz    390 kHz
16 Tap      390 kHz    312 kHz    195 kHz
32 Tap      195 kHz    156 kHz    97 kHz
48 Tap      130 kHz    104 kHz    65 kHz
64 Tap      97 kHz     78 kHz     48 kHz
80 Tap      78 kHz     62 kHz     39 kHz
96 Tap      65 kHz     52 kHz     32 kHz
128 Tap     48 kHz     39 kHz     24 kHz
• Serial Mult. Limitations
• Can Use Multiple 16 Tap Building Blocks
• 8X Faster at 128 Taps
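The entries in the Maximum Sample Rate table are consistent with one serial multiply per tap and one clock per bit, that is, a 50 MHz clock divided by (taps x word size) and truncated to whole kHz. That rule is inferred from the numbers, not stated on the slide, so the sketch below is an assumption-labelled check with a hypothetical function name.

```python
CLOCK_HZ = 50_000_000  # the 50 MHz clock shown in the 32 tap example


def max_sample_rate_khz(taps, word_bits, clock_hz=CLOCK_HZ):
    """Sample rate in whole kHz, assuming taps * word_bits clocks per sample."""
    clocks_per_sample = taps * word_bits
    return clock_hz // clocks_per_sample // 1000


if __name__ == "__main__":
    for taps in (8, 16, 32, 48, 64, 80, 96, 128):
        row = [max_sample_rate_khz(taps, bits) for bits in (8, 10, 16)]
        print(f"{taps:3d} Tap", row)
```

Under the same assumption, splitting a 128 tap filter into eight 16 tap blocks running side by side divides the clocks per sample by eight, which matches the "8X Faster at 128 Taps" bullet.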
13.5MHz Median Filter, 5-Point, 2-Bit PDA
M = (A + B + C + D + E)/5
58 CLBs for Function plus about 10 CLBs for Control
Total = 68 CLBs
[Diagram: inputs A, B, C, D, E, each 8 BITS WIDE, into shift registers (X8 ... X2 X1 X0) clocked by 4xCLK; bits A1 B1 C1 D1 E1 and A0 B0 C0 D0 E0 form the ADRS of two copies of LUT-A with 11 Bit DATA; a 12-Bit adder with REG (2x weight marked) and a 14-Bit adder with REG (4x weight, SIGN EXTEND, MSB and LSB marked) produce M(A,B,C,D,E)]
CLB usage:
• 4-CLBs per 8-Bit Shift Reg: 4x5ea = 20 CLBs
• 1-CLBs per Bit, 12-Bit Partial Sums, MSB bit weight = 1: 12x2ea = 24 CLBs
• 6-CLBs for Add of 12-Bit Partial Sums, 1-CLB for [ Carryout + LSB ]: 6+1 = 7 CLBs
• 7-CLBs for 14-Bit Add of 14-Bit Partial Product Sums, no Carryout and LSBs are dropped: 7 = 7 CLBs
32 WORD X 12 BIT LOOK UP TABLE A.
ADRS         DATA
0 0 0 0 0    000000000000
0 0 0 0 1    000110011001
0 0 0 1 0    000110011001
0 0 0 1 1    001100110011
0 0 1 0 0    000110011001
0 0 1 0 1    001100110011
0 0 1 1 0    001100110011
0 0 1 1 1    010011001100
0 1 0 0 0    000110011001
0 1 0 0 1    001100110011
0 1 0 1 0    001100110011
0 1 0 1 1    010011001100
0 1 1 0 0    001100110011
0 1 1 0 1    010011001100
0 1 1 1 0    010011001100
0 1 1 1 1    011001100110
1 0 0 0 0    000110011001
1 0 0 0 1    001100110011
1 0 0 1 0    001100110011
1 0 0 1 1    010011001100
1 0 1 0 0    001100110011
1 0 1 0 1    010011001100
1 0 1 1 0    010011001100
1 0 1 1 1    011001100110
1 1 0 0 0    001100110011
1 1 0 0 1    010011001100
1 1 0 1 0    010011001100
1 1 0 1 1    011001100110
1 1 1 0 0    010011001100
1 1 1 0 1    011001100110
1 1 1 1 0    011001100110
1 1 1 1 1    100000000000
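Every entry of LOOK UP TABLE A depends only on how many of the five address bits are set: each set bit contributes 1/5, stored as a 12 bit fraction whose MSB has weight 1. The sketch below regenerates the table on that reading. The truncation (rather than rounding) is inferred from the listed bit patterns, and the function name is hypothetical.

```python
FRACTION_BITS = 11  # 12-bit word whose MSB has bit weight 1


def build_average_lut(inputs=5):
    """Table of floor(k / inputs * 2**11), where k is the number of
    address bits that are set."""
    lut = []
    for addr in range(1 << inputs):
        ones = bin(addr).count("1")
        lut.append((ones << FRACTION_BITS) // inputs)
    return lut


if __name__ == "__main__":
    for addr, data in enumerate(build_average_lut()):
        print(f"{addr:05b} {data:012b}")
```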
Design the following Application:
• Equations:
• Y(R,G,B) = 0.299*R + 0.587*G + 0.114*B
• U(R,G,B) = -0.169*R - 0.331*G + 0.500*B
• V(R,G,B) = 0.500*R - 0.419*G - 0.081*B
• R, G, B Data is 8-Bits at 13.5 MHz. The circuit already has a 2x Clk (27 MHz).
• Draw a functional schematic diagram of the
Video Coding Application with 4x Clock
Y = 0.299*R + 0.587*G + 0.114*B
U = -0.169*R - 0.331*G + 0.500*B
V = 0.500*R - 0.419*G - 0.081*B
[Diagram: R, G, and B, each 8 BITS WIDE, into parallel load 2-bit shift registers (X8 ... X2 X1 X0) clocked by 4xCLK; bits R1 G1 B1 and R0 G0 B0 form the ADRS of two copies of LUT-A with 10 Bit DATA; a 10 Bit ADDER + REG (2x weight marked) and a 12 Bit ADDER with REG (4x weight, SIGN EXTEND, MSB and LSB marked) produce the 12 BITS WIDE outputs Y(R,G,B), U(R,G,B), V(R,G,B)]
CLB usage:
• PARALLEL LOAD 2-BIT SHIFT REG: 4 CLBs EA, = 12 CLBs
• LUTs are the same: 5 CLBs EA, = 10 CLBs
• 10 Bit ADDER + REG: 5.5 CLBs
• 12 Bit ADDER: 6 CLBs
8 WORD X 10 BIT f(RGB) LOOK UP TABLE A.
ADRS    DATA
000     ...000000
001     CB
010     CG
011     CG + CB
100     CR
101     CR + CB
110     CR + CG
111     CR + CG + CB
The total design would use about 110 CLBs with control logic.
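The f(RGB) look up table holds every sum of the three coefficients, so one lookup per bit position replaces three multiplies. The sketch below models this in software under stated assumptions: the coefficients are quantized with a hypothetical 2^10 scale (the slides give only the 10 bit table width), the function names are illustrative, and `bits_per_clock` selects how many identical table lookups happen per clock (2 for the 4x clock design).

```python
SCALE = 1 << 10  # hypothetical fixed-point scale; the slides do not give one

Y_COEFFS = (0.299, 0.587, 0.114)
U_COEFFS = (-0.169, -0.331, 0.500)
V_COEFFS = (0.500, -0.419, -0.081)


def build_lut(coeffs):
    """8-word table addressed by one bit each of R, G, B (R is the MSB)."""
    cr, cg, cb = (round(c * SCALE) for c in coeffs)
    return [((a >> 2) & 1) * cr + ((a >> 1) & 1) * cg + (a & 1) * cb
            for a in range(8)]


def da_convert(r, g, b, coeffs, bits_per_clock=2):
    """Weighted sum of 8-bit unsigned R, G, B by distributed arithmetic."""
    lut = build_lut(coeffs)
    acc = 0
    for clk in range(8 // bits_per_clock):
        for k in range(bits_per_clock):      # identical LUTs, one per bit
            bit = clk * bits_per_clock + k
            addr = ((((r >> bit) & 1) << 2)
                    | (((g >> bit) & 1) << 1)
                    | ((b >> bit) & 1))
            acc += lut[addr] << bit          # apply the bit weight
    return acc / SCALE


if __name__ == "__main__":
    print(da_convert(255, 255, 255, Y_COEFFS))
    print(da_convert(200, 100, 50, V_COEFFS))
```

Because R, G, and B are unsigned, the negative coefficients of U and V need no special sign-bit handling here; only the table contents become negative.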
Video Coding Application with 2x Clock
[Diagram: R, G, and B, each 8 BITS WIDE, into parallel load 4-bit shift registers (X8 ... X2 X1 X0) clocked by 2xCLK; bits R3 G3 B3, R2 G2 B2, R1 G1 B1, and R0 G0 B0 form the ADRS of four copies of LUT-A with 10 Bit DATA; 10 Bit ADDER + REG stages, a 12 Bit ADDER + 2 REGs, and a 14 Bit ADDER (2x, 4x, and 16x weights, SIGN EXTEND, MSB and LSB marked) produce the 12 BITS WIDE outputs Y(R,G,B), U(R,G,B), V(R,G,B)]
CLB usage:
• PARALLEL LOAD 4-BIT SHIFT REG: 4 CLBs EA, = 12 CLBs
• LUTs are the same: 5 CLBs EA, = 20 CLBs
• 10 Bit ADDER + REG: 5.5 CLBs EA, = 11 CLBs
• 12 Bit ADDER + 2 REGs: 7 CLBs
• 14 Bit ADDER: 7 CLBs
All four LUTs are the same.
The total design would use about 180 CLBs with control logic.
Xilinx Introduces First Fully Programmable System Solution
First FPGA Architecture Designed for Intellectual Property
FPGA Technology Roadmap
[Chart: Density/Performance versus Year, 1995 to 1999]
Family          Largest Device    Process      Year
XC4000E         XC4025            0.5µ         1995
XC4000EX        XC4036EX          0.5µ         1996
XC4000XL        XC4085XL          0.35µ        1997
XC4000XV        XC40250XV         0.25µ        1998
Generation 3 architecture: 1 Million+ system gates, System Solution, 0.25/0.18µ
Process Technology and Supply Voltage
• Lower cost
• Faster speed
• Higher density
• Lower power
[Chart: Feature Size (µ), 0 to 1.2, and supply Voltage (5, 3.3, 2.5, 1.8, 1.3) versus year, 1990 to 2002; "Virtex FPGAs Ship" is marked on the timeline]
Virtex FPGAs Leverage Xilinx Process Technology Leadership
Voltage and Family Migration
• Virtex FPGAs and XC4000XV share common process (0.25µ)
– 2.5 V logic, 3.3 V I/O with 5 V tolerance
• Family migration from XC4000XL possible
– Voltage migration guide will assist users
• Design with XC4000XL now and plan ahead for XC4000XV and Virtex FPGAs
Xilinx 0.25 , 5 Volt-Compatible FPGAs
5V
3.3 V
2.5 V
I/O
Supply
Accepts
5 V levels
5V
Any
5V
device
(XC4000E)
3.3 V
Logic
Supply
Virtex
&
XC4000XV
2.5 V logic
3.3 V I/O
3.3 V
3.3 V
Meets TTL
Levels
• Family migration possible if you plan for:
–
–
Additional power/ground pins
Dedicated clock and configuration pins
• Voltage migration guide to help users
Any
3.3 V
device
(XC4000XL)
System Level Design Trend
[Diagram: a PC Board with Scratch Pad SRAM, DSP, PCI Bus I/F, RAM I/F, and Custom Logic, alongside a single High-Density, High-Performance Custom Device]
Introducing Xilinx Virtex FPGAs
[Diagram layers: IP; Software; System Building Blocks; Fast, Flexible I/Os; Segmented Routing, 4-Input LUT FPGA Architecture; Leading Edge Process Technology]
World’s first fully programmable system-level architecture
Advanced Process Technology
0.5u Process
- LOCOS isolation
- bird's beak
- no planarization
- only contact plug
0.25u UMC Process
- shallow trench isolation
- 0.9u metal pitch
- CMP
- plug for all vias
Family Overview
• 0.25um, 5 layer metal process
• Density: 50 thousand to 1 million system gates
• Performance
– 100+ MHz performance
• 3 to 4 LUT levels
– 160 MHz system performance
• Clock to output + input setup
• First device in 2Q98
– 250,000 system gates
– One million system gate device by end of 1998
Virtex FPGA Performance
• 100+ MHz internal speeds
– 155 MHz SONET data stream processing
– 100+ MHz Pipelined Multipliers
– 66 MHz PCI
• 100+ MHz system interface speeds
                         without PLL    with PLL
Tco (output register)    6 ns           3.5 ns
Tsu (input register)     3 ns           3 ns
Th (input register)      0 ns           0 ns
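System interface speed is described in this deck as "clock to output + input setup", so a rough upper bound on a chip-to-chip clock rate is the reciprocal of Tco + Tsu. The sketch below applies that to the two Tco figures on the slide (3.5 ns and 6 ns) with the 3 ns setup time. It is only a first-order estimate: the function name and the `board_delay_ns` parameter are hypothetical, and clock skew and trace delay are ignored by default.

```python
def max_interface_mhz(tco_ns, tsu_ns, board_delay_ns=0.0):
    """Upper bound on the interface clock, in MHz, for a register-to-register
    path between two devices: 1 / (Tco + board delay + Tsu)."""
    return 1000.0 / (tco_ns + board_delay_ns + tsu_ns)


if __name__ == "__main__":
    for tco in (3.5, 6.0):
        print(f"Tco = {tco} ns: {max_interface_mhz(tco, 3.0):.1f} MHz")
```

Both results come out above 100 MHz, in line with the "100+ MHz system interface speeds" bullet.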
Functional Block Diagram
[Diagram: CLB array with Distributed SelectRAM Memory, Block SelectRAM Memory, PLL, and SelectI/O Pins (66 MHz PCI, SSTL3); Segmented routing and Vector Based Interconnect, delay=f(vector)]
Virtex Clocking
Clocking and PLL
• 4 low skew clock resources
– 3ns setup, 0ns hold clock pad -> IOB input FF
– 6ns clock to out clock pad -> IOB output FF
• 24 Additional low skew globals
– clocks, enables, resets, etc
– faster than 4KXL secondary global buffer
• PLL for system clock deskew and fast clock to out.
Virtex CLB
Segmented Routing Interconnect
• Fast local routing within CLBs
• General purpose routing between CLBs
• Fast Interconnect
– 8ns across 250,000 system gates
• Predictable for early design analysis
• Optimized for five layer metal process
[Diagram: CLB (2 LCs + 2 LCs) beside a SWITCH MATRIX, with CARRY chains and 3-STATE BUSSES]
Virtex Configurable Logic Block
• Polarity of all control signals selectable
• Fast arithmetic and multiplier circuitry
• Optimized for synthesis
[Diagram: a CLB holds 2 LCs + 2 LCs. Each LC has a 4 Input LUT (I3, I2, I1, I0, WI, DI, O), Carry and Control logic (CI, CO), and a Register (D, CE, CLK, PR, RS, Q)]
Virtex IO
Simplified IOB
• Fast I/O drivers
• Registered input, output, 3-state enable control
• Programmable slew rate, pullup, input delay, etc.
• Selectable I/O Standards
– SSTL, GTL, LVTTL...
[Diagram: three DFF/LATCH elements (D, CE, Q, S/R) connected to the PAD]
Virtex Memory
SelectRAM+ Memory Features
• Distributed SelectRAM Memory
– Pioneered in XC4000 family
– 16x1 synchronous SRAM implemented in LUT
– Ideal for DSP applications
– Access over one hundred billion bytes/sec
• Block SelectRAM Memory
– 4096 bit blocks of dual port synchronous SRAM
– Configurable widths of 1, 2, 4, 8, and 16
– Ideal for data buffers and FIFOs
– Up to 17 gigabytes/sec access
Block RAM
• Configure as: 4096 bits with variable aspect ratio
• 8-32 blocks across family devices
• True dual-port, fully synchronous operation
– Cycle time <10 ns
• Flexible block RAM configuration
– 5 blocks: 2K x 10 video line buffer
– 1 block: 512 x 8 ATM buffer (9 frames)
– 4 blocks: 2K x 8 FIFO
– 9 blocks: 4K x 9 FIFO with parity
[Diagram: RAMB4 with port A (WEA, ENA, CLKA, ADDRA, DINA, DOA) and port B (WEB, ENB, CLKB, ADDRB, DINB, DOB)]
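The block counts in the configuration examples follow from tiling 4096 bit blocks in one of the widths listed earlier (1, 2, 4, 8, or 16). As an illustration, the sketch below tries each aspect ratio and keeps the cheapest tiling. The function name is hypothetical, and it assumes every block in a memory uses the same aspect ratio.

```python
import math

# Depth x width options for one 4096-bit block (widths 1, 2, 4, 8, 16).
ASPECT_RATIOS = [(4096, 1), (2048, 2), (1024, 4), (512, 8), (256, 16)]


def blocks_needed(depth, width):
    """Fewest 4096-bit blocks for a depth x width memory, tiling copies of
    a single aspect ratio in both dimensions."""
    return min(
        math.ceil(depth / d) * math.ceil(width / w)
        for d, w in ASPECT_RATIOS
    )


if __name__ == "__main__":
    print(blocks_needed(2048, 10))  # 2K x 10 video line buffer
    print(blocks_needed(512, 8))    # 512 x 8 ATM buffer
    print(blocks_needed(2048, 8))   # 2K x 8 FIFO
    print(blocks_needed(4096, 9))   # 4K x 9 FIFO with parity
```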
Real Time Video Processor
• Hierarchy of RAM provides efficient and very high bandwidth data processing
[Diagram: Video Data In to Processed Video Out through a Virtex FPGA. Frame Data in High Speed Synchronous DRAM (Mbytes); Line Data in Block SelectRAM Memory (Kbytes); Pixel Data in Distributed SelectRAM Memory (bytes); Video Pixel Processing Function (logic)]
Virtex FPGA Summary
• 1 Million+ system gates
• 100+ MHz performance from all devices
• Building blocks for system level design
• ASIC design flow software
• Platform for CORE reuse
First fully programmable system solution