Introduction To VIRTEX II Architecture

Presented By:
Ankur Agarwal
Xilinx Design Flow
1. Plan & Budget
2. Create Code/Schematic
3. HDL RTL Simulation
4. Synthesize to create netlist
5. Functional Simulation
6. Implement
   - Translate
   - Map
   - Place & Route
7. Attain Timing Closure
8. Timing Simulation
9. Create Bit File
Xilinx Architecture Features
• High performance at 2.5 V, 3.3 V, and 5 V
• Technology independence
  - EDIF, VHDL, Verilog, SDF interface
• Footprint compatibility
  - Devices within each family are compatible with each other
• Pin locking
VIRTEX
• Up to 2 million system gates at 100+ MHz
• Features:
  - Distributed and block RAM available
  - Low power
  - Delay-Locked Loops
  - 2.5 V internal operation with support of common power
Naming Conventions
Example: XC4028XL-3-BG256
• XC40: family (4000, 9500)
• 28: no. of gates
• XL: sub-family (3V = XL, 5V = no XL)
• -3: speed grade
• BG256: package
Spartan starts with XCS
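The field breakdown above can be sketched as a small parser. This is only an illustration: the regular expression is a hypothetical pattern that covers XC4000-style names like the example, the names `PartName` and `parse_part_name` are made up for this sketch, and reading the two digits after the family as a gate-count code is an assumption.

```python
import re
from typing import NamedTuple


class PartName(NamedTuple):
    family: str        # e.g. "4000"
    gate_code: int     # digits after the family (assumed gate-count code)
    sub_family: str    # e.g. "XL"; empty string when absent
    supply: str        # "3V" for XL, "5V" when there is no XL (per the slide)
    speed_grade: int
    package: str


# Hypothetical pattern: only handles XC40xx names such as XC4028XL-3-BG256.
_PATTERN = re.compile(r"^XC(40)(\d{2,3})([A-Z]*)-(\d+)-([A-Z]{2}\d+)$")


def parse_part_name(name: str) -> PartName:
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unsupported part name: {name!r}")
    _, gates, sub, speed, package = match.groups()
    if sub == "XL":
        supply = "3V"
    elif sub == "":
        supply = "5V"
    else:
        supply = "unknown"  # other suffixes are outside this sketch
    return PartName("4000", int(gates), sub, supply, int(speed), package)


if __name__ == "__main__":
    print(parse_part_name("XC4028XL-3-BG256"))
```

Other families (9500, Spartan's XCS prefix) would need their own patterns; they are deliberately left out here.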
CPLD and FPGA
               Complex Programmable Logic Device (CPLD)   Field-Programmable Gate Array (FPGA)
Architecture   PAL/22V10-like; more combinational         Gate array-like; more registers + RAM
Density        Low-to-medium; 0.5-10K logic gates         Medium-to-high; 1K to 3.2M system gates
Performance    Predictable timing; up to 250 MHz today    Application dependent; up to 200 MHz
Interconnect   “Crossbar Switch”                          Incremental
Overview of Xilinx FPGA Architecture
• I/O Blocks (IOBs)
• Programmable Interconnect
• Configurable Logic Blocks (CLBs)
• Tristate Buffers
• Global Resources
Block Diagram of VIRTEX-II Architecture
[Figure: Virtex-II block diagram. On-chip resources shown: DCM, distributed RAM, 18 Kb BRAM, shift registers, multiplier. Interfaces and functions shown: SONET/SDH, PCI, PCI-X, LVDS, BLVDS backplane, DDR, DDR SDRAM, QDR SRAM, CAM, FIFO.]
CLB Resources
• Basic resource unit is the Logic Cell
  - 1 CLB contains 2 - 4 Logic Cells, depending on device family
  - Logic Cell = 4-input Look-Up Table (LUT) + D flip-flop
• LUT capacity is limited by the number of inputs, not the complexity of the function
• LUTs can be used as ROM or synchronous RAM
• Flip-flop can be configured as a transparent latch in Virtex and Spartan-II
Closer Look at a CLB Structure
[Figure: two slices. Each slice contains two 4-input look-up tables (inputs G1-G4 and F1-F4, output O), carry and control logic (CIN, COUT), and two flip-flops (D, Q, CK, EC, S, R). Slice inputs include CLK, CE, SR, BY, and F5IN; slice outputs are X, XB, Y, and YB.]
Each slice has 2 LUT-FF pairs with associated carry logic
Two 3-state buffers (BUFT) associated with each CLB, accessible
by all CLB outputs
Interconnect Technology Offered by VIRTEX-II
• Interconnect is an array of switch matrices
• All Virtex-II features can access routing resources through the switch matrix
  - Simplifies design and place & route
[Figure: array of switch matrices, each attached to a CLB, IOB, DCM, 18 Kb BRAM, or 18x18 multiplier (MULT)]
Simplified SLICE Structure
• Each slice has four outputs:
  - Two registered outputs
  - Two non-registered outputs
• Two BUFTs associated, accessible by all 16 CLB outputs
• Carry logic for fast addition
  - Two independent carry chains per CLB
Fast Carry Logic
• Each CLB contains separate logic and routing for the fast generation of carry signals
  - Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
• Carry logic is independent of normal logic and routing resources
[Figure: carry logic routing runs from the LSB to the MSB]
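A bit-level sketch may help show why a dedicated carry path suits adders. The model below is an assumption about how such a chain is commonly organised, not something stated on the slide: the LUT computes a per-bit "propagate" signal, a dedicated multiplexer passes the incoming carry or generates a new one, and a dedicated XOR forms the sum bit. The function name `carry_chain_add` is hypothetical.

```python
def carry_chain_add(a: int, b: int, width: int, carry_in: int = 0):
    """Add two unsigned `width`-bit numbers one bit at a time, LSB first.

    Returns (sum modulo 2**width, carry_out).
    """
    carry = carry_in & 1
    total = 0
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        propagate = a_i ^ b_i          # assumed to be the LUT's job
        sum_i = propagate ^ carry      # assumed dedicated XOR on the chain
        # Assumed dedicated carry multiplexer: pass the carry when the bits
        # differ, otherwise the carry out equals a_i (both 1 -> 1, both 0 -> 0).
        carry = carry if propagate else a_i
        total |= sum_i << i
    return total, carry


if __name__ == "__main__":
    print(carry_chain_add(5, 3, width=4))
    print(carry_chain_add(15, 1, width=4))
```

Only the carry signal travels between bit positions, which is why keeping it on separate routing leaves the general interconnect free.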
CLB (Configurable Logic Blocks)
• Each CLB is connected to one switch matrix
  - Providing access to general routing resources
[Figure: one CLB beside its switch matrix, with four slices S0 (X0Y0), S1 (X0Y1), S2 (X1Y0), and S3 (X1Y1), two TBUFs, two carry chains (CIN to COUT), a SHIFT chain, and fast connects]
• High level of logic integration
  - Wide-input functions:
      16:1 multiplexer in 1 CLB or any function
      32:1 multiplexer in 2 CLBs (1 level of LUT)
  - Fast arithmetic functions:
      2 look-ahead carry chains per CLB column
  - Addressable shift registers in LUT:
      16-bit shift register in 1 LUT
      128-bit shift register in 1 CLB (dedicated shift chain)
Four-Input LUT
• Implements combinatorial logic
  - Any 4-input logic function
  - Cascaded for wide-input functions
Truth table of a 4-input logic function:
Inputs (ABCD)   Output (Z)
0000            0
0001            0
0010            1
0011            0
...             ...
1110            1
1111            1
[Figure: a LUT with inputs A, B, C, D and output Z is equivalent to the 4-input logic function]
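The idea that a LUT stores a truth table rather than gates can be sketched in a few lines. The helper names `make_lut4` and `lut4_read` are hypothetical, and the example function is chosen only for illustration; it is not the function whose partial truth table appears on the slide.

```python
def make_lut4(func):
    """Build the 16-entry table for any function of four 1-bit inputs.

    Entry index is the 4-bit number ABCD (A is the most significant bit).
    """
    return [func((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1) & 1
            for i in range(16)]


def lut4_read(table, a, b, c, d):
    """Evaluate the LUT: the inputs simply address the stored table."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


if __name__ == "__main__":
    # Illustrative function only: Z = (A AND B) XOR (C OR D)
    table = make_lut4(lambda a, b, c, d: (a & b) ^ (c | d))
    print(table)
    print(lut4_read(table, 1, 1, 0, 0))
```

However complicated the expression, the cost is always the same 16 stored bits, which is the sense in which LUT capacity depends on the number of inputs and not on the complexity of the function.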
Multiplexers
• MUXF5 combines 2 LUTs to create
  - 4x1 multiplexer
  - Or any 5-input function (LUT5)
  - Or selected functions up to 9 inputs
• MUXF6 combines 2 slices to form
  - 8x1 multiplexer
  - Or any 6-input function (LUT6)
  - Or selected functions up to 19 inputs
• Dedicated muxes are faster and more space efficient
[Figure: a CLB with two slices; each slice holds two LUTs feeding a MUXF5, and a MUXF6 combines the two slices]
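To make the "any 5-input function" claim concrete, here is a minimal sketch: one 4-input table holds the function with the fifth input at 0, a second holds it with the fifth input at 1, and a 2:1 multiplexer (standing in for MUXF5) picks between them. The name `build_lut5` is hypothetical.

```python
def build_lut5(func5):
    """Split a 5-input function into two 16-entry tables plus a 2:1 mux.

    Returns an evaluator taking (a, b, c, d, e), where e drives the mux select.
    """
    def table_for(e):
        return [func5((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1, e) & 1
                for i in range(16)]

    lut_low = table_for(0)    # contents of the first LUT (e = 0)
    lut_high = table_for(1)   # contents of the second LUT (e = 1)

    def evaluate(a, b, c, d, e):
        index = (a << 3) | (b << 2) | (c << 1) | d
        # The dedicated multiplexer selects one LUT output using input e.
        return lut_high[index] if e else lut_low[index]

    return evaluate


if __name__ == "__main__":
    majority = build_lut5(lambda a, b, c, d, e: int(a + b + c + d + e >= 3))
    print(majority(1, 1, 0, 0, 1))
```

The same split applied once more, with a second multiplexer level choosing between two such pairs, gives the 6-input case attributed to MUXF6.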
CLB Multiplexers
CLB Multiplexer Location
• MUXF6 combines slices X0Y0 & X0Y1
• MUXF6 combines slices X1Y0 & X1Y1
• MUXF7 combines the 2 MUXF6 outputs
• MUXF8 combines the 2 MUXF7 outputs (two CLBs)
[Figure: CLB with slices S0 to S3; every slice holds an F5 multiplexer, and the F6, F6, F7, and F8 multiplexers are distributed across the four slices]
Horizontal Cascade Chain
• Wide AND-OR functions (Sum Of Products)
[Figure: three adjacent CLBs, each with slices S0 to S3; an SOP chain runs horizontally from CLB to CLB]
Shift Register LUT
• Each LUT can be configured as a shift register
  - Serial in, serial out
• Dynamically addressable delay up to 16 cycles
• For programmable pipeline
• Cascade for greater cycle delays
• Use CLB flip-flops to add depth
[Figure: a LUT with inputs IN, CE, CLK, and DEPTH[3:0] is equivalent to a chain of D flip-flops with clock enable (CE); DEPTH[3:0] selects which stage drives OUT]
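A behavioural sketch of the addressable shift register described above. The class name `ShiftRegisterLUT` is hypothetical, and the convention that address 0 selects the most recently shifted-in bit is an assumption of this model.

```python
class ShiftRegisterLUT:
    """16-stage serial-in shift register with a dynamically addressed tap."""

    DEPTH = 16

    def __init__(self):
        self.stages = [0] * self.DEPTH  # stages[0] is the newest bit

    def clock(self, data_in, ce=1):
        """Model one rising clock edge; shifting happens only when CE is high."""
        if ce:
            self.stages = [data_in & 1] + self.stages[:-1]

    def out(self, depth):
        """Read the tap chosen by the 4-bit address; no clock edge is needed."""
        return self.stages[depth & 0xF]


if __name__ == "__main__":
    srl = ShiftRegisterLUT()
    seen = []
    for bit in [1, 0, 1, 1, 0, 0, 1, 0]:
        srl.clock(bit)
        seen.append(srl.out(3))
    print(seen)
```

Changing the address between clock edges changes the delay without reloading the register, which is what makes the pipeline length programmable.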
Shift Register
[Figure: two parallel 64-bit data paths. One passes through Operation A (4 cycles) and Operation B (8 cycles), 12 cycles in total; the other passes through Operation C (3 cycles). The result is a 9-cycle imbalance.]
• Register-rich FPGA
  - Allows for addition of pipeline stages to increase throughput
• Data paths must be balanced to keep desired functionality
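The balancing arithmetic for the two paths in the figure can be worked through as below. The function names are hypothetical, and the cost comparison assumes one 16-deep shift-register LUT per bit for every 16 cycles of delay, as opposed to one flip-flop per bit per cycle.

```python
import math


def balancing_delay(path_a_cycles, path_b_cycles):
    """Cycles of delay to add to the shorter path so both paths line up."""
    return abs(path_a_cycles - path_b_cycles)


def shift_luts_needed(delay_cycles, bus_width, depth_per_lut=16):
    """LUTs used when each bit is delayed with 16-deep shift-register LUTs."""
    return math.ceil(delay_cycles / depth_per_lut) * bus_width


def flip_flops_needed(delay_cycles, bus_width):
    """Flip-flops used when each cycle of delay is a plain register stage."""
    return delay_cycles * bus_width


if __name__ == "__main__":
    delay = balancing_delay(4 + 8, 3)      # Operation A + B versus Operation C
    print(delay)
    print(shift_luts_needed(delay, 64))
    print(flip_flops_needed(delay, 64))
```

For the 64-bit paths in the figure this gives a 9-cycle delay, costing 64 shift-register LUTs under these assumptions, against 576 flip-flops if the delay were built from registers alone.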
Shift Register Look-Up Table
• High-density integration of shift registers
  - DSP applications use SRL16 for delay matching
  - CDMA wireless and video applications require shift registers
• Up to 128 bits per CLB
• Cascadable output
• Dynamically addressable output
• 16 bits per LUT
• Multiple SRLC16 cascadable to any length
Digital Clock Manager
• High-speed 420 MHz clock generation
  - Clock de-skew on-chip and off-chip
• Up to 12 DCMs per device
• Fully digital circuitry
• Flexible frequency synthesis
  - Synthesis outputs: clock 0° & 180° (default: 4X)
• High-resolution phase shifting
  - DPS fixed and variable modes
• Delay-Locked Loop (DLL)
  - Precise clock de-skew
  - DLL outputs: clock 0°, 90°, 180°, 270°
  - DLL outputs: clock 2X and clock division
  - 50/50 duty cycle correction
Digital Clock Manager: DCM
Delay-Locked Loop
• Clock phase de-skew
• Duty cycle correction
• Temperature compensation
• RST input
• LOCKED output
• Attributes:
  - DUTY_CYCLE_CORRECTION
  - DLL_FREQUENCY_MODE
  - CLKDV_DIVIDE = 1.5 to 16.0
  - STARTUP_WAIT
  - CLK_FEEDBACK = CLK0 or CLK2X
• Up to 4 clock outputs per DCM
[Figure: DCM block. Inputs: CLKIN, CLKFB, RST, DSSEN, PSINCDEC, PSEN, PSCLK. Outputs: CLK0, CLK90, CLK180, CLK270, CLK2X, CLK2X180, CLKDV, CLKFX, CLKFX180, LOCKED, STATUS[7:0], PSDONE. The legend distinguishes clock signals from control signals.]
Advanced Frequency Synthesis
Frequency Synthesis
• CLKFX is any M / D product of the CLKIN frequency
  - M = 2 to 32, D = 1 to 32
  - Default: M = 4, D = 1 (4X CLKIN)
• Always nominal 50/50 duty cycle
• Attributes:
  - CLKFX_MULTIPLY (integer)
  - CLKFX_DIVIDE (integer)
  - DFS_FREQUENCY_MODE
After LOCKED:
f_CLKFX = (M / D) x f_CLKIN
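The synthesis formula can be checked with a short calculation. This is a sketch only: the function name `clkfx_frequency` is hypothetical, the range checks use the M and D limits quoted on the slide, and no device-specific frequency limits are modelled.

```python
def clkfx_frequency(clkin_mhz, multiply=4, divide=1):
    """Return the CLKFX frequency in MHz for integer multiply/divide values.

    Defaults follow the slide: M = 4, D = 1, i.e. 4X CLKIN.
    """
    if not 2 <= multiply <= 32:
        raise ValueError("multiply (M) must be between 2 and 32")
    if not 1 <= divide <= 32:
        raise ValueError("divide (D) must be between 1 and 32")
    return clkin_mhz * multiply / divide


if __name__ == "__main__":
    print(clkfx_frequency(50.0))          # default 4X
    print(clkfx_frequency(100.0, 7, 5))   # M = 7, D = 5
```

With a 100 MHz input, M = 7 and D = 5 give 140 MHz, a ratio that the 2X and divided DLL outputs alone cannot produce.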
High Resolution Phase Shifting
Fine Phase Shifting
• Applies to all CLK outputs
• Phase shift = fraction of the CLKIN period
• Fixed or variable modes
• Inputs in variable mode:
  - PSINCDEC input = Increase/Decrease
  - PSEN = Enable Phase Shift
  - PSCLK synchronizes Phase Shift
• PSDONE output
• Attributes:
  - CLOCKOUT_PHASE_SHIFT = NONE, FIXED, VARIABLE
  - PHASE_SHIFT (signed integer): -255 to +255
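As a rough illustration of how a signed PHASE_SHIFT value maps to time, the sketch below assumes each step is 1/256 of the CLKIN period. That step size is an assumption of this example (the slide only says the shift is a fraction of the period), and `phase_shift_ns` is a hypothetical helper name.

```python
STEPS_PER_PERIOD = 256  # assumed resolution; not stated on the slide


def phase_shift_ns(clkin_period_ns, phase_shift):
    """Convert a signed PHASE_SHIFT setting into a time offset in ns."""
    if not -255 <= phase_shift <= 255:
        raise ValueError("PHASE_SHIFT must be between -255 and +255")
    return clkin_period_ns * phase_shift / STEPS_PER_PERIOD


if __name__ == "__main__":
    # 100 MHz input clock has a 10 ns period
    print(phase_shift_ns(10.0, 64))
    print(phase_shift_ns(10.0, -128))
```

Under this assumption, a setting of 64 on a 10 ns clock moves the outputs by a quarter period, and negative settings move them earlier.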
Global Clocks

Up to 16 Dedicated Low Skew Clocks
16 global clock multiplexers & buffers
8 clock nets in each quadrant
Global clock ENABLE
Switch glitch-free from one clock to another
16 clock pads (can be used as user I/O)
Clock Distribution
• 16 global clock multiplexers
  - Eight on the top
  - Eight on the bottom
• Switch “glitch free” from one clock to the other
• 8 clocks selectable per quadrant (NW, NE, SW, SE)
• Unused branches are disabled (power saving)
[Figure: 8 BUFGMUX at the top and 8 BUFGMUX at the bottom of the device drive 16 clocks; each of the four quadrants (NW, NE, SW, SE) receives 8 max]
Use Global Buffers to Reduce Clock Skew
• Global buffers are connected to dedicated routing.
• This routing network is balanced to minimize skew.
• All Xilinx FPGAs have global buffers.
[Figure: a design containing 2 clock signals, CLK1 and CLK2, each clocking a D flip-flop. One version introduces clock skew between CLK1 and CLK2; the other uses an extra BUFG to reduce skew on CLK2.]
Global Clocks: BUFGMUX
Three modes:
• Clock buffer
  - Low skew clock distribution
  - BUFG primitive
• Clock enable
  - Stop the clock High or Low
  - BUFGCE (stop Low)
• Clock multiplexer, “glitch-free”
  - Switch from one clock to another
  - Unrelated clocks
  - No pulse width shorter than 1/2 of the period
[Figure: BUFG (I to O), BUFGCE (I and CE to O), and BUFGMUX (I0, I1, and select S to O)]
Memory
On-Chip SelectRAM™ Memory
Terabit Memory Continuum
• Distributed RAM (bytes)
  - DSP coefficients
  - Small FIFOs
  - CAM
  - Shallow/wide
  - 128x1
• Block RAM (kilobytes)
  - Large FIFOs
  - Packet buffers
  - Video line buffers
  - Cache tag memory
  - CAM
  - Deep/wide
  - 18 Kb blocks
• External RAM/CAM (megabytes)
  - Up to 400 Mbps/pin
  - DDR & QDR
Embedded 18 Kb Block RAM
• Up to 3 Mb on-chip block RAM
  - High internal buffering bandwidth
  - Reduced I/O count and more embedded memory
• 18 Kbit block RAM
  - Parity bit locations (parity in/out busses)
  - Data width up to 36 bits
  - 3 WRITE modes
  - Output latches Set/Reset
• True dual-port RAM
  - Independent clock (async.) & control
Distributed RAM
• CLB LUT configurable as distributed RAM
  - A LUT equals 16x1 RAM
  - Implements single and dual ports
  - Cascade LUTs to increase RAM size
• Synchronous write
• Synchronous/asynchronous read
  - Accompanying flip-flops used for synchronous read
Primitives shown (inputs -> outputs):
• RAM16X1S (one LUT): D, WE, WCLK, A0-A3 -> O
• RAM32X1S (two LUTs): D, WE, WCLK, A0-A4 -> O
• RAM16X2S (two LUTs): D0, D1, WE, WCLK, A0-A3 -> O0, O1
• RAM16X1D (two LUTs): D, WE, WCLK, A0-A3, DPRA0-DPRA3 -> SPO, DPO
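A behavioural sketch of the dual-port primitive listed above, assuming synchronous write through the A address and asynchronous read on both ports. The class is a software model written for illustration only (`DualPortRam16x1` is a hypothetical name), not a description of the silicon.

```python
class DualPortRam16x1:
    """16 x 1-bit RAM: port A writes and reads (SPO), port DPRA only reads (DPO)."""

    def __init__(self, init=0):
        # Bit i of `init` is the initial content of address i.
        self.mem = [(init >> i) & 1 for i in range(16)]

    def write_clock(self, we, a, d):
        """Model one rising edge of WCLK: write D at address A when WE is high."""
        if we:
            self.mem[a & 0xF] = d & 1

    def spo(self, a):
        """Asynchronous read at the write-port address A."""
        return self.mem[a & 0xF]

    def dpo(self, dpra):
        """Asynchronous read at the independent read address DPRA."""
        return self.mem[dpra & 0xF]


if __name__ == "__main__":
    ram = DualPortRam16x1()
    ram.write_clock(we=1, a=5, d=1)
    print(ram.spo(5), ram.dpo(5), ram.dpo(6))
```

A synchronous read would correspond to capturing `spo` or `dpo` in the accompanying flip-flop on the next clock edge.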
18 x 18 Embedded Multiplier
• Fast arithmetic functions
  - Optimized to implement multiply/accumulate modules
• 18 x 18 signed multiplier
• Fully combinatorial
• Optional registers with CE & RST (pipeline)
• Independent from adjacent block RAM
18 x 18 Multiplier
• Embedded 18-bit x 18-bit multiplier
  - 2’s complement signed operation
• Multipliers are organized in columns
[Figure: Data_A (18 bits) and Data_B (18 bits) feed the 18 x 18 multiplier, which produces Output (36 bits)]
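The 2's complement behaviour and the 36-bit result width can be sketched as follows. The helper names `to_signed` and `mult18x18` are hypothetical; the model treats both inputs as raw 18-bit patterns and returns the raw 36-bit product pattern.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a 2's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def mult18x18(data_a, data_b):
    """Signed 18 x 18 multiply; returns the 36-bit output as a bit pattern."""
    product = to_signed(data_a, 18) * to_signed(data_b, 18)
    return product & ((1 << 36) - 1)


if __name__ == "__main__":
    print(mult18x18(3, 5))
    minus_one = 0x3FFFF                      # 18-bit pattern for -1
    print(hex(mult18x18(minus_one, 2)))      # 36-bit pattern for -2
    print(to_signed(mult18x18(minus_one, 2), 36))
```

The largest magnitude product, (-2^17) x (-2^17) = 2^34, still fits in a signed 36-bit result, so the output never overflows.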
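Basic I/O Block Structure
[Figure: IOB with three registers sharing one clock and set/reset, each a D flip-flop with its own enable (EC) and SR input: a three-state register on the three-state control path, an output register on the output path, and an input register on the input path, which offers both a direct input and a registered input]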
I/O Signal Types
• Single-ended
  - LVTTL
  - LVCMOS
  - HSTL
  - SSTL
• Differential
  - LVDS
  - Bus LVDS
  - LVPECL
NOTE: Only the popular I/O types are shown here
IOB: Double Data Rate Registers
• DDR registers can be clocked by
  - Clock and not (clock) if the duty cycle is 50/50
  - CLK0 and CLK180 DLL outputs
[Figure: timing diagram. DATA_1 carries D1A, D1B, D1C and DATA_2 carries D2A, D2B, D2C, one value per CLK period; the dual data rate output interleaves them as D1A, D2A, D1B, D2B, D1C.]
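The interleaving in the timing diagram can be expressed as a tiny model. It assumes DATA_1 is launched on the rising edge and DATA_2 on the falling edge (or, equivalently, on CLK0 and CLK180); the function name `ddr_output` is hypothetical.

```python
def ddr_output(data_1, data_2):
    """Interleave two single-rate streams into one double-rate stream.

    Each clock period contributes two output values: the DATA_1 sample
    (assumed rising edge) followed by the DATA_2 sample (assumed falling edge).
    """
    stream = []
    for rising_value, falling_value in zip(data_1, data_2):
        stream.append(rising_value)
        stream.append(falling_value)
    return stream


if __name__ == "__main__":
    print(ddr_output(["D1A", "D1B", "D1C"], ["D2A", "D2B", "D2C"]))
```

Two values leave the pin per clock period, which is why a 50/50 duty cycle matters when the inverted clock is used for the second register.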
Built-In HSTL II Support
What is the advantage of using HSTL Class II?
• High-speed I/O interface
• Bi-directional
• Double parallel termination
[Figure: transmission line with Zo = 50 Ω, terminated at each end by R = 50 Ω to Vtt = 0.75 V; Vref = 0.75 V]
Digitally Controlled Impedance
• Dynamically adjusted termination resistors
  - Provides drivers that are matched to the impedance of the traces
  - Provides on-chip termination
  - Transmitter or receiver
• On-chip termination advantages:
  - No termination resistors on board
  - Improves signal integrity by eliminating stub reflection
  - Eliminates the need for source termination (single-ended I/O)
  - Reduces board routing headaches and component count
Virtex-II Family: Four and Six Columns Block RAM & Multiplier
[Figure: device XC2V250]
Virtex-II Family Members
Device     CLB Array   18Kb BRAM   Multiplier   DCM   Max IOB   BRAM & Multiplier Columns
XC2V40     8 x 8       4           4            4     88        2
XC2V80     16 x 8      8           8            4     120       2
XC2V250    24 x 16     24          24           8     200       4
XC2V500    32 x 24     32          32           8     264       4
XC2V1000   40 x 32     40          40           8     432       4
XC2V1500   48 x 40     48          48           8     528       4
XC2V2000   56 x 48     56          56           8     624       4
XC2V3000   64 x 56     96          96           12    720       6
XC2V4000   80 x 72     120         120          12    912       6
XC2V6000   96 x 88     144         144          12    1,104     6
XC2V8000   112 x 104   168         168          12    1,296     6
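The CLB array sizes in the table can be turned into resource counts using figures given elsewhere in this deck: four slices per CLB (S0 to S3), two LUT-FF pairs per slice, 16 bits of distributed RAM per LUT, and 18 Kb per block RAM. The function names below are hypothetical, and the results are derived arithmetic, not quoted data sheet values.

```python
SLICES_PER_CLB = 4       # slices S0..S3
LUTS_PER_SLICE = 2       # two LUT-FF pairs per slice
RAM_BITS_PER_LUT = 16    # a LUT equals 16x1 RAM
BLOCK_RAM_KBITS = 18     # each block RAM is 18 Kb


def clb_resources(rows, cols):
    """Derive logic resource counts from a CLB array of rows x cols."""
    clbs = rows * cols
    slices = clbs * SLICES_PER_CLB
    luts = slices * LUTS_PER_SLICE
    return {
        "clbs": clbs,
        "slices": slices,
        "luts": luts,
        "distributed_ram_bits": luts * RAM_BITS_PER_LUT,
    }


def block_ram_kbits(blocks):
    """Total block RAM capacity in Kb for a given number of 18 Kb blocks."""
    return blocks * BLOCK_RAM_KBITS


if __name__ == "__main__":
    print(clb_resources(40, 32))      # CLB array listed for XC2V1000
    print(block_ram_kbits(168))       # block count listed for XC2V8000
```

For the largest device, 168 blocks x 18 Kb gives 3,024 Kb, consistent with the "up to 3 Mb on-chip block RAM" figure quoted earlier.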
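VIRTEX-II Packaging
User I/Os per package (packages listed: CS144, FG256, FG456, FG676, FF896, FF1152, FF1517, BG575, BG728, BF957)
Device     Max user I/Os   Available packages (user I/Os)
XC2V40     88              CS144 (88), FG256 (88)
XC2V80     120             CS144 (92), FG256 (120)
XC2V250    200             CS144 (92), FG256 (172), FG456 (200)
XC2V500    264             FG256 (172), FG456 (264)
XC2V1000   432             FG256 (172), FG456 (324), FF896 (432), BG575 (328)
XC2V1500   528             FG676 (392), FF896 (528), BG575 (392)
XC2V2000   624             FG676 (456), FF896 (624), BG575 (408), BG728 (456), BF957 (624)
XC2V3000   720             FG676 (484), FF1152 (720), BG728 (516), BF957 (684)
XC2V4000   912             FF1152 (824), FF1517 (912), BF957 (684)
XC2V6000   1,104           FF1152 (824), FF1517 (1,104), BF957 (684)
XC2V8000   1,108 (1,296)   FF1152 (824), FF1517 (1,108), BF957 (684)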
FF and BF are flip-chip ball grid array packages
Pinout compatibility inside same color rectangle