Basic FPGA Architectures
This material exempt per Department of Commerce license exception TSU
© 2011 Xilinx, Inc. All Rights Reserved
Objectives
After completing this module, you will be able to:
• Describe the basic slice resources available in Spartan-6 FPGAs
• Identify the basic I/O resources available in Spartan-6 FPGAs
• List some of the dedicated hardware features of Spartan-6 FPGAs
• Differentiate the Virtex-6 family of devices from the Spartan-6 family
• Identify the latest members of the Virtex-7 device family
Basic Architecture 2
For Academic Use Only
Outline
• Overview
• Logic Resources
• I/O Resources
• Memory and DSP48
• Clocking Resources
• Latest Families
– Virtex-6 Family
– Virtex-7 Family
• Summary
Overview
• All Xilinx FPGAs contain the same basic resources
– Logic Resources
• Slices (grouped into CLBs)
– Contain combinatorial logic and register resources
• Memory
• Multipliers
– Interconnect Resources
• Programmable interconnect
• IOBs
– Interface between the FPGA and the outside world
– Other resources
• Global clock buffers
• Boundary scan logic
Spartan-6 FPGA
[Figure: Spartan-6 FPGA floorplan showing CLB, memory controller, I/O, CMT, MGT, BUFG, PCIe endpoint, BUFIO, block RAM, and DSP48 resources]
Spartan-6 Lowest Total Power
• 45 nm technology
• Static power reductions
– Process & architectural innovations
• Dynamic power reduction
– Lower node capacitance & architectural innovations
• More hard IP functionality
– Integrated transceivers & other logic reduce power
– Hard IP uses less current & power than soft IP
• Lower I/O power
• Low-power option -1L reduces power even further
• Fewer supply rails reduce power
• Two families: LX and LXT
Spartan-6 LX / LXT FPGAs
** All memory controllers support a x16 interface, except in the CS225 package, where only x8 is supported
Spartan-6 FPGA CLB
• CLB contains two slices
• Connected to switch matrix for routing to other FPGA resources
• Carry chain runs vertically through Slice0 only
[Figure: CLB containing Slice0 and Slice1 attached to the switch matrix, with carry signals CIN and COUT]
Three Types of Slices in Spartan-6 FPGAs
• SLICEM: Full slice
– LUT can be used for logic and memory/SRL
– Has wide multiplexers and carry chain
• SLICEL: Logic and arithmetic only
– LUT can only be used for logic (not memory)
– Has wide multiplexers and carry chain
• SLICEX: Logic only
– LUT can only be used for logic (not memory)
– No wide multiplexers or carry chain
[Figure: each CLB pairs a SLICEX with either a SLICEM or a SLICEL]
Spartan-6 CLB Logic Slices
SliceM (25%)
• LUT6
• 8 Registers
• Carry Logic
• Wide Function Muxes
• Distributed RAM / SRL logic
SliceL (25%)
• LUT6
• 8 Registers
• Carry Logic
• Wide Function Muxes
SliceX (50%)
• LUT6
• Optimized for Logic
• 8 Registers
Slice mix chosen for the optimal balance of Cost, Power & Performance
Spartan-6 FPGA SLICE
• Four LUTs
• Eight storage elements
– Four flip-flop/latches
– Four flip-flops
• F7MUX and F8MUX
– Connects LUT outputs to create wide functions
– Output can drive the flip-flop/latches
• Carry chain (Slice0 only)
– Connected to the LUTs and the four flip-flop/latches
[Figure: slice with four LUT/RAM/SRL elements]
6-Input LUT with Dual Output
• 6-input LUT can be two 5-input LUTs with common inputs
– Minimal speed impact to a 6-input LUT
– One or two outputs
– Any function of six variables or two independent functions of five variables
[Figure: 6-LUT composed of two 5-LUTs with shared inputs A1 to A5, input A6, and outputs O6 and O5]
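The dual-output behaviour can be illustrated with a small software model. This is only a sketch under stated assumptions: the 64-bit truth table layout (bit i holds the output for input value i, A1 in bit 0 through A6 in bit 5) and the helper names `lut6_2` and `pack_two_lut5` are chosen for illustration and are not taken from the slides.

```python
from typing import Callable, Tuple


def lut6_2(init: int, a: int) -> Tuple[int, int]:
    """Return (O6, O5) for a 64-bit truth table `init` and 6-bit input `a`."""
    if not 0 <= init < (1 << 64):
        raise ValueError("init must fit in 64 bits")
    if not 0 <= a < 64:
        raise ValueError("input must fit in 6 bits")
    o6 = (init >> a) & 1            # full 6-input lookup
    o5 = (init >> (a & 0x1F)) & 1   # lower 32 entries only, A6 ignored
    return o6, o5


def pack_two_lut5(f_upper: Callable[[int], int],
                  f_lower: Callable[[int], int]) -> int:
    """Build one 64-bit table holding two 5-input functions of the same inputs."""
    init = 0
    for i in range(32):
        init |= (f_lower(i) & 1) << i          # seen on O5
        init |= (f_upper(i) & 1) << (32 + i)   # seen on O6 when A6 = 1
    return init


def parity5(x: int) -> int:
    return bin(x & 0x1F).count("1") & 1


def and5(x: int) -> int:
    return 1 if (x & 0x1F) == 0x1F else 0


INIT = pack_two_lut5(parity5, and5)
```

With A6 held high, O6 returns the upper function and O5 the lower one, so a single LUT yields two outputs from the same five inputs.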
Slice Flip-Flop and Flip-Flop/Latch Control
• All flip-flops and flip-flop/latches share the same CLK, SR, and CE signals
– This is referred to as the “control set” of the flip-flops
– CE and SR are active high
– CLK can be inverted at the slice boundary
• Set/Reset (SR) signal can be configured as synchronous or asynchronous
– All four flip-flop/latches are configured the same
– All four flip-flops are configured the same
• SR will cause the flip-flop to be set to the state specified by the SRINIT attribute
[Figure: flip-flops AFF through DFF and flip-flop/latches AFF/LATCH through DFF/LATCH, each with D, CE, CK, and SR inputs and a Q output]
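As a rough behavioural illustration of the control signals described on this slide, the sketch below models one storage element in flip-flop mode. The class name `SliceFlipFlop`, its method names, the power-up value of 0, and the choice that SR takes priority over CE are assumptions made for the example.

```python
class SliceFlipFlop:
    """Behavioural sketch of one slice flip-flop with CE, SR, and SRINIT."""

    def __init__(self, srinit: int = 0, sync_sr: bool = True) -> None:
        self.srinit = srinit & 1   # state loaded when SR is asserted
        self.sync_sr = sync_sr     # True: synchronous SR, False: asynchronous
        self.q = 0                 # assumed power-up value

    def assert_sr(self) -> int:
        """Drive SR high without a clock edge (only acts in asynchronous mode)."""
        if not self.sync_sr:
            self.q = self.srinit
        return self.q

    def clock_edge(self, d: int, ce: int, sr: int) -> int:
        """Apply one active clock edge with active-high CE and SR."""
        if sr:
            self.q = self.srinit   # SR assumed to override CE and D
        elif ce:
            self.q = d & 1
        return self.q
```

Because every storage element in a slice shares the same CLK, CE, and SR, registers with different control sets cannot be packed into the same slice.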
Configuring LUTs as a Shift Register (SRL)
[Figure: LUT configured as a 32-stage shift register with D, CE, and CLK inputs; address A[4:0] selects the stage that drives Q, and Q31 is the cascade output]
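A minimal behavioural sketch of the 32-stage shift register mode is shown below. The class name `SRL32` and its methods are hypothetical, and the convention that address 0 selects a one-cycle delay is an assumption of this model.

```python
class SRL32:
    """Behavioural sketch of a LUT used as a 32-stage shift register."""

    DEPTH = 32

    def __init__(self) -> None:
        self.stages = [0] * self.DEPTH

    def clock(self, d: int, ce: int = 1) -> None:
        """Shift in one bit on a clock edge when CE is high."""
        if ce:
            self.stages = [d & 1] + self.stages[:-1]

    def q(self, addr: int) -> int:
        """Read the tap selected by the 5-bit address A[4:0]."""
        if not 0 <= addr < self.DEPTH:
            raise ValueError("address must be in 0..31")
        return self.stages[addr]

    def q31(self) -> int:
        """Cascade output taken from the last stage."""
        return self.stages[self.DEPTH - 1]
```

In this model a delay of N cycles (1 to 32) is obtained by reading address N - 1, and longer delays chain the cascade output into the next SRL.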
Shift Register LUT Example
[Figure: two parallel 64-bit datapaths, statically balanced at 20 cycles each. One path runs Operation A (8 cycles) then Operation B (12 cycles); the other runs Operation C (3 cycles) then Operation D - NOP (17 cycles)]
• Operation D - NOP must add 17 pipeline stages of 64 bits each
– 1,088 flip-flops (hence 136 slices) or
– 64 SRLs (hence 16 slices)
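The slice counts above follow from simple arithmetic, sketched here. The function name `pipeline_cost` and its parameters are illustrative; the defaults (eight flip-flops and four LUTs per slice, 32 stages per SRL) come from the slice description earlier in the deck, and the SRL variant assumes SLICEM slices are available.

```python
import math


def pipeline_cost(stages: int, width: int,
                  ffs_per_slice: int = 8,
                  luts_per_slice: int = 4,
                  srl_depth: int = 32) -> dict:
    """Compare flip-flop and SRL implementations of a delay pipeline."""
    flip_flops = stages * width
    ff_slices = math.ceil(flip_flops / ffs_per_slice)
    srls_per_bit = math.ceil(stages / srl_depth)   # one SRL covers up to 32 stages
    srls = srls_per_bit * width
    srl_slices = math.ceil(srls / luts_per_slice)
    return {
        "flip_flops": flip_flops,
        "ff_slices": ff_slices,
        "srls": srls,
        "srl_slices": srl_slices,
    }


cost = pipeline_cost(stages=17, width=64)
# cost == {"flip_flops": 1088, "ff_slices": 136, "srls": 64, "srl_slices": 16}
```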
I/O Block Diagram
[Figure: differential I/O pair (P and N pads) with LVDS termination. The electrical resources connect to the logical resources: a master and a slave IOLOGIC block, each with IOSERDES and IODELAY, which connect to the FPGA fabric interconnect]
Spartan-6 FPGA Supports 40+ Standards
• Each input can be 3.3 V compatible
• LVCMOS (3.3 V, 2.5 V, 1.8 V, 1.5 V, and 1.2 V)
• LVCMOS_JEDEC
• LVPECL (3.3 V, 2.5 V)
• PCI
• I2C*
• HSTL (1.8 V, 1.5 V; Classes I, II, III, IV)
– DIFF_HSTL_I, DIFF_HSTL_I_18
– DIFF_HSTL_II*
• SSTL (2.5 V, 1.8 V; Classes I, II)
– DIFF_SSTL_I, DIFF_SSTL18_I
– DIFF_SSTL_II*
• LVDS, Bus LVDS
• RSDS_25 (point-to-point)
Easier and More Flexible I/O Design!
* Newly added standards
Spartan-6 FPGA I/O Bank Structure
• All I/Os are on the edges of the chip
• I/Os are grouped into banks
– 30 to 83 I/Os per bank
– Eight clock pins per edge
– Common VCCO, VREF
• Restricts mixture of standards in one bank
• The differential driver is only available in Bank0 and Bank2
– Differential receiver is available in all banks
– On-chip termination is available in all banks
[Figure: chip view for LX45/T and smaller devices with four banks (BANK 0 to BANK 3); chip view for LX100/T and larger devices with six banks (BANK 0 to BANK 5)]
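Because all I/Os in a bank share one VCCO supply, a first-order legality check for a pin assignment can be sketched as follows. This is a deliberately simplified illustration: the table covers only a few single-ended standards, it considers output supply voltage only, and it ignores VREF and any input-only exceptions. The names `VCCO_VOLTS` and `bank_is_compatible` are hypothetical.

```python
# Nominal output supply voltage per I/O standard (small illustrative subset).
VCCO_VOLTS = {
    "LVCMOS33": 3.3,
    "LVCMOS25": 2.5,
    "LVCMOS18": 1.8,
    "LVCMOS15": 1.5,
    "LVCMOS12": 1.2,
    "SSTL2_I": 2.5,
    "SSTL18_I": 1.8,
    "HSTL_I": 1.5,
    "HSTL_I_18": 1.8,
}


def bank_is_compatible(standards) -> bool:
    """Return True when every standard in the bank needs the same VCCO."""
    voltages = {VCCO_VOLTS[name] for name in standards}  # KeyError if unknown
    return len(voltages) <= 1
```

For example, LVCMOS18 and SSTL18_I outputs can share a bank in this model, while LVCMOS33 and LVCMOS18 outputs cannot.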
I/O Logical Resources
• Two IOLOGIC blocks per I/O pair
– Master and slave
– Can operate independently or concatenated
• Each IOLOGIC contains
– IOSERDES
• Parallel to serial converter (serializer)
• Serial to parallel converter (de-serializer)
– IODELAY
• Selectable fine-grained delay
– SDR and DDR resources
[Figure: master and slave IOLOGIC blocks, each with IOSERDES and IODELAY, connected to the FPGA fabric interconnect]
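The serializer and de-serializer roles of the IOSERDES can be pictured with a short round-trip model. This is a sketch only: the function names are hypothetical, and the LSB-first bit order is an assumption of the example rather than something stated on the slide.

```python
from typing import List


def serialize(word: int, width: int) -> List[int]:
    """Parallel to serial: emit `width` bits of `word`, LSB first (assumed order)."""
    return [(word >> i) & 1 for i in range(width)]


def deserialize(bits: List[int]) -> int:
    """Serial to parallel: rebuild the word from an LSB-first bit stream."""
    word = 0
    for i, bit in enumerate(bits):
        word |= (bit & 1) << i
    return word


stream = serialize(0b1011, 4)   # [1, 1, 0, 1]
restored = deserialize(stream)  # 0b1011
```

The fabric side handles the wide word at a lower clock rate while the pin side toggles once per bit, which is why the serial clock comes from the dedicated I/O clock network.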
SLICEM Used as Distributed SelectRAM Memory
• Uses the same storage that is used for the look-up table function
• Synchronous write, asynchronous read
– Can be converted to synchronous read using the flip-flops available in the slice
• Various configurations
– Single port
• One LUT6 = 64x1 or 32x2 RAM
• Cascadable up to 256x1 RAM
– Dual port (D)
• 1 read / write port + 1 read-only port
– Simple dual port (SDP)
• 1 write-only port + 1 read-only port
– Quad-port (Q)
• 1 read / write port + 3 read-only ports
Each port has independent address inputs
Available configurations:
Single Port: 32x2, 32x4, 32x6, 32x8, 64x1, 64x2, 64x3, 64x4, 128x1, 128x2, 256x1
Dual Port: 32x2D, 32x4D, 64x1D, 64x2D, 128x1D
Simple Dual Port: 32x6SDP, 64x3SDP
Quad Port: 32x2Q, 64x1Q
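The "synchronous write, asynchronous read" behaviour can be sketched for a single-port 64x1 memory as follows. The class name `DistributedRAM64x1` and its methods are hypothetical, and the optional read register is modelled as capturing the value that was present before the write on the same edge, which is an assumption of this sketch.

```python
class DistributedRAM64x1:
    """Behavioural sketch of one LUT6 used as a 64x1 single-port RAM."""

    def __init__(self) -> None:
        self.mem = [0] * 64
        self.read_reg = 0   # optional slice flip-flop for synchronous read

    def read(self, addr: int) -> int:
        """Asynchronous read: the output follows the address with no clock."""
        return self.mem[addr]

    def clock(self, addr: int, d: int = 0, we: int = 0) -> None:
        """Clock edge: register the read output, then write if WE is high."""
        self.read_reg = self.mem[addr]
        if we:
            self.mem[addr] = d & 1
```

Reading through `read_reg` costs one cycle of latency but gives a registered output, matching the note about converting to synchronous read with the slice flip-flops.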
Spartan-6 FPGA Block RAM Features
• 18 kb size
– Can be split into two independent 9-kb memories
• Performance up to 300 MHz
• Multiple configuration options
– True dual-port, simple dual-port, single-port
• Two independent ports access common data
– Individual address, clock, write enable, clock enable
– Independent widths for each port
• Byte-write enable
[Figure: 18k memory shown as a dual-port BRAM]
Better, More BRAM
• More Block RAMs
– 2x higher BRAM to Logic Cell ratio than Spartan-3A platform
• More port flexibility
– 18K can be split into two 9K BRAM blocks and can be independently addressed
• Improves buffering, caching & data storage
– Excellent for embedded processing, communication protocols
– Enables DSP blocks to provide more efficient video and surveillance algorithms
• Lower Static Power
[Figure: one 18K BRAM or two 9K BRAMs]
Memory Controller
• Only low cost FPGA with a “hard” memory controller
• Guaranteed memory interface performance providing
– Reduced engineering & board design time
– DDR, DDR2, DDR3 & LP DDR support
– Up to 12.8 Gbps bandwidth for each memory controller
• Automatic calibration features
• Multiport structure for user interface
– Six 32-bit programmable ports from fabric
– Controller interface to 4-, 8-, or 16-bit memory devices
[Figure: Spartan-6 connected to DRAM (DDR, DDR2, DDR3, LP DDR), SRAM, FLASH, and EEPROM]
Spartan-6 Hard Memory Controller
• New Hard Block Memory Controller
– Up to 4 controllers per device
• Why a Hard Memory Block?
– Very common design component
– Multiple customer benefits
Customer Requests and Spartan-6 Hard Block Memory Controller Benefits
Higher performance
• Up to 800 Mbps
Lower cost
• Saves soft logic, smaller die
Lower power
• Dedicated logic
Easier designs
• Timing closure no longer an issue
• Configurable MultiPort user interface
• CoreGen/MIG wizard & EDK support
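The peak bandwidth of one controller can be estimated from the 800 Mbps per-pin rate above and the widest (16-bit) memory interface. The helper name `peak_bandwidth_gbps` is hypothetical, and the result is a theoretical peak that ignores refresh, bus turnaround, and other protocol overhead.

```python
def peak_bandwidth_gbps(rate_mbps_per_pin: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in Gbps for a memory data bus."""
    return rate_mbps_per_pin * bus_width_bits / 1000.0


x16_peak = peak_bandwidth_gbps(800, 16)  # 12.8 Gbps
x8_peak = peak_bandwidth_gbps(800, 8)    # 6.4 Gbps
```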
Spartan-6 FPGA DSP48A1 Slice
• 18x18 signed multiplier
• 48-bit add/subtract/accumulate
• Pipeline registers for high speed
• Cascade paths for wide functions
• Pre-adder
[Figure: DSP48A1 datapath with 18-bit A, B, and D inputs and a 48-bit C input; dual B, D register with pre-adder (OPMODE[6,4]); 18 x 18 multiplier with register M; X and Z multiplexers (OPMODE[3:0]); 48-bit add/subtract stage (OPMODE[7]) with carry-in (OPMODE[5]) producing P; cascade and carry ports BCIN, BCOUT, PCIN, PCOUT, CIN, CCOUT, CFOUT, and MFOUT]
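The arithmetic performed by the slice (pre-adder, 18x18 signed multiply, 48-bit add/subtract/accumulate) can be sketched in software. This is a simplified, non-pipelined model: the function name `dsp48a1` and its keyword parameters are hypothetical stand-ins for the OPMODE selections, carry-in and cascade ports are left out, and the pre-adder direction (D plus or minus B) is an assumption of the sketch.

```python
def to_signed(value: int, bits: int) -> int:
    """Wrap an integer into a two's-complement field of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def dsp48a1(a: int, b: int, d: int = 0, c: int = 0, p_prev: int = 0,
            use_preadder: bool = False, preadd_subtract: bool = False,
            z_sel: str = "zero", post_subtract: bool = False) -> int:
    """Return the 48-bit result P for one pass through the modelled datapath."""
    a, b, d = to_signed(a, 18), to_signed(b, 18), to_signed(d, 18)
    if use_preadder:
        b = to_signed(d - b if preadd_subtract else d + b, 18)
    m = a * b                                  # 18x18 signed product
    z = {"zero": 0,
         "c": to_signed(c, 48),
         "p": to_signed(p_prev, 48)}[z_sel]    # value fed to the add/subtract stage
    p = z - m if post_subtract else z + m
    return to_signed(p, 48)


# Multiply-accumulate over two sample pairs: 2*3 + 4*5.
acc = 0
for sample, coeff in [(2, 3), (4, 5)]:
    acc = dsp48a1(sample, coeff, p_prev=acc, z_sel="p")
```

The pre-adder is what makes symmetric filters cheaper: two samples that share a coefficient are summed first, so one multiplier serves both taps.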
Spartan-6 FPGA Global
Clock Network
• 16 global clock buffers in the Spartan-6 FPGA
allow clocks to be distributed to potentially
every clocked element on the die
• 16 HCLK lines connect clock signals to logic
resources in each row
• HCLK lines can be driven by
– Global clock buffers
– DCM outputs
– PLL outputs
Spartan-6 FPGA I/O Clock Network
[Figure: IO bank with P/N pad pairs and their IOLOGIC blocks, clocked by BUFIO2 and by BUFPLL driven from the CMT PLL]
• Special clock network dedicated to I/O logical resources
– Independent of global clock resources
– Speeds up to 1 GHz
• Multiple sources for clocking I/O logic
– BUFIO2: for high-speed dedicated I/O clock signals
– BUFPLL: for clocks driven by the PLL in the CMT
Spartan-6 FPGA Clock Management Tile (CMT)
[Figure: CMT containing one PLL and two DCMs. Each block has CLKIN and CLKFB inputs, sourced from GCLK inputs, clocks from BUFG, and feedback clocks from BUFIO2FB. The PLL drives pll_clkout<5:0>; the DCMs drive dcm1_clkout<9:0> and dcm2_clkout<9:0>]
Designers Eccentrics
• Higher System Performance
– More design margin to simplify designs
– Higher integrated functionality
• Lower System Cost
– Reduce BOM
– Implement design in a smaller device & lower speed-grade
• Lower Power
– Help meet power budgets
– Eliminate heat sinks & fans
– Prevent thermal runaway
Architecture Alignment
[Figure: Virtex-6 FPGAs (760K logic cell device) and Spartan-6 FPGAs (150K logic cell device) aligned around common resources]
Resources shown:
LUT-6 CLB
BlockRAM
DSP Slices
High-performance Clocking
FIFO Logic
Parallel I/O
Hardened Memory Controllers
Tri-mode EMAC
HSS Transceivers*
3.3 Volt compatible I/O
System Monitor
PCIe® Interface
*Optimized for target application in each family
Enables IP Portability, Protects Design Investments
Virtex-6 and Spartan-6 FPGA Sub-Families
Virtex-6 CXT FPGA
• Up to 3.75 Gbps serial connectivity and corresponding logic performance
Virtex-6 LXT FPGA
• High Logic Density
• High-Speed Serial Connectivity
Virtex-6 SXT FPGA
• High Logic Density
• High-Speed Serial Connectivity
• Enhanced DSP
Virtex-6 HXT FPGA
• High Logic Density
• Ultra High-Speed Serial Connectivity
Spartan-6 LX FPGA
• Lowest Cost Logic
Spartan-6 LXT FPGA
• Lowest Cost Logic
• Low-Cost Serial Connectivity
[Figure legend: Logic, Block RAM, DSP, Parallel I/O, Serial I/O]
Virtex® Product & Process Evolution
• Virtex: 220-nm
• Virtex-E: 180-nm
• Virtex-II: 150-nm
• Virtex-II Pro: 130-nm
• Virtex-4: 90-nm
• Virtex-5: 65-nm
• Virtex-6: 40-nm
[Figure: timeline axis running from 1st Generation to 6th Generation]
Delivering Balanced Performance, Power, and Cost
Strong Focus on Power
Reduction
• Static Power Reduction
– Higher distribution of low leakage transistors
• Dynamic Power Reduction
– Reduced capacitance through device shrink
• Reduced Core Voltage Devices Lower Overall Power
– VCCINT = 0.9V option allows power / performance tradeoff
• I/O Power Improvements
– Dynamic termination
• System Monitor
– Allows sophisticated monitoring of temperature and voltage
Up to 50% Power Reduction vs. Previous Generation
Virtex-6 Logic Fabric
• Virtex-6 Configurable Logic Block (CLB)
– Each CLB contains two slices
– Each slice contains four 6-input Lookup Tables (6LUT)
• Slices implement logic functions (slice_l)
• Slices for memories and shift registers (slice_m)
• LUT6 implements
– All functions of up to 6 variables
– Two functions of up to 5 variables each
– Shift registers up to 32 stages long
– Memories of 64 bits
• Multiple configurations within a slice
Power Consumption Benefits
• Shift register mode greatly reduces power consumption over FF implementation
Performance Benefits
• Increased ratio of slice_m: memories available closer to the source or target logic
Cost Benefits
• Can pack logic and memory functions more efficiently
[Figure: CLB with two slices, each containing four LUTs]
Higher DSP Performance
• Most advanced DSP architecture
– New optional pre-adder for symmetric filters
– 25x18 multiplier
• High resolution filters
• Efficient floating point support
– ALU-like second stage enables mapping of advanced
operations
• Programmable op-code
• SIMD support
• Addition / Subtraction / Logic functions
– Pattern detector
• Lowest power consumption
• Highest DSP slice capacity
– Up to 2K DSP Slices
Power, Performance and Productivity
Drive Market Trends
Lower Power
Legislation and Regulations
Flat panel/TV, Central Office, Server Farms,
Portable Medical, Portable Consumer
Higher Performance
Wired Infrastructure, Wireless, Broadcast,
300G+ Networks, Aerospace and Defense,
High Performance Computing
System Capacity and Performance
All Market Segments
Improved Productivity
Reduce Capital and Operating Expenses
(OPEX, CAPEX)
#1 Customer Problem: Lower Power enables better Cost,
Performance, and Capability
The Unified Architecture Advantage
• Common elements enable easy IP reuse for quick
design portability across all 7 series families
– Design scalability from low-cost to high-performance
– Expanded eco-system support
– Quickest TTM
Common elements:
• Logic Fabric: LUT-6 CLB
• Precise, Low Jitter Clocking: MMCMs
• On-Chip Memory: 36Kbit/18Kbit Block RAM
• Enhanced Connectivity: PCIe® Interface Blocks
• DSP Engines: DSP48E1 Slices
• Hi-perf. Parallel I/O Connectivity: SelectIO™ Technology
• Hi-performance Serial I/O Connectivity: Transceiver Technology
Families: Artix™-7 FPGA, Kintex™-7 FPGA, Virtex®-7 FPGA
The Xilinx 7 Series FPGAs
Industry’s First Unified Architecture
• Industry’s Lowest Power and First Unified Architecture
– Spanning Low-Cost to Ultra High-End applications
• Three new device families with breakthrough innovations in power efficiency, performance-capacity and price-performance
Virtex-7 Sub-Families
• The Virtex-7 family has several sub-families
– Virtex-7: General logic
– Virtex-7XT: Rich DSP and block RAM
– Virtex-7HT: Highest serial bandwidth
Virtex-7 FPGA
• High Logic Density
• High-Speed Serial Connectivity
Virtex-7 XT FPGA
• High Logic Density
• High-Speed Serial Connectivity
• Enhanced DSP
Virtex-7 HT FPGA
• High Logic Density
• Ultra High-Speed Serial Connectivity
[Figure legend: Logic, Block RAM, DSP, Parallel I/O, Serial I/O]
Summary
• The Spartan-6 FPGA slices contain four 6-input LUTs, eight registers, and carry
logic
– LUTs can perform any combinatorial function of up to six inputs
– LUTs are connected with dedicated multiplexers and carry logic
– Some LUTs can be configured as shift registers or memories
• The Spartan-6 FPGA IOBs contain DDR registers as well as SERDES
resources
• The SelectIO™ interfaces enable direct connection to multiple I/O standards
• The Spartan-6 FPGA includes dedicated block RAM and DSP slice resources
• The Spartan-6 FPGA includes dedicated DCMs, PLLs, and routing resources to
improve your system clock performance and generation capability
• The most recently introduced families are architected for power efficiency
– They consist of Artix, Kintex, and Virtex devices