Basic FPGA Architectures

This material exempt per Department of Commerce license exception TSU
© 2011 Xilinx, Inc. All Rights Reserved. For Academic Use Only.

Objectives

After completing this module, you will be able to:
• Describe the basic slice resources available in Spartan-6 FPGAs
• Identify the basic I/O resources available in Spartan-6 FPGAs
• List some of the dedicated hardware features of Spartan-6 FPGAs
• Differentiate the Virtex-6 family of devices from the Spartan-6 family
• Identify the latest members of the Virtex-7 device family

Outline
• Overview
• Logic Resources
• I/O Resources
• Memory and DSP48
• Clocking Resources
• Latest Families
  – Virtex-6 Family
  – Virtex-7 Family
• Summary

Overview
• All Xilinx FPGAs contain the same basic resources
  – Logic Resources
    • Slices (grouped into CLBs)
      – Contain combinatorial logic and register resources
    • Memory
    • Multipliers
  – Interconnect Resources
    • Programmable interconnect
    • IOBs
      – Interface between the FPGA and the outside world
  – Other resources
    • Global clock buffers
    • Boundary scan logic

Spartan-6 FPGA
[Figure: device overview showing CLB, Memory Controller, I/O, CMT, MGT, BUFG, PCIe Endpoint, BUFIO, Block RAM, and DSP48 blocks]

Spartan-6 Lowest Total Power
• 45 nm technology
• Static power reductions
  – Process & architectural innovations
• Dynamic power reduction
  – Lower node capacitance & architectural innovations
• More hard IP functionality
  – Integrated transceivers & other logic reduce power
  – Hard IP uses less current & power than soft IP
• Lower I/O power
• Low power option -1L reduces power even further
• Fewer supply rails reduce power
• Two families: LX and LXT

Spartan-6 LX / LXT FPGAs
** All memory controllers support a x16 interface, except in the CS225 package, where only x8 is supported

Spartan-6 FPGA CLB
• CLB contains two slices
• Connected to switch matrix for routing to other FPGA resources
• Carry chain runs vertically through Slice0 only
[Figure: CLB with Slice0 and Slice1 attached to the switch matrix; carry chain enters Slice0 at CIN and leaves at COUT]

Three Types of Slices in Spartan-6 FPGAs
• SLICEM: Full slice
  – LUT can be used for logic and memory/SRL
  – Has wide multiplexers and carry chain
• SLICEL: Logic and arithmetic only
  – LUT can only be used for logic (not memory)
  – Has wide multiplexers and carry chain
• SLICEX: Logic only
  – LUT can only be used for logic (not memory)
  – No wide multiplexers or carry chain
[Figure: a CLB pairs a SLICEX with either a SLICEM or a SLICEL]

Spartan-6 CLB Logic Slices
• SliceM (25%)
  – LUT6
  – 8 registers
  – Carry logic
  – Wide function muxes
  – Distributed RAM / SRL logic
• SliceL (25%)
  – LUT6
  – 8 registers
  – Carry logic
  – Wide function muxes
• SliceX (50%)
  – LUT6, optimized for logic
  – 8 registers
Slice mix chosen for the optimal balance of Cost, Power & Performance

Spartan-6 FPGA SLICE
• Four LUTs
• Eight storage elements
  – Four flip-flop/latches
  – Four flip-flops
• F7MUX and F8MUX
  – Connects LUT outputs to create wide functions
  – Output can drive the flip-flop/latches
• Carry chain (Slice0 only)
  – Connected to the LUTs and the four flip-flop/latches
[Figure: slice containing four LUT/RAM/SRL blocks]

6-Input LUT with Dual Output
• 6-input LUT can be two 5-input LUTs with common inputs
  – Minimal speed impact to a 6-input LUT
  – One or two outputs
  – Any function of six variables or two independent functions of five variables
[Figure: 6-LUT with inputs A6:A1, built from two 5-LUTs that share inputs A5:A1; outputs O6 and O5]

Slice Flip-Flop and Flip-Flop/Latch Control
• All flip-flops and flip-flop/latches share the same CLK, SR, and CE signals
  – This is referred to as the “control set” of the flip-flops
  – CE and SR are active high
  – CLK can be inverted at the slice boundary
• Set/Reset (SR) signal can be configured as synchronous or asynchronous
  – All four flip-flop/latches are configured the same
  – All four flip-flops are configured the same
• SR will cause the flip-flop to be set to the state specified by the SRINIT attribute
[Figure: flip-flops AFF through DFF and flip-flop/latches AFF/LATCH through DFF/LATCH, each with D, CE, CK, and SR inputs and a Q output]

Configuring LUTs as a Shift Register (SRL)
[Figure: LUT configured as a shift register with D, CE, CLK, and A[4:0] inputs, output Q, and cascade output Q31]

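The shift register configuration above can be sketched in software. The following Python model is an illustration only, not Xilinx's implementation: the class name `Srl32`, the helper `delay_line_cost`, and the 32-stage depth (inferred from the A[4:0] address and the Q31 cascade output) are assumptions made for this sketch. The per-slice counts come from the slice description above (four LUTs, eight storage elements).

```python
import math

LUTS_PER_SLICE = 4      # from the slice description: four LUTs per slice
STORAGE_PER_SLICE = 8   # from the slice description: eight storage elements
SRL_DEPTH = 32          # assumption: A[4:0] addresses 32 stages, Q31 is the last


class Srl32:
    """Behavioral sketch of one LUT configured as a 32-stage shift register."""

    def __init__(self):
        self.stages = [0] * SRL_DEPTH  # stages[0] holds the newest bit

    def clock(self, d, ce=1):
        """Model one rising clock edge; data shifts only when CE is high."""
        if ce:
            self.stages = [d & 1] + self.stages[:-1]

    def q(self, addr):
        """Tap selected by A[4:0]; addr n reads the bit delayed by n + 1 clocks."""
        return self.stages[addr & (SRL_DEPTH - 1)]

    @property
    def q31(self):
        """Cascade output: the last stage, for chaining into another SRL."""
        return self.stages[-1]


def delay_line_cost(stages, width):
    """Estimate resources needed to delay a width-bit bus by `stages` clocks."""
    flip_flops = stages * width
    srls = math.ceil(stages / SRL_DEPTH) * width  # one SRL chain per bit
    return {
        "flip_flops": flip_flops,
        "flip_flop_slices": math.ceil(flip_flops / STORAGE_PER_SLICE),
        "srls": srls,
        "srl_slices": math.ceil(srls / LUTS_PER_SLICE),
    }
```

Calling `delay_line_cost(17, 64)` gives 1,088 flip-flops (136 slices) against 64 SRLs (16 slices), which is the comparison made in the Shift Register LUT Example that follows. The estimate ignores the fact that only SLICEM LUTs can be used as SRLs.
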
Shift Register LUT Example
[Figure: two 64-bit paths, each 20 cycles long. One path runs Operation A (8 cycles) then Operation B (12 cycles); the other runs Operation C (3 cycles) then Operation D - NOP (17 cycles). Paths are statically balanced.]
• Operation D - NOP must add 17 pipeline stages of 64 bits each
  – 1,088 flip-flops (hence 136 slices) or
  – 64 SRLs (hence 16 slices)

I/O Block Diagram
[Figure: electrical resources (P and N pads with LVDS termination) and logical resources (master and slave IOLOGIC, IOSERDES, and IODELAY), with interconnect to the FPGA fabric]

Spartan-6 FPGA Supports 40+ Standards
• Each input can be 3.3 V compatible
• LVCMOS (3.3 V, 2.5 V, 1.8 V, 1.5 V, and 1.2 V)
• LVCMOS_JEDEC
• LVPECL (3.3 V, 2.5 V)
• PCI
• I2C*
• HSTL (1.8 V, 1.5 V; Classes I, II, III, IV)
  – DIFF_HSTL_I, DIFF_HSTL_I_18
  – DIFF_HSTL_II*
• SSTL (2.5 V, 1.8 V; Classes I, II)
  – DIFF_SSTL_I, DIFF_SSTL18_I
  – DIFF_SSTL_II*
• LVDS, Bus LVDS
• RSDS_25 (point-to-point)
Easier and More Flexible I/O Design!
* Newly added standards

Spartan-6 FPGA I/O Bank Structure
• All I/Os are on the edges of the chip
• I/Os are grouped into banks
  – 30 to 83 I/Os per bank
  – Eight clock pins per edge
  – Common VCCO, VREF
• Shared VCCO and VREF restrict the mixture of standards in one bank
• The differential driver is only available in Bank0 and Bank2
  – Differential receiver is available in all banks
  – On-chip termination is available in all banks
[Figure: chip views. LX45/T and smaller: Banks 0 to 3. LX100/T and larger: Banks 0 to 5.]

I/O Logical Resources
• Two IOLOGIC blocks per I/O pair
  – Master and slave
  – Can operate independently or concatenated
• Each IOLOGIC contains
  – IOSERDES
    • Parallel to serial converter (serializer)
    • Serial to parallel converter (deserializer)
  – IODELAY
    • Selectable fine-grained delay
  – SDR and DDR resources
[Figure: master and slave IOLOGIC blocks, each with IOSERDES and IODELAY, and interconnect to the FPGA fabric]

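As a rough illustration of what the serializer and deserializer in an IOSERDES do, the Python sketch below converts parallel words to a bit stream and back. The function names, the LSB-first bit order, and the `width` parameter are assumptions chosen for this example; the slides do not specify bit order or supported widths.

```python
def serialize(words, width):
    """Parallel to serial: emit each word one bit at a time, LSB first (assumed order)."""
    for word in words:
        for bit in range(width):
            yield (word >> bit) & 1


def deserialize(bits, width):
    """Serial to parallel: collect `width` bits at a time into words, LSB first."""
    word, count = 0, 0
    for bit in bits:
        word |= (bit & 1) << count
        count += 1
        if count == width:
            yield word
            word, count = 0, 0
```

A round trip such as `list(deserialize(serialize([0xA, 0x5], 4), 4))` returns the original words. Concatenating a master and a slave block corresponds, in this simplified picture, to using a larger `width`.
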
SLICEM Used as Distributed SelectRAM Memory
• Uses the same storage that is used for the look-up table function
• Synchronous write, asynchronous read
  – Can be converted to synchronous read using the flip-flops available in the slice
• Various configurations
  – Single port
    • One LUT6 = 64x1 or 32x2 RAM
    • Cascadable up to 256x1 RAM
  – Dual port (D)
    • 1 read / write port + 1 read-only port
  – Simple dual port (SDP)
    • 1 write-only port + 1 read-only port
  – Quad-port (Q)
    • 1 read / write port + 3 read-only ports
Each port has independent address inputs

Available configurations:
  Single Port:       32x2, 32x4, 32x6, 32x8, 64x1, 64x2, 64x3, 64x4, 128x1, 128x2, 256x1
  Dual Port:         32x2D, 32x4D, 64x1D, 64x2D, 128x1D
  Simple Dual Port:  32x6SDP, 64x3SDP
  Quad Port:         32x2Q, 64x1Q

Spartan-6 FPGA Block RAM Features
• 18 Kb size
  – Can be split into two independent 9 Kb memories
• Performance up to 300 MHz
• Multiple configuration options
  – True dual-port, simple dual-port, single-port
• Two independent ports access common data
  – Individual address, clock, write enable, clock enable
  – Independent widths for each port
• Byte-write enable
[Figure: 18K memory as a dual-port BRAM]

Better, More BRAM
• More block RAMs
  – 2x higher BRAM to logic cell ratio than Spartan-3A platform
• More port flexibility
  – 18K can be split into two 9K BRAM blocks that can be independently addressed
• Improves buffering, caching & data storage
  – Excellent for embedded processing, communication protocols
  – Enables DSP blocks to provide more efficient video and surveillance algorithms
• Lower static power
[Figure: an 18K BRAM shown as one block or as two 9K BRAMs]

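Returning to the distributed SelectRAM behavior described earlier in this section (synchronous write, asynchronous read, optional synchronous read through a slice flip-flop), here is a minimal Python sketch of a dual-port (D) configuration. The class and method names are hypothetical, and the model assumes the output flip-flop samples the memory contents as they were before the clock edge.

```python
class DistributedRamDualPort:
    """Sketch of a dual-port (D) distributed RAM: one read/write port, one read-only port."""

    def __init__(self, depth=64, width=1):
        self.depth = depth
        self.mask = (1 << width) - 1
        self.cells = [0] * depth
        self.read_reg = 0  # optional slice flip-flop giving a synchronous read

    def clock(self, addr, data=0, we=0, dp_addr=0):
        """One rising clock edge: register the read-only port, then apply the write."""
        # Assumption: the flip-flop captures the pre-edge memory contents.
        self.read_reg = self.cells[dp_addr % self.depth]
        if we:
            self.cells[addr % self.depth] = data & self.mask

    def spo(self, addr):
        """Asynchronous read on the read/write port's address."""
        return self.cells[addr % self.depth]

    def dpo(self, dp_addr):
        """Asynchronous read on the independent read-only port."""
        return self.cells[dp_addr % self.depth]
```

After `ram.clock(addr=5, data=1, we=1)`, both `ram.spo(5)` and `ram.dpo(5)` return 1 immediately, while `ram.read_reg` only reflects the new value after a later clock edge with `dp_addr=5`.
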
Memory Controller
• Only low cost FPGA with a “hard” memory controller
• Guaranteed memory interface performance providing
  – Reduced engineering & board design time
  – DDR, DDR2, DDR3 & LP DDR support
  – Up to 12.8 Gbps bandwidth for each memory controller (800 Mbps per pin on a x16 interface)
• Automatic calibration features
• Multiport structure for user interface
  – Six 32-bit programmable ports from fabric
  – Controller interface to 4-, 8-, or 16-bit memory devices
[Figure: Spartan-6 connected to DRAM (DDR, DDR2, DDR3, LP DDR), SRAM, FLASH, and EEPROM]

Spartan-6 Hard Memory Controller
• New Hard Block Memory Controller
  – Up to 4 controllers per device
• Why a Hard Memory Block?
  – Very common design component
  – Multiple customer benefits

  Customer Requests     Spartan-6 Hard Block Memory Controller Benefits
  Higher performance    Up to 800 Mbps
  Lower cost            Saves soft logic, smaller die
  Lower power           Dedicated logic
  Easier designs        Timing closure no longer an issue
                        Configurable MultiPort user interface
                        CoreGen/MIG wizard & EDK support

Spartan-6 FPGA DSP48A1 Slice
• 18x18 signed multiplier
• 48-bit add/subtract/accumulate
• Pipeline registers for high speed
• Cascade paths for wide functions
• Pre-adder
[Figure: DSP48A1 block diagram with 18-bit A, B, and D inputs and a 48-bit C input; dual B and D registers with pre-adder; 18x18 multiplier; X and Z multiplexers feeding a 48-bit adder/subtracter; OPMODE control bits; cascade inputs PCIN, BCIN, and CIN; outputs P, PCOUT, and BCOUT]

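The DSP48A1 features listed above (pre-adder, 18x18 signed multiplier, 48-bit add/subtract/accumulate) can be approximated with a short Python model. This is a simplified sketch under stated assumptions, not the actual slice: the class `Dsp48Sketch` and its keyword arguments are invented for illustration, OPMODE decoding, pipeline registers, and cascade ports are omitted, and the pre-adder is modeled as D + B only.

```python
def wrap_signed(value, bits):
    """Wrap an integer to a signed two's-complement value of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


class Dsp48Sketch:
    """Simplified datapath: optional pre-adder, 18x18 multiply, 48-bit accumulate."""

    def __init__(self):
        self.p = 0  # 48-bit result register

    def step(self, a, b, d=0, c=0, use_preadder=False,
             accumulate=False, subtract=False):
        a = wrap_signed(a, 18)
        b = wrap_signed(b, 18)
        if use_preadder:
            # Pre-adder stage (modeled as D + B), wrapped to 18 bits
            b = wrap_signed(wrap_signed(d, 18) + b, 18)
        product = a * b  # signed 18x18 product fits in 36 bits
        # Second stage: add to or subtract from either P (accumulate) or C
        base = self.p if accumulate else wrap_signed(c, 48)
        total = base - product if subtract else base + product
        self.p = wrap_signed(total, 48)
        return self.p
```

For example, a three-term dot product is one `step` with `accumulate=False` followed by two with `accumulate=True`; the result wraps at 48 bits, as a fixed-width accumulator would.
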
Spartan-6 FPGA Global Clock Network
• 16 global clock buffers in the Spartan-6 FPGA allow clocks to be distributed to potentially every clocked element on the die
• 16 HCLK lines connect clock signals to logic resources in each row
• HCLK lines can be driven by
  – Global clock buffers
  – DCM outputs
  – PLL outputs

Spartan-6 FPGA I/O Clock Network
• Special clock network dedicated to I/O logical resources
  – Independent of global clock resources
  – Speeds up to 1 GHz
• Multiple sources for clocking I/O logic
  – BUFIO2: for high-speed dedicated I/O clock signals
  – BUFPLL: for clocks driven by the PLL in the CMT
[Figure: I/O bank with P/N pad pairs and their IOLOGIC blocks, clocked by BUFIO2 and by BUFPLL from the CMT PLL]

Spartan-6 FPGA Clock Management Tile (CMT)
[Figure: CMT containing one PLL (CLKIN, CLKFB, six outputs pll_clkout<5:0>) and two DCMs (CLKIN, CLKFB, ten outputs each: dcm1_clkout<9:0> and dcm2_clkout<9:0>). Inputs come from the GCLK inputs, clocks from BUFG, and feedback clocks from BUFIO2FB.]

Designers Eccentrics
• Higher System Performance
  – More design margin to simplify designs
  – Higher integrated functionality
• Lower System Cost
  – Reduce BOM
  – Implement design in a smaller device & lower speed-grade
• Lower Power
  – Help meet power budgets
  – Eliminate heat sinks & fans
  – Prevent thermal runaway

Architecture Alignment
[Figure: resources of Virtex-6 FPGAs (760K logic cell device) and Spartan-6 FPGAs (150K logic cell device), with the common resources marked. Resources shown: LUT-6 CLB, block RAM, DSP slices, high-performance clocking, FIFO logic, parallel I/O, hardened memory controllers, tri-mode EMAC, HSS transceivers*, 3.3 V compatible I/O, System Monitor, PCIe® interface]
* Optimized for target application in each family
Enables IP Portability, Protects Design Investments

Virtex-6 and Spartan-6 FPGA Sub-Families
• Virtex-6 CXT FPGA
  – Up to 3.75 Gbps serial connectivity and corresponding logic performance
• Virtex-6 LXT FPGA
  – High Logic Density
  – High-Speed Serial Connectivity
• Virtex-6 SXT FPGA
  – High Logic Density
  – High-Speed Serial Connectivity
  – Enhanced DSP
• Virtex-6 HXT FPGA
  – High Logic Density
  – Ultra High-Speed Serial Connectivity
• Spartan-6 LX FPGA
  – Lowest Cost Logic
• Spartan-6 LXT FPGA
  – Lowest Cost Logic
  – Low-Cost Serial Connectivity
[Figure legend: Logic, Block RAM, DSP, Parallel I/O, Serial I/O]

Virtex® Product & Process Evolution
  Virtex          220-nm
  Virtex-E        180-nm
  Virtex-II       150-nm
  Virtex-II Pro   130-nm
  Virtex-4        90-nm
  Virtex-5        65-nm
  Virtex-6        40-nm
[Figure: timeline spanning the 1st through 6th generations]
Delivering Balanced Performance, Power, and Cost

Strong Focus on Power Reduction
• Static Power Reduction
  – Higher distribution of low leakage transistors
• Dynamic Power Reduction
  – Reduced capacitance through device shrink
• Reduced Core Voltage Devices Lower Overall Power
  – VCCINT = 0.9 V option allows power / performance tradeoff
• I/O Power Improvements
  – Dynamic termination
• System Monitor
  – Allows sophisticated monitoring of temperature and voltage
Up to 50% Power Reduction vs. Previous Generation

Virtex-6 Logic Fabric
• Virtex-6 Configurable Logic Block (CLB) Slice
  – Each CLB contains two slices
  – Each slice contains four 6-input Lookup Tables (6LUT)
    • Slices implement logic functions (slice_l)
    • Slices for memories and shift registers (slice_m)
• LUT6 implements
  – All functions of up to 6 variables
  – Two functions of up to 5 variables each
  – Shift registers up to 32 stages long
  – Memories of 64 bits
• Multiple configurations within a slice
Power Consumption Benefits
• Shift register mode greatly reduces power consumption over FF implementation
Performance Benefits
• Increased ratio of slice_m
  – Memories available closer to the source or target logic
Cost Benefits
• Can pack logic and memory functions more efficiently
[Figure: CLB containing two slices of four LUTs each]

Higher DSP Performance
• Most advanced DSP architecture
  – New optional pre-adder for symmetric filters
  – 25x18 multiplier
    • High resolution filters
    • Efficient floating point support
  – ALU-like second stage enables mapping of advanced operations
    • Programmable op-code
    • SIMD support
    • Addition / Subtraction / Logic functions
  – Pattern detector
• Lowest power consumption
• Highest DSP slice capacity
  – Up to 2K DSP Slices

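To see why a pre-adder helps with symmetric filters, the Python sketch below compares a direct FIR computation with one that first adds the two samples sharing a coefficient and then multiplies once. The function names are illustrative only, and the sketch works on a single window of samples with plain integers rather than modeling any particular DSP slice.

```python
def fir_direct(samples, coeffs):
    """Direct-form FIR over one window: one multiply per tap."""
    result = sum(c * x for c, x in zip(coeffs, samples))
    return result, len(coeffs)


def fir_symmetric(samples, coeffs):
    """Symmetric FIR: pre-add mirrored samples, then one multiply per pair."""
    n = len(coeffs)
    if list(coeffs) != list(reversed(coeffs)):
        raise ValueError("coefficients must be symmetric")
    acc, multiplies = 0, 0
    for i in range(n // 2):
        # Pre-adder combines the two samples that share coefficient i
        acc += coeffs[i] * (samples[i] + samples[n - 1 - i])
        multiplies += 1
    if n % 2:
        # Odd length: the middle tap has no mirrored partner
        acc += coeffs[n // 2] * samples[n // 2]
        multiplies += 1
    return acc, multiplies
```

Both functions return the same result for symmetric coefficients, but the symmetric version uses about half as many multiplies, which is the saving a hardware pre-adder in front of the multiplier makes available.
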
Power, Performance and Productivity Drive Market Trends
• Lower Power
  – Legislation and Regulations
  – Flat panel/TV, Central Office, Server Farms, Portable Medical, Portable Consumer
• Higher Performance
  – System Capacity and Performance
  – Wired Infrastructure, Wireless, Broadcast, 300G+ Networks, Aerospace and Defense, High Performance Computing
• Improved Productivity
  – Reduce Capital and Operating Expenses (OPEX, CAPEX)
  – All Market Segments
#1 Customer Problem: Lower Power enables better Cost, Performance, and Capability

The Unified Architecture Advantage
• Common elements enable easy IP reuse for quick design portability across all 7 series families
  – Design scalability from low-cost to high-performance
  – Expanded eco-system support
  – Quickest TTM

  Logic Fabric                             LUT-6 CLB
  Precise, Low Jitter Clocking             MMCMs
  On-Chip Memory                           36Kbit/18Kbit Block RAM
  Enhanced Connectivity                    PCIe® Interface Blocks
  DSP Engines                              DSP48E1 Slices
  Hi-perf. Parallel I/O Connectivity       SelectIO™ Technology
  Hi-performance Serial I/O Connectivity   Transceiver Technology

Families: Artix™-7 FPGA, Kintex™-7 FPGA, Virtex®-7 FPGA

The Xilinx 7 Series FPGAs: Industry’s First Unified Architecture
• Industry’s Lowest Power and First Unified Architecture
  – Spanning Low-Cost to Ultra High-End applications
• Three new device families with breakthrough innovations in power efficiency, performance-capacity and price-performance

Virtex-7 Sub-Families
• The Virtex-7 family has several sub-families
  – Virtex-7: General logic
  – Virtex-7 XT: Rich DSP and block RAM
  – Virtex-7 HT: Highest serial bandwidth
• Virtex-7 FPGA
  – High Logic Density
  – High-Speed Serial Connectivity
• Virtex-7 XT FPGA
  – High Logic Density
  – High-Speed Serial Connectivity
  – Enhanced DSP
• Virtex-7 HT FPGA
  – High Logic Density
  – Ultra High-Speed Serial Connectivity
[Figure legend: Logic, Block RAM, DSP, Parallel I/O, Serial I/O]

Summary
• The Spartan-6 FPGA slices contain four 6-input LUTs, eight registers, and carry logic
  – LUTs can perform any combinatorial function of up to six inputs
  – LUTs are connected with dedicated multiplexers and carry logic
  – Some LUTs can be configured as shift registers or memories
• The Spartan-6 FPGA IOBs contain DDR registers as well as SERDES resources
• The SelectIO™ interfaces enable direct connection to multiple I/O standards
• The Spartan-6 FPGA includes dedicated block RAM and DSP slice resources
• The Spartan-6 FPGA includes dedicated DCMs, PLLs, and routing resources to improve your system clock performance and generation capability
• The latest families are architected for power efficiency
  – They consist of Artix, Kintex, and Virtex devices