Spartan-6 Memory Resources
Basic FPGA Architecture
Xilinx Training

Objectives
After completing this module you will be able to…
Fully utilize the Spartan-6 distributed and block memory resources
Understand the features and limitations of the Spartan-6 dedicated memory controller block (MCB)
Use the Memory Interface Generator (MIG) to build your custom memory controller and design an appropriate interface to your off-chip memory component

Every Design Uses Memory
Fast on-chip memory
  – Frequently used for…
    • Finite State Machines
    • FIFO
    • Large/local data storage
External memory for larger data storage
  – Dedicated Memory Controller Block (MCB)
  – Supports evolving controller standards
All memory solutions
  – Must be fast and flexible
  – Bitstream can be pre-loaded with fixed data for a ROM or RAM
[Figure: Spartan-6 FPGA with on-chip block RAM and ROM, connected to external memory]

Memory Options
[Chart: memory options arranged by granularity and capacity]
Distributed RAM / SRL32 (in logic)
  • Very granular, localized memory
  • Minimal impact on logic routing
  • Great for small FIFOs
Block RAM (on-chip)
  • Efficient, on-chip blocks
  • Ideal for mid-sized buffering
Fast Memory Interfaces (external to the Spartan-6 FPGA)
  • Cost-effective bulk storage
  • Various memory controller cores
  • For large memory requirements
  • Memory Controller Block
External memory types
  – DRAM: SDRAM, DDR, DDR3, FCRAM, RLDRAM
  – SRAM: Sync SRAM, DDR SRAM, ZBT, QDR
  – FLASH
  – EEPROM

Interfacing to External Memories
MCB supports many standards
  – 1.5 V to 2.5 V
  – Single or double data rate
  – Different protocols
High performance
  – With advanced ChipSync™ technology
    • This is the SelectIO functionality that enables the FPGA to be directly connected to memory-specific I/O standards

Spartan-6 Distributed RAM
Distributed LUT memory
  – 64-bit blocks throughout the FPGA, available in 25% of the slices
  – Single-port, dual-port, multi-port
  – Can be used as a 32-bit shift register
  – Very fast (sub-nanosecond)
Ideal for small and fast memories
  – Coefficient storage
  – Small data buffers
  – Small state
    machines
  – Small FIFOs
  – Shift registers
[Figure: array of slices; some provide logic only, others can also be used as RAM or as a shift register]

Distributed RAM Features
Distributed LUT memory
  – Can be loaded by configuration
Synchronous (clocked) write operation
  – But asynchronous (combinatorial) read
    • The read can be made synchronous by using the neighboring flip-flop
Easiest to build with the CORE Generator
  – Automatically builds the necessary input decode and output multiplexer logic
Memories can be initialized
  – Text I/O code in VHDL
  – Coefficient file
  – Or an initialization file placed in HDL code if inferring the memory

Block RAM Features
Multiple configuration options
  – True dual-port, simple dual-port, single-port
Two independent ports address common data
  – Individual address, clock, write enable, clock enable
  – Independent widths for each port
300-MHz operation when using the data pipeline option
  – All operations are synchronous; all outputs are latched
  – Data output has an optional internal pipeline register
    • Faster clock rate, but increased latency
Byte-write enable
  – Enhances processor memory interfacing
Load block RAM during configuration
  – Reset during operation clears the registers, not the data content
Do not violate address setup time while enabled
  – Disable memory when address timing might be unpredictable
  – This also saves power
[Figure: 18 Kb dual-port block RAM]

Spartan-6 FPGA Block RAM Block
Each block RAM block can be used as
  • One 18 Kb block RAM, or
  • Two independent 9 Kb block RAMs

True Dual-Port Block RAM
True dual-port flexibility
  – Can perform read and write operations simultaneously and independently on port A and port B
  – Each port has its own clock, enable, and write enable
  – Every write also performs a read operation
    • Read before write, write before read, or no output change
  – Simultaneous read + write or write + write to the same location can cause data corruption
    • Make sure that the address and control signals are stable during operation
Block RAM configurations
  – One block RAM: 16Kx1, 8Kx2, 4Kx4, 2Kx9, 1Kx18, 512x36
  – Or two independent 9K block RAMs: 8Kx1, 4Kx2, 2Kx4, 1Kx9, 512x18
  – Each port can have its own depth x width within that range
[Figure: 18 Kb memory array with port A and port B; each port has Addr, Wdata, and Rdata, with data buses up to 36 bits wide]

Simple Dual-Port Block RAM
One read port and one write port
  – Natural structure for FIFOs
Allows widest implementation
  – 72-bit data width on 18K block RAM
  – 36-bit data width for 9K block RAM
    • Up to a 72-bit read and write in one cycle
  – Doubles the memory bandwidth per block
“Parity bits” are not dedicated for parity
  – All byte-wide data has an extra ninth storage bit
  – Can be used for parity or for any other purpose
  – Parity generation / checking would need LUT logic
[Figure: 18 Kb memory array with port A and port B, each with Addr, Wdata, and Rdata]

Block RAM Configurations
True dual-port: two fully independent read and write operations
  – Each 9 K: 8Kx1, 4Kx2, 2Kx4, 1Kx9, 512x18
  – 18 K: 16Kx1, 8Kx2, 4Kx4, 2Kx9, 1Kx18, 512x36
Simple dual-port: 1 read & 1 write port; read AND write in 1 cycle
  – Each 9 K: 8Kx1, 4Kx2, 2Kx4, 1Kx9, 512x18, 256x36
  – 18 K: 16Kx1, 8Kx2, 4Kx4, 2Kx9, 1Kx18, 512x36, 256x72
Single-port: 1 read & 1 write port; read OR write in 1 cycle
  – Each 9 K: 8Kx1, 4Kx2, 2Kx4, 1Kx9, 512x18, 256x36
  – 18 K: 16Kx1, 8Kx2, 4Kx4, 2Kx9, 1Kx18, 512x36, 256x72

Byte-Wide Write Enable
Controls which byte is being written
  – One write enable for each byte of data and its parity bit
  – Useful when
    interfacing with processors
[Timing diagram: byte-write operation during write-first mode. With address a0, data ABCD, and we[3:0] = 1001, memory @a0 changes from FFFF to AFFD and the output shows AFFD]
  – Bytes not being written will show an undefined value on the output
  – The output truly reflects the new memory content
  – Writing always implies a read
    • Be careful of reading when writing

Spartan-6 FPGA Memory Capacity
Spartan-6 Device   Distr. RAM (Kb)   Block RAM (Kb)   18-Kb Block RAM Blocks
XC6SLX4                   32               144                  8
XC6SLX9                   90               576                 32
XC6SLX16                 136               576                 32
XC6SLX25                 229               936                 52
XC6SLX45                 401             2,088                116
XC6SLX100                930             4,824                268
XC6SLX150              1,355             4,824                268
XC6SLX25T                229               936                 52
XC6SLX45T                401             2,088                116
XC6SLX100T               930             4,824                268
XC6SLX150T             1,355             4,824                268

Memory Controller Block (MCB)
Spartan-6 has a new dedicated memory controller block
  – Up to four controllers per device
  – Saves between 500 and 2000 LUTs and registers versus a soft implementation
Why a hard block?
  – Very common design component
  – Benefits of a hard block
    • Higher performance: 800 Mbps
    • Lower cost: smaller than soft logic
    • Lower power: compared to soft logic
  – Easy to design
    • Abstracts away complexity of memory interfacing
    • CORE Generator™ tool / MIG wizard and EDK support
[Figure: MCB block connecting to DDR, DDR2, DDR3, and LP DDR memories]

Interface      Spartan-6 FPGA
DDR3 SDRAM     800 Mbps*
DDR2 SDRAM     800 Mbps*
DDR SDRAM      400 Mbps*
LP DDR         400 Mbps*
*For all speed grades, except -1L

MCB Features
Memory support
  – DDR, DDR2, DDR3, LP DDR standards
  – Six 32-bit wide user ports
    • Can be concatenated up to 128 bits
    • Each port has a 64-deep data FIFO and a 4-deep command FIFO
Simple… but also programmable
  – Controller options
    • Set user interface, calibration, addressing, and arbitration schemes
  – Memory device options
    • Control features and timing parameters
Automatic calibration
  – DQS centering
  – DQ per-bit de-skew
  – FPGA on-chip input termination
[Figure: command FIFOs CMD FIFO 0 to CMD FIFO 5 feeding the arbiter and controller; the user ports are 32-bit bi-directional or 32-bit uni-directional]
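The port and FIFO figures listed under MCB Features can be checked with a little arithmetic. The Python sketch below is illustrative only and is not part of any Xilinx tool: the helper names are hypothetical, and it assumes that concatenated ports are 64 or 128 bits wide and that the data FIFO depth stays at 64 entries for concatenated ports. The MIG tool offers only specific port configurations, which this sketch does not model.

```python
# Illustrative arithmetic only. The constants come from the MCB Features slide;
# the helper names are hypothetical and not part of any Xilinx tool or API.

NATIVE_PORTS = 6          # six 32-bit wide user ports per MCB
NATIVE_WIDTH_BITS = 32
DATA_FIFO_DEPTH = 64      # entries in each port's data FIFO
# Assumption: concatenation yields 64- or 128-bit ports ("up to 128 bits").
ALLOWED_WIDTHS_BITS = (32, 64, 128)


def native_ports_used(port_widths_bits):
    """Return how many native 32-bit ports a list of user port widths consumes."""
    used = 0
    for width in port_widths_bits:
        if width not in ALLOWED_WIDTHS_BITS:
            raise ValueError(f"unsupported port width: {width} bits")
        used += width // NATIVE_WIDTH_BITS
    if used > NATIVE_PORTS:
        raise ValueError(f"needs {used} native ports, only {NATIVE_PORTS} exist")
    return used


def data_fifo_bytes(width_bits):
    """Bytes held by one full data FIFO, assuming the depth stays at 64 entries."""
    return DATA_FIFO_DEPTH * width_bits // 8


if __name__ == "__main__":
    for config in ([32] * 6, [64, 32, 32, 32, 32], [128, 32, 32], [128]):
        print(config, "uses", native_ports_used(config), "native ports")
    for width in ALLOWED_WIDTHS_BITS:
        print(f"{width}-bit port: {data_fifo_bytes(width)} bytes per data FIFO")
```

Under these assumptions a single 128-bit port consumes four of the six native ports, which is why wider user ports reduce the number of independent ports available to the design.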
[Figure: Memory Controller Block with data path, dedicated routing, and PHY. Caption: Simple, multi-port user interface]

MCB Options
Number of memory controllers
  – 2 for medium-sized devices, 4 for the two largest devices
I/O interface to a single external DRAM device: 4, 8, or 16 bits wide
Internal “user interface” bus width is programmable: 32 to 128 bits wide
[Table: supported DRAM size (128Mb, 256Mb, 512Mb, 1Gb, 2Gb, 4Gb) and DRAM width (x4, x8, x16) by memory type: LP DDR, DDR, DDR2, DDR3]
[Figure: MCB locations on the LX4, LX9, LX16, LX25/T, LX45/T, LX100/T, and LX150/T devices]

MCB Performance
                 Interface data rate               Max theoretical bandwidth per memory controller
Memory type      Min       Max                     4-bit       8-bit       16-bit
DDR              TBD *     400 Mbps (200 MHz)      1.6 Gbps    3.2 Gbps     6.4 Gbps
DDR2             TBD *     800 Mbps (400 MHz)      3.2 Gbps    6.4 Gbps    12.8 Gbps
DDR3             TBD *     800 Mbps (400 MHz)      3.2 Gbps    6.4 Gbps    12.8 Gbps
LPDDR            TBD *     400 Mbps (200 MHz)      1.6 Gbps    3.2 Gbps     6.4 Gbps
Higher data rates than with soft-core implementations
  – Data rates up to 800 Mbps (DDR2, DDR3)
  – Maximum theoretical bandwidth up to 12.8 Gbps
  – Max values are for all speed grades in standard voltage devices

Block Diagram
[Figure: Spartan-6 FPGA containing the user interface, the IP wrapper with the calibration module (RTL), the I/O clock network, and the Memory Controller Block (CMD FIFO 0 to CMD FIFO 5, arbiter, controller, data path, dedicated routing, PHY, IOB) connected to off-chip memory; the user ports are 32-bit bi-directional or 32-bit uni-directional]
User interface signals (port 0 shown)
  – Command path: p0_cmd_clk, p0_cmd_en, p0_cmd_bl, p0_cmd_instr, p0_cmd_addr, p0_cmd_full, p0_cmd_empty
  – Read path: p0_rd_clk, p0_rd_en, p0_rd_data, p0_rd_empty, p0_rd_full, p0_rd_overflow, p0_rd_count, p0_rd_error
  – Write path: p0_wr_clk, p0_wr_en, p0_wr_data, p0_wr_mask, p0_wr_empty, p0_wr_full, p0_wr_underrun, p0_wr_count, p0_wr_error
Memory device signals
  mcbx_dram_clk, mcbx_dram_clk_n, mcbx_dram_cke, mcbx_dram_ras_n, mcbx_dram_cas_n, mcbx_dram_we_n, mcbx_dram_odt, mcbx_dram_ddr3_rst,
  mcbx_dram_ba, mcbx_dram_addr, mcbx_dram_dq, mcbx_dram_dqs, mcbx_dram_dqs_n, mcbx_dram_udm, mcbx_dram_ldm
Simple user interface abstracts away complexity
MIG / EDK wrapper delivers complete interface solution
  – Internal block assembly and signal connectivity is transparent to the user

Design Considerations
PLL creates two phases of MCB system clock
  – SYSCLK_2X & SYSCLK_2X_180
  – Operate at 2X memory clock frequency
    • Used for DDR I/O
    • Divide by 2 in MCB creates memory clock frequency for other logic (1X clocks)
  – MCBs placed on the same side of the device must share system clocks, so they run at the same data rate
User interface clocks
  – Port clocks for command, write, and read path
  – Asynchronous to system clocks
  – FIFOs handle clock domain transfer
Clock example: DDR2 800 Mbps
  – 2X clk = 800 MHz
  – 1X clk = 400 MHz
[Figure: a differential clock input (CLK, CLKB) passes through an IBUFDS to the PLL; PLL outputs OUT0 and OUT1 drive SYSCLK_2X and SYSCLK_2X_180 through the BUFPLL_MCB onto the I/O clock network; inside the MCB block (user interface, arbiter, controller, data path, PHY layer, IOB) divide-by-2 stages create the 1X clocks; a second MCB block on the same side of the device, present only in larger parts, uses the same clocks]

Design Considerations
I/O pins for MCBs are predefined
  – But are general-purpose I/O when a particular MCB is not used
Package selection determines access to MCBs
  – Higher pin count packages have more MCB blocks bonded out
Migration across devices within same package
  – Up or down one device density in most cases
  – Applies only within a device family (LX or LXT, for example)
The left and lower left MCBs have the best migration path
  – MCB pins are shared less with other functions compared to the right side

Design Considerations
MCB blocks can be connected in parallel to create wider interfaces
  – This will require extra CLB logic
MCB blocks interface to a SINGLE memory device (x4, x8, or x16)
  – There is no support for two x8 MC interfacing to a x16 memory
Even for LPDDR, use an external VREF supply in the bank
Supports soft calibration module input termination tuning
Use a PLL nearest the center of the device to drive the BUFPLL_MCB

Two MCB Design Flows
For ISE® tool design flow
  – Memory Interface Generator (MIG) wizard within the CORE Generator tool
  – Best for non-embedded applications
  – Simple GUI-driven tool for configuring MCB block
  – Supports all MCB memory standards (DDR3, DDR2, DDR, LPDDR)
For EDK design flow
  – Multi-Port Memory Controller (MPMC)
  – Best for embedded applications
  – MCB block is the underlying hardware implementation of the MPMC peripheral
  – Simple GUI-driven tool within EDK / XPS
  – Supports all MCB memory standards (DDR3, DDR2, DDR, LPDDR)

Memory Interface Generator (MIG)
Easy to customize your memory controller and interface design
  – Memory architecture, data rate, bus width
  – CAS latency, burst length
  – I/O bank assignments
  – Generates RTL source code and UCF from hardware-verified IP
  – Delivered as part of the ISE software (CORE Generator utility)
  – MCB supports DDR, DDR2, DDR3, and LPDDR
Soft Memory Controller supports additional memory interfaces

MIG Design Flow
[Flowchart boxes, in the order they appear on the slide]
  – Open CORE Generator™
  – Run MIG, choose your memory parameters, and generate RTL and UCF files
  – Integrate MIG .ucf constraints into the overall design constraints file
  – Import RTL and build options into ISE project
  – Customize MIG design
  – Integrate/customize MIG memory RTL testbench
  – Perform functional simulation
  – Optional RTL customization
  – Synthesize design
  – Place and route design
  – Timing simulation
  – Verify in hardware

MIG
Enables you to customize your memory controller
  – Some options are specific to the memory controller standard (such as DDR2 and DDR3)
Options for the physical layer and the FPGA controller
  – Debug signals
  – IOB options for power or speed
MIG bank selection options
  – Displays pins required
  – Restricted I/O columns are disabled

MIG Output Files
UCF file folder
  – Pinout and clocking constraints
  – Batch file (ise_flow.bat) with recommended build options
RTL file folder
  – Functional modules (physical layer, user interface, controller, testbench)
  – Unencrypted for ease of customization
Simulation file folder
  – HDL simulation files including memory device models
Synthesis files folder

Summary
Distributed LUT RAM
  – Fast, localized memories
  – Great for small FIFOs
Block RAM
  – Bigger on-chip memories
  – Great for mid-sized buffering
Dedicated Memory Controller (MCB)
  – Fast connection to popular standard RAMs
  – Memory controller cores
  – Ideal for large memory requirements
Memories can be built with the CORE Generator or Memory Interface Generator (MIG)
[Figure: FPGA with distributed LUT RAM, high-performance block RAM, and external memory interfacing]

Where Can I Learn More?
User Guides
  – Spartan-6 FPGA User Guide
    • Describes the complete FPGA architecture, including distributed memory, block memory, and the MCB
  – Spartan-6 FPGA Memory Controller User Guide
    • Detailed description of all MCB functionality
Xilinx Education Services courses
  – www.xilinx.com/training
    • Xilinx tools and architecture courses
    • Hardware description language courses
    • Basic FPGA Architecture, Basic HDL Coding Techniques, and other free videos!

Trademark Information
Xilinx is disclosing this Document and Intellectual Property (hereinafter “the Design”) to you for use in the development of designs to operate on, or interface with Xilinx FPGAs. Except as stated herein, none of the Design may be copied, reproduced, distributed, republished, downloaded, displayed, posted, or transmitted in any form or by any means including, but not limited to, electronic, mechanical, photocopying, recording, or otherwise, without the prior written consent of Xilinx. Any unauthorized use of the Design may violate copyright laws, trademark laws, the laws of privacy and publicity, and communications regulations and statutes. Xilinx does not assume any liability arising out of the application or use of the Design; nor does Xilinx convey any license under its patents, copyrights, or any rights of others.
You are responsible for obtaining any rights you may require for your use or implementation of the Design. Xilinx reserves the right to make changes, at any time, to the Design as deemed desirable in the sole discretion of Xilinx. Xilinx assumes no obligation to correct any errors contained herein or to advise you of any correction if such be made. Xilinx will not assume any liability for the accuracy or correctness of any engineering or technical support or assistance provided to you in connection with the Design. THE DESIGN IS PROVIDED “AS IS" WITH ALL FAULTS, AND THE ENTIRE RISK AS TO ITS FUNCTION AND IMPLEMENTATION IS WITH YOU. YOU ACKNOWLEDGE AND AGREE THAT YOU HAVE NOT RELIED ON ANY ORAL OR WRITTEN INFORMATION OR ADVICE, WHETHER GIVEN BY XILINX, OR ITS AGENTS OR EMPLOYEES. XILINX MAKES NO OTHER WARRANTIES, WHETHER EXPRESS, IMPLIED, OR STATUTORY, REGARDING THE DESIGN, INCLUDING ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE, AND NONINFRINGEMENT OF THIRD-PARTY RIGHTS. IN NO EVENT WILL XILINX BE LIABLE FOR ANY CONSEQUENTIAL, INDIRECT, EXEMPLARY, SPECIAL, OR INCIDENTAL DAMAGES, INCLUDING ANY LOST DATA AND LOST PROFITS, ARISING FROM OR RELATING TO YOUR USE OF THE DESIGN, EVEN IF YOU HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. THE TOTAL CUMULATIVE LIABILITY OF XILINX IN CONNECTION WITH YOUR USE OF THE DESIGN, WHETHER IN CONTRACT OR TORT OR OTHERWISE, WILL IN NO EVENT EXCEED THE AMOUNT OF FEES PAID BY YOU TO XILINX HEREUNDER FOR USE OF THE DESIGN. YOU ACKNOWLEDGE THAT THE FEES, IF ANY, REFLECT THE ALLOCATION OF RISK SET FORTH IN THIS AGREEMENT AND THAT XILINX WOULD NOT MAKE AVAILABLE THE DESIGN TO YOU WITHOUT THESE LIMITATIONS OF LIABILITY. 
The Design is not designed or intended for use in the development of on-line control equipment in hazardous environments requiring fail-safe controls, such as in the operation of nuclear facilities, aircraft navigation or communications systems, air traffic control, life support, or weapons systems (“High-Risk Applications”). Xilinx specifically disclaims any express or implied warranties of fitness for such High-Risk Applications. You represent that use of the Design in such High-Risk Applications is fully at your risk. © 2012 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their respective owners.
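
Worked example: MCB bandwidth and clock arithmetic
The MCB Performance table and the DDR2 clock example earlier in this module both follow from simple arithmetic: peak theoretical bandwidth is the per-pin data rate multiplied by the interface width, and for a double data rate interface the 1X memory clock is half the per-pin data rate while the 2X system clock is twice the memory clock. The Python sketch below reproduces those numbers. It is illustrative only; the function names are hypothetical and the data rates are the maximum values quoted in the table.

```python
# Illustrative arithmetic only; function names are hypothetical.

def peak_bandwidth_gbps(data_rate_mbps, width_bits):
    """Peak theoretical bandwidth: per-pin data rate times interface width."""
    return data_rate_mbps * width_bits / 1000.0


def mcb_clocks_mhz(data_rate_mbps):
    """Return (2X system clock, 1X memory clock) in MHz for a DDR interface.

    Two bits move per pin in each memory clock cycle, so the 1X clock is
    half the per-pin data rate and the 2X clock is twice the 1X clock.
    """
    clk_1x = data_rate_mbps / 2.0
    clk_2x = 2.0 * clk_1x
    return clk_2x, clk_1x


# Maximum interface data rates quoted in the MCB Performance table.
MAX_DATA_RATE_MBPS = {"DDR": 400, "DDR2": 800, "DDR3": 800, "LPDDR": 400}

if __name__ == "__main__":
    for memory_type, rate in MAX_DATA_RATE_MBPS.items():
        cells = ", ".join(
            f"x{width}: {peak_bandwidth_gbps(rate, width):.1f} Gbps"
            for width in (4, 8, 16)
        )
        print(f"{memory_type:6s} {rate} Mbps -> {cells}")
    print("DDR2 800 Mbps clocks (2X, 1X) in MHz:", mcb_clocks_mhz(800))
```

These are peak figures; sustained throughput on a real interface is lower because of refresh, bus turnaround, and command overhead.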