Virtex-5 FPGA Coding Techniques
Part 1

FPGA and ASIC Technology Comparison © 2009 Xilinx, Inc. All Rights Reserved

Curriculum Path for ASIC Design
[Figure: curriculum path. Courses: Intro to VHDL or Intro to Verilog (3 days); Fundamentals of FPGA Design (1 day); Designing for Performance (2 days); Advanced FPGA Implementation (2 days). Modules: FPGA and ASIC Technology Comparison; FPGA vs. ASIC Design Flow; ASIC to FPGA Coding Conversion; Virtex-5 FPGA Coding Techniques; Spartan-3 FPGA Coding Techniques]

Welcome
• This REL will help you build Virtex®-5 FPGA designs that are compact and run at high speed
• We will show you how to avoid some of the most common design mistakes
• This content is essential if you have never coded a design for the Virtex-5 FPGA or are converting an ASIC design

After completing this module, you will be able to:
• Optimize ASIC code for implementation in a Virtex-5 FPGA
• Build a checklist of tips for optimizing your code for the Virtex-5 FPGA

Introduction
• There is no single “perfect” way to create a design
• Different synthesis options and implementation options will lead to different results
  – One method will NOT work best in all cases
• There are, however, guidelines that usually lead to improved results
• The coding techniques described here are strongly recommended because they have the biggest impact on device utilization and speed

Tactics to Meet Timing
• As always, use as many of the dedicated resources as possible (SRLs, DSP48s, and block RAMs)
• Different tactics must be used when your device is full
  – Timing does not matter if your design does not fit in the device
  – The tactics that will be discussed generally work best in designs that are not full
• One of the most effective ways to reduce power in FPGAs is to reduce the number of resources used
• A side benefit of these techniques is that they improve performance and reduce power

Limiting Virtex-5 FPGA Resources
Build a design that uses fewer “limiting” resources
• Fewer registers
  – Registers are most often the limiting resource in Virtex-5 designs
  – Many designs run out of registers before other components (especially if the design is heavily pipelined)
• Fewer LUTs
  – The LUT6 is 40 percent more efficient than a LUT4
  – But this does not yield performance benefits for every design

Virtex-5 FPGA Registers
Why are registers sometimes a scarce resource?
• The Virtex-5 FPGA has ~30 percent fewer registers for a given logic or array size compared to the Virtex-4 FPGA
  – The 4VLX80 device has 71,680 slice flip-flops versus 51,840 for a 5VLX85 device
  – So you should NOT need to pipeline as much!
• Be aware of control signal limitations (this will be covered later)
• Lack of use (inference) of SRLs, block RAMs, and DSP slice resources
• Replication of registers (logic replication)
  – Careful use of synthesis options that may increase your design size is important

Introduction to Control Sets
• A control signal is one of the following
  – Clock Enable / Gate Enable
  – Write Enable
  – Set / Preset
  – Reset / Clear
  – Clock / Gate
• Slice control ports: WE, CE, SR, REV, CLK
• A control set is a group of enable, set, reset, and clock signals
  – This includes Vcc / Gnd when they are not used
• Unique control sets are the number of groups of unique control signals that your design has
Tip: The implementation tools cannot group flip-flops into the same slice if they do not share the same control signals

What Creates Control Signals?
Control signals are the signals that are connected to the actual control ports on the register
Inference code
• Clocks and asynchronous sets/resets always become control signals
  – They cannot be moved to the datapath
• Clock enables and synchronous sets/resets sometimes become control signals (this is decided by the synthesis tool)
  – These control signals can be moved to the datapath
• How will a global asynchronous reset and a local synchronous reset inferred on a single register be implemented?
  – The asynchronous reset gets the port on the register
  – The synchronous reset gets a LUT input
Tip: Clock enables and synchronous sets and resets can be moved to the datapath

What Creates Control Signals?
Instantiation of primitives and cores
• Gate-level connection of UNISIM and core primitives dictates control signal usage
Synthesis optimization
• Synthesis may choose to build a control signal for logic optimization
Physical synthesis
• Can change control sets from original specifications
• Global or logic optimization may choose to build a control signal for logic optimization
Tip: The cores you instantiate should share the same control signals that you infer, to minimize the number of control sets

Why Be Concerned?
• Four registers per slice; all share the same control signals
  – If the number of registers in a control set does not divide cleanly by four, some registers must go unused
• This is of concern for designs that have several very low fanout control signals
• A design with a large number of control sets can potentially show lower utilization of registers (but not always)
Tip: Try to build in byte-wide widths for the highest device utilization

What Designs Are Okay?
• Designs with plenty of flip-flops to spare
• Designs with low flip-flop-to-LUT ratios
  – These are generally slow or lightly pipelined designs or ASIC prototypes
• Designs with lots of room in a particular device
• Designs with a small number of control sets are preferable
  – The key is to evaluate slices and CLBs that have wasted registers
  – Try to build designs with common control signals (plan)
• Designs with datapaths divisible by four are not affected even if they have a high number of control sets
  – Such as byte-wide enables or data control registers, for example

Active-Low Control Signals
Problem: Active-low control signals can produce sub-optimal results
Why?
• Control ports on Virtex-5 FPGA registers are active-high
• Hierarchical design methods
This results in…
• Poorer utilization
  – More LUTs
  – Less dense slice packing
  – More routing resources necessary
• Longer run times
  – Prohibits hierarchical design flows
  – More difficult timing
• Worse timing and power
Tip: Use active-high signals for CEs, sets, and resets

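The active-high recommendation can be sketched in Verilog as follows. This is a minimal illustration added for clarity, not code from the original slides: the module name ctrl_active_high and its port names are hypothetical, and whether the reset and enable end up on the register's control ports depends on the synthesis tool and its settings.

```verilog
// Hypothetical example: a byte-wide register whose control signals
// are all active-high, so no inverter is needed in front of them.
module ctrl_active_high (
    input  wire       clk,
    input  wire       rst,   // active-high synchronous reset
    input  wire       ce,    // active-high clock enable
    input  wire [7:0] d,
    output reg  [7:0] q
);
    // Reset is tested before the clock enable, so reset has priority
    // over the enable when both are asserted.
    always @(posedge clk) begin
        if (rst)
            q <= 8'h00;
        else if (ce)
            q <= d;
    end
endmodule
```

All eight bits share one clock, one reset, and one enable, so the register forms a single control set whose size divides cleanly by four.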
Use Active-High Control Signals
[Figure label: Flip-Flop]
• The inverters cannot be combined into the same slice
• This consumes more power and makes timing difficult
• Hierarchical design methods can proliferate LUT usage on active-low control signals

Why Synchronous Resets?
• Each DSP48E has ~250 registers; none have an asynchronous reset
  – The XC5V50 device has ~12,000 DSP slice registers
  – The XC5V330 device has ~48,000 DSP slice registers
• The DSP slice is more versatile than most realize
  – Can be used for multipliers, add/sub, MACC, counters (with programmable terminal count), comparators, shifters, multiplexers, pattern match, and many other logic functions
• Many designs that run out of slices are not fully utilizing the DSP48E
• Synthesis tools will infer the DSP48E for multipliers, but they are not smart enough to infer other functions
  – You can control synthesis use with attributes, but NOT if an asynchronous reset is used
Tip: Use a synchronous reset when using the DSP slice resources

Why Synchronous Resets?
• Block RAMs obtain minimum clock-to-output time by using the output registers
  – Output registers only have synchronous resets
• Unused block RAMs can be used for many alternative purposes
  – ROMs, large LUTs, complex logic, state machines, deep shift registers, etc.
  – Using unused block RAMs for other purposes can free up hundreds of flip-flops
  – Using the block RAM in dual-port mode allows for greater utilization of this resource
• Many designs that run out of slices are not fully utilizing the block RAM resources
• Synthesis tools are not yet smart enough to infer less obvious functions
Tip: Use a synchronous reset when using the block RAM resources

Why Synchronous Resets?
• Synthesis could choose to move low-fanout synchronous resets from a control signal to the datapath to free up more registers
  – Synthesis tools can do this, but it may depend on synthesis settings (may not be on by default)
  – The Xilinx implementation tools cannot change what is synthesized
• This could allow the register to be packed into a slice where that was previously not possible
• Can improve timing as well as register density
[Figure labels: S, D, Low Fanout]

Why Synchronous Resets?
• Synchronous resets are automatically timed
  – Do not need any special timing constraints
  – Do not need special switches or settings to analyze timing
• Synchronous resets are inherently more predictable
  – Less susceptible to accidentally missed timing, runt pulses, or other phenomena that upset logical functionality
  – Less prone to race conditions
  – Release of an asynchronous signal may not always have predictable results
Tip: Synchronous resets enable your design to need minimal testing

Caveats to Synchronous Resets
Synchronous resets may make timing more difficult, make the design larger, and result in longer run times
Why?
• The implementation tools automatically time synchronous reset paths
This can result in
• More timing paths to analyze and meet timing
  – On average, a ~five percent increase in the number of timing paths
• More replication of design resources
• With some synthesis tools, this will use fewer SRLs, block RAMs, DSP48s, and other dedicated hardware

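In HDL, the two reset styles compared above differ only in the sensitivity list. The following Verilog sketch is an illustration added for clarity, with hypothetical module names (ff_async_rst, ff_sync_rst); it is not taken from the original material.

```verilog
// Asynchronous reset: rst is in the sensitivity list, so the register
// clears as soon as rst rises, independent of the clock.
module ff_async_rst (
    input  wire clk,
    input  wire rst,   // active-high asynchronous reset
    input  wire d,
    output reg  q
);
    always @(posedge clk or posedge rst) begin
        if (rst)
            q <= 1'b0;
        else
            q <= d;
    end
endmodule

// Synchronous reset: rst is sampled only on the rising clock edge,
// so it behaves like any other clocked input.
module ff_sync_rst (
    input  wire clk,
    input  wire rst,   // active-high synchronous reset
    input  wire d,
    output reg  q
);
    always @(posedge clk) begin
        if (rst)
            q <= 1'b0;
        else
            q <= d;
    end
endmodule
```

In the synchronous version the reset is just another signal sampled on the clock edge, which is why it needs no special timing constraints.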
Changing to Synchronous Resets
• All new code should use synchronous resets when a reset is necessary
• For existing code, you have three choices
  – Leave it alone: acknowledge the possible drawbacks of asynchronous resets
  – Use a synthesis switch (Synplify: syn_clean_reset; XST: -async_to_sync YES): not the same as changing to a synchronous reset, but it can help
  – Manually (or with a script) change the asynchronous reset to synchronous
• Removing the top-level reset port does not get the same result
  – Remove the reset from your code

No Resets is Best Resets

Why No Resets at All?
• Using synchronous logic frees up additional logic
  – Designs in which the resets were removed used an average of 3.5 percent fewer registers
  – Synthesis can realize this additional logic automatically
Tip: This makes it easier for the mapper to group this register with registers of a different control set

Why No Resets at All?
• Synthesis can infer SRL-based shift registers
  – But only if no resets are used (otherwise flip-flops are wasted)
  – Or the synthesis tool can emulate the reset (not what you want)
• The SRL is also useful for synchronous FIFOs, non-binary counters, terminal count logic, pattern generators, and reconfigurable LUTs
Tip: NO reset saves a lot of flip-flops

Why No Resets at All?
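A shift register written without any reset, which is the form the SRL discussion in this section refers to, might look like the following Verilog sketch. The module name shift16, the 16-bit depth, and the port names are assumptions chosen for illustration; whether the logic is actually mapped to an SRL depends on the synthesis tool and its settings.

```verilog
// Hypothetical example: 16-stage shift register with a clock enable
// and no reset. The initial value stands in for a reset.
module shift16 (
    input  wire clk,
    input  wire ce,    // active-high clock enable
    input  wire din,
    output wire dout
);
    reg [15:0] sr = 16'h0000;

    // No reset branch: each stage only ever takes the previous stage.
    always @(posedge clk) begin
        if (ce)
            sr <= {sr[14:0], din};
    end

    // Only the last stage is read.
    assign dout = sr[15];
endmodule
```

Only the final tap is read here; reading every stage in parallel would generally require individual flip-flops instead.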
• Routing can be considered one of the most valuable resources
• Resets compete for the same resources as the rest of the active signals of the design
  – Including timing-critical paths
• More available routing gives the tools a better chance to meet your timing objectives
Tip: NO reset saves routing and improves design speed

Why No Resets at All?
Even more block RAM inference
Why?
• Virtex-5 FPGA RAMs
  – RAM enable has precedence over reset
• Virtex-5 FPGA registers
  – Reset has priority over the clock enable
• Coding for this functionality makes no sense
• With no reset, the enable precedence has no consequence
Tip: NO reset gets more block RAMs

Why No Resets at All?
• Designs without resets have fewer timing paths
  – An average of 18 percent fewer timing paths
• This results in
  – Less run time
  – Improved performance
  – Less memory necessary during PAR
Tip: NO reset builds a faster design and saves run time

How Do I Get By?
• Some designs can get away without any resets, but many designs need some resets
• Very few designs require resets on all registers
  – Most ASICs require a described reset on every register
  – Implement this with the built-in Global Set/Reset (GSR)
• Suggestion: be selective when you code resets (FSMs, I/O, and flushing data)
  – Only place resets that have an impact on functionality

How Do I Get By?
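One way to read the suggestion to code resets selectively is sketched below in Verilog. The module selective_reset and all of its signal names are hypothetical: the valid flags, which decide whether downstream logic acts on the data, get an active-high synchronous reset, while the data registers get only an initial value.

```verilog
// Hypothetical example: a two-stage pipeline in which only the
// control path (valid flags) is reset.
module selective_reset (
    input  wire       clk,
    input  wire       rst,        // active-high synchronous reset
    input  wire       valid_in,
    input  wire [7:0] din,
    output wire       valid_out,
    output wire [7:0] dout
);
    // Control path: a reset here flushes the pipeline.
    reg [1:0] valid = 2'b00;

    // Datapath: contents are ignored while valid is low, so no reset
    // is coded; the registers only carry an initial value.
    reg [7:0] stage0 = 8'h00;
    reg [7:0] stage1 = 8'h00;

    always @(posedge clk) begin
        if (rst)
            valid <= 2'b00;
        else
            valid <= {valid[0], valid_in};
    end

    always @(posedge clk) begin
        stage0 <= din;
        stage1 <= stage0;
    end

    assign valid_out = valid[1];
    assign dout      = stage1;
endmodule
```

After a reset the data registers may still hold old values, but valid_out is low, so nothing downstream should act on them.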
• Initialize all registers in VHDL / Verilog code
  – This should be done whether using a reset or not
  – VHDL: signal my_register : std_logic_vector(7 downto 0) := (others => '0');
  – Verilog: reg [7:0] my_register = 8'h00;
• Perform RTL simulation of the design
  – If it functions during simulation, it should function on the FPGA

Summary
• If your design barely fits, Xilinx recommends reducing the size of your design before trying to gain timing closure
  – Most of these tips reduce design size
• Try to minimize the number of control sets your design uses
• Asynchronous resets can inhibit optimization of general logic (can force additional LUT inputs to be used)
• Synchronous resets allow synthesis tools to convert a control signal reset to the datapath
• Avoid the use of global resets
  – Initialize all registers from your HDL
  – If you have to, use the Startup_Virtex5 primitive to access the GSR net

Summary
• Xilinx recommends NOT using the synthesis option to convert asynchronous resets to synchronous
• Avoid resets on SRLs (no reset functionality)
• Avoid asynchronous resets on block RAMs (the block RAM’s output register only supports a synchronous reset)
• Avoid asynchronous resets on DSP slice resources (their flip-flops only support a synchronous reset)
• Be aware of the difference between coding for a block RAM’s control signal precedence and a flip-flop’s precedence
• Use active-high control signals
• If you can design out your global reset, you will save a lot of routing and build a faster design

Where Can I Learn More?
• Xilinx online documents: www.support.xilinx.com
  – To search for an Application Note or White Paper, click the Documentation tab and enter the document number (WP231 or XAPP215) in the search window
• White papers for reference
  – WP231: HDL Coding Practices to Accelerate Design Performance
  – WP248: Retargeting Guidelines for Virtex-5 FPGAs
  – WP275: Get your Priorities Right – Make your Design Up to 50% Smaller
• User guides for reference
  – UG193: Virtex-5 FPGA XtremeDSP Design Considerations

Trademark Information
Xilinx is disclosing this Document and Intellectual Property (hereinafter “the Design”) to you for use in the development of designs to operate on, or interface with Xilinx FPGAs. Except as stated herein, none of the Design may be copied, reproduced, distributed, republished, downloaded, displayed, posted, or transmitted in any form or by any means including, but not limited to, electronic, mechanical, photocopying, recording, or otherwise, without the prior written consent of Xilinx. Any unauthorized use of the Design may violate copyright laws, trademark laws, the laws of privacy and publicity, and communications regulations and statutes.

Xilinx does not assume any liability arising out of the application or use of the Design; nor does Xilinx convey any license under its patents, copyrights, or any rights of others. You are responsible for obtaining any rights you may require for your use or implementation of the Design. Xilinx reserves the right to make changes, at any time, to the Design as deemed desirable in the sole discretion of Xilinx. Xilinx assumes no obligation to correct any errors contained herein or to advise you of any correction if such be made. Xilinx will not assume any liability for the accuracy or correctness of any engineering or technical support or assistance provided to you in connection with the Design.

THE DESIGN IS PROVIDED “AS IS” WITH ALL FAULTS, AND THE ENTIRE RISK AS TO ITS FUNCTION AND IMPLEMENTATION IS WITH YOU. YOU ACKNOWLEDGE AND AGREE THAT YOU HAVE NOT RELIED ON ANY ORAL OR WRITTEN INFORMATION OR ADVICE, WHETHER GIVEN BY XILINX, OR ITS AGENTS OR EMPLOYEES. XILINX MAKES NO OTHER WARRANTIES, WHETHER EXPRESS, IMPLIED, OR STATUTORY, REGARDING THE DESIGN, INCLUDING ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE, AND NONINFRINGEMENT OF THIRD-PARTY RIGHTS.

IN NO EVENT WILL XILINX BE LIABLE FOR ANY CONSEQUENTIAL, INDIRECT, EXEMPLARY, SPECIAL, OR INCIDENTAL DAMAGES, INCLUDING ANY LOST DATA AND LOST PROFITS, ARISING FROM OR RELATING TO YOUR USE OF THE DESIGN, EVEN IF YOU HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. THE TOTAL CUMULATIVE LIABILITY OF XILINX IN CONNECTION WITH YOUR USE OF THE DESIGN, WHETHER IN CONTRACT OR TORT OR OTHERWISE, WILL IN NO EVENT EXCEED THE AMOUNT OF FEES PAID BY YOU TO XILINX HEREUNDER FOR USE OF THE DESIGN. YOU ACKNOWLEDGE THAT THE FEES, IF ANY, REFLECT THE ALLOCATION OF RISK SET FORTH IN THIS AGREEMENT AND THAT XILINX WOULD NOT MAKE AVAILABLE THE DESIGN TO YOU WITHOUT THESE LIMITATIONS OF LIABILITY.

The Design is not designed or intended for use in the development of on-line control equipment in hazardous environments requiring fail-safe controls, such as in the operation of nuclear facilities, aircraft navigation or communications systems, air traffic control, life support, or weapons systems (“High-Risk Applications”). Xilinx specifically disclaims any express or implied warranties of fitness for such High-Risk Applications. You represent that use of the Design in such High-Risk Applications is fully at your risk.

© 2009 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their respective owners.