Virtex-5 FPGA Coding Techniques
Part 1
FPGA and ASIC Technology
Comparison - 1
© 2009 Xilinx, Inc. All Rights Reserved
Curriculum Path for ASIC Design
Intro to VHDL or Intro to Verilog (3 days)
FPGA and ASIC Technology Comparison
FPGA vs. ASIC Design Flow
ASIC to FPGA Coding Conversion
Virtex-5 FPGA Coding Techniques
Spartan-3 FPGA Coding Techniques
Fundamentals of FPGA Design (1 day)
Designing for Performance (2 days)
Advanced FPGA Implementation (2 days)
Welcome
This REL will help you build Virtex®-5 FPGA designs that are
compact and run at high speed
We will show you how to avoid some of the
most common design mistakes
This content is essential if you have never
coded a design for the Virtex-5 FPGA or are
converting an ASIC design
After completing this module, you
will be able to:
Optimize ASIC code for implementation in a
Virtex-5 FPGA
Build a checklist of tips for optimizing your
code for the Virtex-5 FPGA
Introduction
There is no single “perfect” way to create a design
Different synthesis options and implementation options will lead to different results
• One method will NOT work best in all cases
There are, however, guidelines that usually lead to improved results
The coding techniques described here are strongly recommended because
they have the biggest impact on device utilization and speed
Tactics to Meet Timing
As always, use as many of the dedicated resources as possible
(SRLs, DSP48s, and block RAMs)
Different tactics must be used when your device is full
Timing does not matter if your design does not fit in the device
The tactics that will be discussed generally work best in designs that are
not full
One of the most effective ways to reduce power in FPGAs is to
reduce the number of resources
One of the side benefits of these techniques is that they will allow you to
improve performance and reduce power
Limiting Virtex-5 FPGA Resources
Build a design that uses fewer “limiting” resources
Fewer registers
• Many designs run out of registers before other components (especially if the design is heavily pipelined)
• Registers are most often the limiting resource in Virtex-5 designs
Fewer LUTs
• The LUT6 is 40 percent more efficient than a LUT4
– But this does not yield performance benefits for every design
Virtex-5 FPGA Registers
Why are registers sometimes a scarce resource?
The Virtex-5 FPGA has ~30 percent fewer registers for a given logic or
array size compared to the Virtex-4 FPGA
• The 4VLX80 device has 71,680 slice flip-flops versus 51,840 for a 5VLX85 device
• So you should NOT need to pipeline as much!
Be aware of control signal limitations (this will be covered later)
Lack of use (inference) of SRLs, block RAMs, and DSP slice resources
Replication of registers (logic replication)
• Careful use of synthesis options that may increase your design size is important
Introduction to Control Sets
A control signal is
Clock Enable / Gate Enable
Write Enable
Set / Preset
Reset / Clear
Clock / Gate
Slice: WE, CE, SR, REV, CLK
A control set is
A group of enable, set, reset, and clock
• This includes Vcc / Gnd when they are not used
Unique control sets are
The number of groups of unique control signals that your design has
Tip: The implementation tools cannot group flip-flops into the same slice if
they do not share the same control signals
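The control-set idea above can be sketched in Verilog. This is an illustrative example only: the module and signal names are hypothetical, and the actual slice packing depends on the synthesis and implementation tools. The first module keeps both register banks in one control set (same clock, same enable, no reset). The second splits the same function into two control sets by giving each bank its own enable.

```verilog
// Hypothetical example: both banks share clk and ce, so all eight
// flip-flops belong to ONE control set and may share slices.
module shared_control_set (
    input  wire       clk,
    input  wire       ce,          // one clock enable for both banks
    input  wire [3:0] a_in,
    input  wire [3:0] b_in,
    output reg  [3:0] a_q = 4'h0,
    output reg  [3:0] b_q = 4'h0
);
    always @(posedge clk) begin
        if (ce) begin
            a_q <= a_in;
            b_q <= b_in;
        end
    end
endmodule

// Hypothetical example: separate enables create TWO control sets,
// so a_q and b_q flip-flops cannot be grouped into the same slice.
module split_control_sets (
    input  wire       clk,
    input  wire       ce_a,        // enable used only by bank A
    input  wire       ce_b,        // enable used only by bank B
    input  wire [3:0] a_in,
    input  wire [3:0] b_in,
    output reg  [3:0] a_q = 4'h0,
    output reg  [3:0] b_q = 4'h0
);
    always @(posedge clk) begin
        if (ce_a) a_q <= a_in;
        if (ce_b) b_q <= b_in;
    end
endmodule
```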
What Creates Control Signals?
Control signals are the signals that are connected to the actual control
ports on the register
Inference code
Clocks and asynchronous set/resets always become control signals
• They cannot be moved to the datapath
Clock enables and synchronous set/resets sometimes become control
signals (this is decided by the synthesis tool)
• These control signals can be moved to the datapath
How will a global asynchronous reset and a local reset inferred on a single
register be implemented?
• Asynchronous reset gets the port on the register
• Synchronous reset gets a LUT input
Tip: Clock enables and synchronous sets and resets can be moved to the datapath
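As a sketch of the question above, the following Verilog infers a single register with both a global asynchronous reset and a local synchronous reset. The module and signal names are hypothetical. Following the slide, the asynchronous reset would be expected to take the register's reset port, while the synchronous reset would be expected to become a LUT input ahead of D; the exact mapping is decided by the synthesis tool.

```verilog
// Hypothetical example: one register, two resets.
module mixed_reset_reg (
    input  wire clk,
    input  wire global_arst,   // asynchronous, active-high
    input  wire local_srst,    // synchronous, active-high
    input  wire d,
    output reg  q
);
    always @(posedge clk or posedge global_arst) begin
        if (global_arst)
            q <= 1'b0;         // asynchronous: needs the register's reset port
        else if (local_srst)
            q <= 1'b0;         // synchronous: can be built as logic in front of D
        else
            q <= d;
    end
endmodule
```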
What Creates Control Signals?
Instantiation of primitives and cores
Gate-level connection of UNISIM and core primitives dictates control signal
usage
Synthesis optimization
Synthesis may choose to build a control signal for logic optimization
Physical synthesis
Can change control sets from original specifications
Global or logic optimization may choose to build a control signal for logic
optimization
Tip: The cores you instantiate should share the same control signals as the
logic you infer, to minimize the number of control sets
Why Be Concerned?
Four registers per slice; all share the
same control signals
If the number of registers in a control
set does not divide cleanly by four, some
registers must go unused
This is of concern for designs that
have several very low fanout control
signals
A design with a large number of control
sets potentially can show lower utilization
of registers (but not always)
Tip: Try to build in byte-wide widths for the highest device utilization
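A small worked example of the divide-by-four point, assuming (as stated above) four registers per slice and that registers from a different control set cannot fill the leftover positions. Here n is the number of registers in one control set; the symbol n and the sample values are chosen only for illustration.

```latex
\[
\text{slices} = \left\lceil \frac{n}{4} \right\rceil, \qquad
\text{unused registers} = 4\left\lceil \frac{n}{4} \right\rceil - n
\]
\[
n = 10:\quad \lceil 10/4 \rceil = 3 \text{ slices},\quad 4 \cdot 3 - 10 = 2 \text{ unused registers}
\]
\[
n = 8:\quad \lceil 8/4 \rceil = 2 \text{ slices},\quad 4 \cdot 2 - 8 = 0 \text{ unused registers}
\]
```

The byte-wide case (n = 8) wastes nothing, which is why byte-wide widths are suggested in the tip.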
What Designs Are Okay?
Designs with plenty of flip-flops to spare
Designs with low flip-flop-to-LUT ratios
• These are generally slow or lightly pipelined designs or ASIC prototypes
Designs with lots of room in a particular device
Designs with a small number of control sets are preferable
The key is to evaluate slices and CLBs that have wasted registers
Try to build designs with common control signals (plan)
Designs with datapaths divisible by four are not affected even if they
have a high number of control sets
Such as byte-wide enables or data control registers, for example
Active-Low Control Signals
Problem: Active-low control signals can produce sub-optimal results
Why?
Control ports on Virtex-5 FPGA registers are active-high
Hierarchical design methods
This results in…
Poorer utilization
• More LUTs
• Less dense slice packing
• More routing resources necessary
Longer run times
• Prohibits hierarchical design flows
• More difficult timing
Worse timing and power
Tip: Use active-high signals for CEs, sets, and resets
Use Active-High Control Signals
[Figure: inverters feeding the control ports of a flip-flop]
The inverters cannot be combined into the same slice
This consumes more power and makes timing difficult
Hierarchical design methods can proliferate LUT usage
on active-low control signals
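A minimal sketch contrasting the two polarities, with hypothetical module and signal names. Because the register control ports are active-high, the active-low version needs an inversion somewhere ahead of the port; where that inverter ends up depends on the tools and on whether hierarchy is preserved.

```verilog
// Hypothetical example: active-low synchronous reset.
// The inversion of rst_n has to be built before the active-high port.
module reg_active_low (
    input  wire clk,
    input  wire rst_n,   // active-low
    input  wire d,
    output reg  q
);
    always @(posedge clk) begin
        if (!rst_n)
            q <= 1'b0;
        else
            q <= d;
    end
endmodule

// Hypothetical example: active-high synchronous reset.
// rst can connect to the register's reset port without extra logic.
module reg_active_high (
    input  wire clk,
    input  wire rst,     // active-high
    input  wire d,
    output reg  q
);
    always @(posedge clk) begin
        if (rst)
            q <= 1'b0;
        else
            q <= d;
    end
endmodule
```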
Why Synchronous Resets?
Each DSP48E has ~250 registers; none have asynchronous reset
• The XC5VLX50 device has ~12,000 DSP slice registers
• The XC5VLX330 device has ~48,000 DSP slice registers
The DSP slice is more versatile than most realize
Can be used for multipliers, add/sub, MACC, counters (with programmable
terminal count), comparators, shifters, multiplexer, pattern match, and many
other logic functions
Many designs that run out of slices are not fully utilizing the DSP48E
Synthesis tools will infer the DSP48E for multipliers, but they are not smart
enough to infer other functions
• Can control synthesis use with attributes, but NOT if an asynchronous reset is used
Tip: Use sync reset when using the DSP slice resources
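As an illustration of the tip, here is a minimal multiply-accumulate sketch that uses only synchronous resets, so its registers are compatible with the DSP slice registers described above. The module name, port names, and bit widths are assumptions made for the example; whether a given synthesis tool maps this code onto a DSP48E depends on the tool and its settings.

```verilog
// Hypothetical example: MACC with synchronous, active-high reset.
module macc_sync_reset (
    input  wire               clk,
    input  wire               rst,   // synchronous, active-high
    input  wire signed [17:0] a,
    input  wire signed [17:0] b,
    output reg  signed [47:0] acc = 48'sd0
);
    // Pipeline register between the multiplier and the accumulator
    reg signed [35:0] product = 36'sd0;

    always @(posedge clk) begin
        if (rst) begin
            product <= 36'sd0;
            acc     <= 48'sd0;
        end else begin
            product <= a * b;          // registered multiply
            acc     <= acc + product;  // accumulate the previous product
        end
    end
endmodule
```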
Why Synchronous Resets?
Block RAMs obtain minimum clock-to-output time by using the
output registers
Output registers only have synchronous resets
Unused block RAMs can be used for many alternative purposes
ROMs, large LUTs, complex logic, state machines, deep-shift registers, etc.
Using unused block RAMs for other purposes can free up
hundreds of flip-flops
Using the block RAM in dual-port mode allows for greater utilization of this resource
Many designs that run out of slices are not fully utilizing the block
RAM resources
Synthesis tools are not yet smart enough to infer less obvious functions
Tip: Use sync reset when using the block RAM resources
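A minimal sketch of a memory coded with a registered output and a synchronous reset on that output register only, in line with the points above. The names, depth, and width are hypothetical, and mapping to a block RAM with its output register enabled is up to the synthesis tool.

```verilog
// Hypothetical example: 1K x 16 single-port RAM with a registered output.
module bram_sync_out (
    input  wire        clk,
    input  wire        we,
    input  wire        rst,      // synchronous reset, output register only
    input  wire [9:0]  addr,
    input  wire [15:0] din,
    output reg  [15:0] dout = 16'h0000
);
    reg [15:0] mem [0:1023];
    reg [15:0] ram_q = 16'h0000;  // first-stage read register

    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;
        ram_q <= mem[addr];       // synchronous read (old data on write)
    end

    // Output register: synchronous reset only, no asynchronous reset
    always @(posedge clk) begin
        if (rst)
            dout <= 16'h0000;
        else
            dout <= ram_q;
    end
endmodule
```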
Why Synchronous Resets?
Synthesis could choose to move low-fanout synchronous resets from a
control signal to the datapath to free up more registers
Synthesis tools can do this, but it may depend on synthesis settings (may
not be on by default)
The Xilinx implementation tools cannot change what is synthesized
This could allow packing of this register into a slice previously not
possible
Can improve timing as well as register density
[Figure: low-fanout synchronous set (S) on a register moved into the logic driving its D input]
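The move from control port to datapath can be sketched as two functionally equivalent codings. Names are hypothetical. In the first module the reset is written in the usual way and may be implemented on the register's reset port. The second shows, written out by hand, the form a synthesis tool could produce when it moves a low-fanout synchronous reset into the LUT; the register then has no reset in its control set.

```verilog
// Hypothetical example: synchronous reset coded conventionally.
module srst_on_port (
    input  wire clk,
    input  wire srst,   // synchronous, active-high, low fanout
    input  wire a,
    input  wire b,
    output reg  q = 1'b0
);
    always @(posedge clk) begin
        if (srst)
            q <= 1'b0;
        else
            q <= a & b;
    end
endmodule

// Hypothetical example: same behavior with the reset folded into the
// datapath logic, so the register itself uses no reset port.
module srst_in_datapath (
    input  wire clk,
    input  wire srst,
    input  wire a,
    input  wire b,
    output reg  q = 1'b0
);
    always @(posedge clk) begin
        q <= ~srst & a & b;
    end
endmodule
```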
Why Synchronous Resets?
Synchronous resets are automatically timed
Do not need any special timing constraints
Do not need special switches or settings to analyze timing
Synchronous resets are inherently more predictable
Less susceptible to missed timing, runt pulses, or other
phenomena that can upset logical functionality
Less prone to a race condition
• Release of an asynchronous signal may not always have predictable results
Tip: Synchronous resets enable your design to need minimal testing
Caveats to Synchronous Resets
Synchronous resets may make timing more difficult, the design larger,
and result in longer run times
Why?
The implementation tools automatically time synchronous reset paths
This can result in
• More timing paths to analyze and meet timing
– On average ~five percent increase in the number of timing paths
• More replication of design resources
• With some synthesis tools this will use fewer SRLs, block RAM, DSP48s, and other dedicated hardware
Changing to Synchronous Resets
All new code should use synchronous resets when a reset is necessary
For existing code, you have three choices
Leave alone
• Acknowledge the possible drawbacks of asynchronous resets
Use synthesis switch
Synplify: syn_clean_reset
XST: -async_to_sync YES
• Not the same as changing to synchronous reset but can help
Change the asynchronous reset to synchronous manually (or with a script)
Removing the top-level reset port does not get the same result
• Remove the reset from your code
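A sketch of the manual change, using a hypothetical 8-bit counter. The only edit is removing the reset from the sensitivity list, which turns the asynchronous reset into a synchronous one. Note that the behavior differs: the synchronous version only resets on a clock edge, so the reset must be held long enough to be sampled.

```verilog
// Hypothetical example, BEFORE: asynchronous reset.
module counter_async_reset (
    input  wire       clk,
    input  wire       rst,   // asynchronous, active-high
    output reg  [7:0] count
);
    always @(posedge clk or posedge rst) begin
        if (rst)
            count <= 8'h00;
        else
            count <= count + 8'h01;
    end
endmodule

// Hypothetical example, AFTER: synchronous reset.
module counter_sync_reset (
    input  wire       clk,
    input  wire       rst,   // synchronous, active-high
    output reg  [7:0] count = 8'h00
);
    always @(posedge clk) begin
        if (rst)
            count <= 8'h00;
        else
            count <= count + 8'h01;
    end
endmodule
```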
No Resets is Best
Why No Resets at All?
Using synchronous logic frees up additional logic
Designs in which the resets were removed resulted in an average of 3.5
percent fewer registers
Synthesis can realize this additional logic automatically
Tip: This makes it easier for the mapper to group this register with
registers of a different control set
Why No Resets at All?
Synthesis can infer SRL-based shift registers
But only if no resets are used (otherwise flip-flops are wasted)
Or, the synthesis tool can emulate the reset (not what you want)
The SRL is also useful for synchronous FIFOs, non-binary counters,
terminal count logic, pattern generators, and reconfigurable LUTs
Tip: NO reset saves a lot of flip-flops
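A minimal sketch of a shift register written without any reset, which is the form the slide says synthesis can map onto an SRL. The module name, parameter, and ports are hypothetical, DEPTH is assumed to be at least 2, and actual SRL inference depends on the synthesis tool and its settings.

```verilog
// Hypothetical example: shift register with clock enable and NO reset.
module srl_delay #(
    parameter DEPTH = 16          // number of stages, assumed >= 2
) (
    input  wire clk,
    input  wire ce,
    input  wire din,
    output wire dout
);
    // Initial value instead of a reset
    reg [DEPTH-1:0] shift = {DEPTH{1'b0}};

    always @(posedge clk) begin
        if (ce)
            shift <= {shift[DEPTH-2:0], din};
    end

    assign dout = shift[DEPTH-1];  // data delayed by DEPTH enabled clocks
endmodule
```

Adding a reset branch to the always block would mean either building the stages from flip-flops or having the tool emulate the reset, as noted above.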
Why No Resets at All?
Routing can be considered one of the most valuable resources
Resets compete for the same resources as the rest of the active signals
of the design
Including timing-critical paths
More available routing gives the tools a better chance to meet your timing
objectives
Tip: NO reset saves routing and improves design speed
Why No Resets at All?
Even more block RAM inference
Why?
Virtex-5 FPGA RAMs
• RAM enable has precedence over reset
Virtex-5 FPGA registers
• Reset has priority over the clock enable
Coding for this functionality makes no sense
With no reset, the enable precedence has no consequence
Tip: NO reset gets more block RAMs
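The precedence difference can be sketched as two codings of the same output register. Names are hypothetical. The first matches the register behavior described above (reset has priority over enable); the second matches the block RAM behavior (enable has precedence over reset). If the reset is removed, both collapse to the same code and the precedence question disappears.

```verilog
// Hypothetical example: flip-flop style, reset wins over enable.
module ff_reset_over_enable (
    input  wire       clk,
    input  wire       rst,
    input  wire       ce,
    input  wire [7:0] d,
    output reg  [7:0] q = 8'h00
);
    always @(posedge clk) begin
        if (rst)
            q <= 8'h00;      // reset acts even when ce is low
        else if (ce)
            q <= d;
    end
endmodule

// Hypothetical example: block RAM style, enable wins over reset.
module ram_enable_over_reset (
    input  wire       clk,
    input  wire       rst,
    input  wire       en,
    input  wire [7:0] d,
    output reg  [7:0] q = 8'h00
);
    always @(posedge clk) begin
        if (en) begin
            if (rst)
                q <= 8'h00;  // reset only acts while en is high
            else
                q <= d;
        end
    end
endmodule
```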
Why No Resets at All?
Designs without resets have fewer timing paths
18 percent fewer timing paths on average
Results in less run time
Improved performance
Less memory necessary during PAR
Tip: NO reset builds a faster design and saves run time
How Do I Get By?
Some designs can get away without any resets but many designs need
some resets
Very few designs require resets on all registers
• Most ASICs require a described reset on every register.
• Implement this with the built-in Global Set/Reset (GSR)
Suggestion
Be selective when you code resets (FSMs, I/O, and flushing data)
• Only place resets that have impact on functionality
How Do I Get By?
Initialize all registers in VHDL / Verilog code
This should be done whether using a reset or not
VHDL:
signal my_register : std_logic_vector (7 downto 0) := (others => '0');
Verilog:
reg [7:0] my_register = 8'h00;
Perform RTL simulation of the design
If it functions during simulation, it should function on the FPGA
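A minimal sketch of an initialized register with no reset, together with a small RTL testbench that checks the power-up value and a few counts. All module names, the counter behavior, and the testbench timing are assumptions made for this example.

```verilog
// Hypothetical example: register initialized in the declaration, no reset.
module init_no_reset (
    input  wire       clk,
    input  wire       ce,
    output reg  [7:0] my_register = 8'h00
);
    always @(posedge clk) begin
        if (ce)
            my_register <= my_register + 8'h01;
    end
endmodule

// Hypothetical testbench: checks the initial value, then three counts.
module init_no_reset_tb;
    reg        clk = 1'b0;
    reg        ce  = 1'b0;
    wire [7:0] my_register;

    init_no_reset dut (.clk(clk), .ce(ce), .my_register(my_register));

    always #5 clk = ~clk;   // rising edges at t = 5, 15, 25, ...

    initial begin
        #1;
        if (my_register !== 8'h00)
            $display("FAIL: register did not start at its initial value");
        ce = 1'b1;
        repeat (3) @(posedge clk);
        #1;
        if (my_register !== 8'h03)
            $display("FAIL: expected 3 after three enabled clocks");
        else
            $display("PASS");
        $finish;
    end
endmodule
```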
Summary
If your design barely fits, Xilinx recommends reducing the size of your
design before trying to gain timing closure
Most of these tips reduce design size
Try to minimize the number of control sets your design uses
Asynchronous resets can inhibit optimization of general logic (can force
additional LUT inputs to be used)
Synchronous resets allow synthesis tools to convert a control signal reset
to the datapath
Avoid the use of global resets
Initialize all registers from your HDL
If you have to, use the STARTUP_VIRTEX5 primitive to access the GSR net
Summary
Xilinx recommends NOT using the synthesis option to convert
asynchronous resets to synchronous
Avoid resets on SRLs (no reset functionality)
Avoid asynchronous resets on block RAMs (the block RAM’s output
register only supports a synchronous reset)
Avoid asynchronous resets on DSP slice resources (their flip-flops only
support a synchronous reset)
Be aware of the difference between coding for a block RAM’s control signal
precedence and a flip-flop’s precedence
Use active-high control signals
If you can design out your global reset, you will save a lot of routing and
build a faster design
Where Can I Learn More?
Xilinx online documents
www.support.xilinx.com
• To search for an Application Note or White Paper, click the Documentation tab
and enter the document number (WP231 or XAPP215) in the search window
• White papers for reference
– WP231 – HDL Coding Practices to Accelerate Design Performance
– WP248 – Retargeting Guidelines for Virtex-5 FPGAs
– WP275 – Get your Priorities Right – Make your Design Up to 50% Smaller
• User guides for reference
– UG193 - Virtex-5 FPGA XtremeDSP Design Considerations
Trademark Information
Xilinx is disclosing this Document and Intellectual Property (hereinafter “the Design”) to you for use in the development of designs to operate on, or interface
with Xilinx FPGAs. Except as stated herein, none of the Design may be copied, reproduced, distributed, republished, downloaded, displayed, posted, or
transmitted in any form or by any means including, but not limited to, electronic, mechanical, photocopying, recording, or otherwise, without the prior written
consent of Xilinx. Any unauthorized use of the Design may violate copyright laws, trademark laws, the laws of privacy and publicity, and communications
regulations and statutes.
Xilinx does not assume any liability arising out of the application or use of the Design; nor does Xilinx convey any license under its patents, copyrights, or any
rights of others. You are responsible for obtaining any rights you may require for your use or implementation of the Design. Xilinx reserves the right to make
changes, at any time, to the Design as deemed desirable in the sole discretion of Xilinx. Xilinx assumes no obligation to correct any errors contained herein or
to advise you of any correction if such be made. Xilinx will not assume any liability for the accuracy or correctness of any engineering or technical support or
assistance provided to you in connection with the Design.
THE DESIGN IS PROVIDED “AS IS” WITH ALL FAULTS, AND THE ENTIRE RISK AS TO ITS FUNCTION AND IMPLEMENTATION IS WITH
YOU. YOU ACKNOWLEDGE AND AGREE THAT YOU HAVE NOT RELIED ON ANY ORAL OR WRITTEN INFORMATION OR ADVICE,
WHETHER GIVEN BY XILINX, OR ITS AGENTS OR EMPLOYEES. XILINX MAKES NO OTHER WARRANTIES, WHETHER EXPRESS, IMPLIED,
OR STATUTORY, REGARDING THE DESIGN, INCLUDING ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE, TITLE, AND NONINFRINGEMENT OF THIRD-PARTY RIGHTS.
IN NO EVENT WILL XILINX BE LIABLE FOR ANY CONSEQUENTIAL, INDIRECT, EXEMPLARY, SPECIAL, OR INCIDENTAL DAMAGES,
INCLUDING ANY LOST DATA AND LOST PROFITS, ARISING FROM OR RELATING TO YOUR USE OF THE DESIGN, EVEN IF YOU HAVE
BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. THE TOTAL CUMULATIVE LIABILITY OF XILINX IN CONNECTION WITH
YOUR USE OF THE DESIGN, WHETHER IN CONTRACT OR TORT OR OTHERWISE, WILL IN NO EVENT EXCEED THE AMOUNT OF FEES
PAID BY YOU TO XILINX HEREUNDER FOR USE OF THE DESIGN. YOU ACKNOWLEDGE THAT THE FEES, IF ANY, REFLECT THE
ALLOCATION OF RISK SET FORTH IN THIS AGREEMENT AND THAT XILINX WOULD NOT MAKE AVAILABLE THE DESIGN TO YOU
WITHOUT THESE LIMITATIONS OF LIABILITY.
The Design is not designed or intended for use in the development of on-line control equipment in hazardous environments requiring fail-safe controls, such as
in the operation of nuclear facilities, aircraft navigation or communications systems, air traffic control, life support, or weapons systems (“High-Risk
Applications”). Xilinx specifically disclaims any express or implied warranties of fitness for such High-Risk Applications. You represent that use of the
Design in such High-Risk Applications is fully at your risk.
© 2009 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc.
All other trademarks are the property of their respective owners.