UltraFast™ Design Methodology
Vivado Design Suite
Guidelines For Predictable Success
© Copyright 2013 Xilinx.
Xilinx Delivers an ASIC-Class Advantage
Through Silicon, Tools, and Methodology
Agenda
UltraFast Methodology Introduction
Write HDL code that best fits the hardware
Timing constraints creation and validation
Clock planning, Pin planning, Floorplanning
UltraFast™ Methodology Benefits
Fast Compile Times and Predictable Results
– Require a good methodology
Project Schedules Drive Time To Market
– Manage risk effectively
– Minimize iterations, especially late-stage changes
– Explore options early with estimation and progressive analysis
Proven Recommendations from Successful Customers
– Best Practices with Checklists and Links to Documentation
– Verification Tools and Reports
– Linting and DRC
UltraFast User Guide: UG949
PCB planning: Avoid board re-spins
– Use XPE to validate power against budget
– Use Vivado I/O planning & DRC on a top-level design that includes all interfaces
Design Creation: Coding style for best QoR
– Use HDL language templates in Vivado
– New Linting capability: Methodology DRC ruledeck
Implementation: Rapid convergence & signoff timing
– Rapid convergence technique: Closure with the simplest constraints
– Signoff convergence: Closure with pristine constraints
– Use XDC language templates & Timing DRC ruledeck
Overall Strategy for Accelerated Design Cycle
Start closure at the front-end of the design flow
– Engage UltraFast early
– Faster iterations than in the back-end
– Greater impact on Quality of Results (QoR)
[Figure: earlier iterations have more impact on QoR (100x, 10x, 1.2x, 1.1x) across PCB/Planning, Device/IP selection, IP Integration/RTL Design/Verification, Implementation, Closure, and Config./Bring-up/Debug; goal: reduce design cycle time & cost]
UltraFast Design Methodology Guide – UG949
Project Planning & Kickoff
Board Planning & Schematic Creation
Design Creation & IP Integration
Implementation & Design Closure
Configuration Programming & Hardware Debug
Design Methodology Checklist in DocNav
Sample section
Checklist
Spreadsheet-based checklist for designers and FAEs to review key portions of the board schematic for an FPGA/SoC
– Power Distribution System, Configuration, Transceivers, XADC, I/O Interfaces
Vivado Enables Design Methodology
Key Technology: Shared, Scalable Data Model
Progressive estimation accuracy across the entire flow
– Reduce iterations late in the cycle
[Figure: Estimation → IP Integration → RTL Design → Synthesis → Place & Route]
Shared, Scalable Data Model
Shares design information between implementation steps
– Ensures fast convergence and timing closure
Enables use of the same commands & reports to analyze the design at every step
[Figure: schematics, code changes, tool settings, placement edits and timing paths #1 to #3 sharing one data model]
Highly efficient memory utilization
– Scalable for the next decade of designs
Enables cross-probing between RTL, placement, the timing report and other reports
[Figure: cross-probing between a VHDL entity (FIR), its placement and its timing report]
Technique for Rapid Timing Closure
Baselining
Prioritize and close one step at a time
– Converge first at Synthesis (faster, higher impact), then in the back-end
– Start with the simplest (baseline) constraints:
• Internal Fmax (flop-to-flop constraints), which is the problem 9 times out of 10
• Define proper clock dependencies
– Make sure the design & constraints are reasonable
– Analyze, get to the root cause, then decide how to fix it
• Clock path vs. data path vs. interconnect delay vs. logic delay…
– Add I/O constraints (with Vivado XDC templates) and redo…
Do not confuse baseline constraints with “signoff” constraints
– You still want complete constraints for signoff
View the QuickTime video “UltraFast Design Methodology for Timing Closure”
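A baseline XDC along these lines might contain nothing but primary clocks and their relationships. This is a minimal sketch, not a reference constraint set: the port and clock names sys_clk and eth_clk, the periods, and the assumption that the two domains are truly asynchronous are all hypothetical.

```tcl
# Baseline XDC sketch: primary clocks and clock relationships only.
# Port names, clock names and periods are hypothetical placeholders.
create_clock -name sys_clk -period 5.000 [get_ports sys_clk]
create_clock -name eth_clk -period 8.000 [get_ports eth_clk]

# MMCM/PLL output clocks are derived automatically from these primaries.
# Declare the domains asynchronous only if every crossing between them
# goes through a proper CDC synchronizer (assumption for this sketch).
set_clock_groups -asynchronous \
    -group [get_clocks -include_generated_clocks sys_clk] \
    -group [get_clocks -include_generated_clocks eth_clk]

# No set_input_delay / set_output_delay yet: without I/O constraints,
# Vivado does not time the I/O paths, so analysis focuses on internal Fmax.
```

Once internal timing is reasonable, the I/O delays are added to turn this baseline into the complete XDC.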
Progressive Approach to Design Closure
[Figure: three successive passes of Synthesis, Place and Route, with analysis and an Fmax check after each step]
– Pass 1, Baseline XDC: baseline constraints; optimize internal paths
– Pass 2, Complete XDC: add I/O constraints; optimize the entire chip
– Pass 3, Final XDC: if needed, add timing exceptions and/or floorplan; fine-tune
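The progressive approach maps naturally onto a Vivado non-project Tcl flow that writes a checkpoint and a timing summary after each step. A sketch, assuming hypothetical source and constraint paths, a top-level module named top, and an example Kintex-7 part number:

```tcl
# Non-project flow sketch: analyze timing after every implementation step.
# Source/XDC paths, the top-level name and the part are hypothetical.
read_verilog [glob ./src/*.v]
read_xdc ./constraints/baseline.xdc

synth_design -top top -part xc7k325tffg900-2
write_checkpoint -force post_synth.dcp
report_timing_summary -file post_synth_timing.rpt

opt_design
place_design
write_checkpoint -force post_place.dcp
report_timing_summary -file post_place_timing.rpt

# Optional timing-driven physical optimization (e.g. replication)
phys_opt_design

route_design
write_checkpoint -force post_route.dcp
report_timing_summary -file post_route_timing.rpt
```

Each pass would swap in a more complete XDC (baseline, complete, final) and compare the three sets of timing reports.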
Critical Path Can Be a Moving Target
Example from a Real Design
Post-synthesis estimates (the real problem)
– Worst path: 13 levels of logic, 4.3 ns
Post-place
– Worst path: 7 levels of logic, 4.2 ns
– Paths with 7-13 levels got placed locally
Post-route (the side-effect of the real problem)
– Worst path: 4 levels of logic, 4.1 ns
– Paths with 5-13 levels got preferred routing
Analyze & fix timing issues at early stages for faster timing convergence
Writing HDL Code that Best Fits the Hardware
Vivado Design Suite
Impact of HDL Coding Style
Block inference
– Follow recommended templates for RAM, DSP, LUTRAM, SRL inference
Pipeline your design to reduce levels of logic
Think about Reset
– Resets tax routing resources and are not always needed: Xilinx devices boot in a known state
– Dedicated shifters (SRLs) and RAM memory arrays don’t use resets
Synchronous resets are preferred
– Allows packing of registers into dedicated RAM and DSP blocks
– Tools have the option to implement reset in datapath (LUT)
Give more freedom to Synthesis
– Revisit attributes needed by other synthesis engines or older releases
– Avoid KEEP, dont_touch, syn_preserve, max_fanout attributes…
Review Design Creation Chapter in UG949
Review Design Creation tab in the Design Methodology Checklist
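Before removing legacy attributes, it can help to see where they ended up in the netlist. A sketch for an opened synthesized design that lists cells and nets carrying the DONT_TOUCH property; the other attributes named above (KEEP, syn_preserve, max_fanout) would need their own checks.

```tcl
# Sketch: list netlist objects that carry DONT_TOUCH in an opened
# synthesized design, so legacy attributes can be reviewed.
set dt_cells [get_cells -hierarchical -quiet -filter {DONT_TOUCH == TRUE}]
set dt_nets  [get_nets -hierarchical -quiet -filter {DONT_TOUCH == TRUE}]
puts "Cells with DONT_TOUCH: [llength $dt_cells]"
puts "Nets with DONT_TOUCH: [llength $dt_nets]"
foreach obj [concat $dt_cells $dt_nets] {
    puts "  $obj"
}
```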
Using HDL Language Templates
Accessing templates in IDE
– Windows → Language Templates
Synthesis Templates
– BRAM, LUTRAM, ROM, SRL
– Counter, MULT
– FSM, Decoder, Encoder
– …
Coding to Match the Hardware
DSP48 Blocks and BRAM Blocks
Leverage DSP block cascading capabilities
– A pipelined adder chain of cascaded DSP48 blocks delivers optimal performance
– An adder tree becomes a performance bottleneck
[Figure: DSP48 adder tree vs. pipelined DSP48 adder chain]
Avoid Block RAM collision avoidance logic(*)
– When synthesis assumes a read/write collision, it adds logic that compares rdaddr and wraddr around the RAMB
– With the collision check disabled, the RAMB is inferred without that logic
[Figure: RAMB with rdaddr, wraddr, din and dout, with and without collision-check logic]
(*): logic added by default by Synplify (attribute syn_no_rw_check removes the logic)
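Whether the coding style produced the intended hardware can be checked by counting dedicated primitives after synthesis. A sketch, assuming a 7-series netlist where DSP and block RAM cells have REF_NAME values starting with DSP48 and RAMB; the report file name is arbitrary.

```tcl
# Sketch: count dedicated DSP and block RAM primitives after synthesis
# (7-series naming assumed: DSP48E1, RAMB36E1, RAMB18E1).
set dsps  [get_cells -hierarchical -quiet -filter {REF_NAME =~ DSP48*}]
set brams [get_cells -hierarchical -quiet -filter {REF_NAME =~ RAMB*}]
puts "DSP48 cells: [llength $dsps]"
puts "Block RAM cells: [llength $brams]"

# The utilization report shows the same resources with device totals.
report_utilization -file post_synth_util.rpt
```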
The Impact of Resets
Increase performance with the right reset choice
– Think Local, not Global with resets
– No reset at all (if possible) is best
– Synchronous rather than asynchronous reset
– Active HIGH rather than active low reset
– Default register value can be controlled via the
INIT property or at signal declaration in RTL
From: UG949 Chapter 4 Design Creation – Control Signals and Control Sets
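A quick way to see how many registers still use asynchronous resets is to count register primitives by type. This sketch assumes a 7-series synthesized netlist, where FDCE and FDPE have an asynchronous clear/preset and FDRE and FDSE a synchronous reset/set.

```tcl
# Sketch: count 7-series register primitives by reset style.
# FDCE/FDPE: asynchronous clear/preset. FDRE/FDSE: synchronous reset/set.
foreach prim {FDCE FDPE FDRE FDSE} {
    set regs [get_cells -hierarchical -quiet -filter "REF_NAME == $prim"]
    puts [format "%-5s %d" $prim [llength $regs]]
}
```

FDRE also counts registers without any reset (their R pin is simply tied low), so the FDCE/FDPE totals are the more telling numbers.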
Reset Routing
Resets compete for the same resources as the rest of the active signals of the design
– Including the critical datapaths
Designs that minimize or eliminate resets have
– About 18% fewer timing paths on average
– About 15% less runtime on average
– 10% fewer registers and 7% fewer LUTs
– 20% lower timing scores
– Lower memory usage
Be selective with where you code resets
Initialize all registers in the VHDL / Verilog code
More on Resets
Many designs need some resets
– Very few designs require resets on all registers
• Most ASICs require an explicit reset on every register for testability
• But the FPGA has a built-in Global Set/Reset (GSR)
Guideline: Be selective with where you code resets
– Only place resets that affect functionality
• I/O, state machines, critical control logic, etc.
– Omit resets that do not
Initialize all registers in the VHDL / Verilog code
– This should be done whether using a reset or not
VHDL:
signal my_register : std_logic_vector(7 downto 0) := "01010101";
Verilog:
reg [7:0] my_register = 8'h55;
Gauging Other Design Metrics
report_high_fanout_nets
– To reduce fanout on a net use…
• max_fanout (Vivado synthesis and XST)
• syn_maxfan (Synplify)
– Use phys_opt_design for timing driven replication
From: Design Methodology Checklist – Design Creation tab
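The fanout checks and timing-driven replication above fit into a short command sequence. A sketch on an opened synthesized design; the number of nets listed and the file names are arbitrary choices.

```tcl
# Sketch: list the highest-fanout nets, then place and let physical
# optimization replicate drivers of timing-critical high-fanout nets.
report_high_fanout_nets -max_nets 20 -file high_fanout.rpt
opt_design
place_design
phys_opt_design
report_timing_summary -file post_phys_opt_timing.rpt
```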
Gauging Other Design Metrics
report_control_sets
– Indicator of possible packing fragmentation and fitting issues
– Use the -verbose option to generate a full list
– Use Synplify’s syn_reduce_controlset_size attribute to control the number of control sets
• The default is 2; setting it to 8 eliminates most of the lowest-fanout control sets
From: Design Methodology Checklist – Design Creation tab
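Vivado synthesis has a comparable knob, the -control_set_opt_threshold option of synth_design. Treating it as an equivalent of the Synplify attribute is an assumption; the top-level name, part number and the value 8 below are illustrative only.

```tcl
# Sketch: synthesize with a control-set optimization threshold, then list
# every control set (clock, enable, set/reset) with its fanout.
# Top-level name, part and threshold value are hypothetical.
synth_design -top top -part xc7k325tffg900-2 -control_set_opt_threshold 8
report_control_sets -verbose -file control_sets.rpt
```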
Methodology DRCs
Two new rule decks in 2013.3
– methodology_checks
– timing_checks
Usage:
– report_drc -ruledeck methodology_checks
– report_drc -ruledeck timing_checks
– Some “methodology_checks” rules are available only for the elaborated design
Tools → Report → Report DRCs
Review and Resolve Critical Warnings
Vivado does not stop for Critical Warnings
– Enables fixing many issues at once
– Bitstream generation will error with unresolved critical warnings
From: UG949 Chapter 5 Implementation – Moving past Synthesis
Review and Resolve Critical Warnings
Critical warnings are serious design issues
– Invalid constraints or XDC syntax errors
– Path segmentation
– Netlist or target objects not found or invalid
Address these warnings before moving forward
– Results of design analysis may be inaccurate
– Critical Warnings may prevent design success
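Because Vivado keeps going after critical warnings, it is easy to lose track of them. A plain-Tcl sketch that lists and counts them from a log file; the file name vivado.log is an assumption (project runs write a runme.log in each run directory).

```tcl
# Sketch: list and count critical warnings in a Vivado log file.
# The log path is a hypothetical example.
set log_file vivado.log
set fh [open $log_file r]
set count 0
while {[gets $fh line] >= 0} {
    if {[string match "CRITICAL WARNING:*" $line]} {
        incr count
        puts $line
    }
}
close $fh
puts "Critical warnings found: $count"
```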
Timing Constraints Creation and Validation
Vivado Design Suite
Timing Constraints Need to Be "Clean"
When constraints (clock, IO) are missing
– The corresponding paths are timed optimistically
– No violation is reported, but the design may not work in hardware
When paths are incorrectly constrained
– Runtime and optimization effort are spent on the wrong paths
– Reported timing violations may not result in any issues in hardware
When constraints create incorrect HOLD violations
– May result in long runtime and SETUP violations
– P&R fixes HOLD violations as #1 priority, because:
• Designs with HOLD violations won’t work in hardware
• Designs with SETUP violations work, but only at a lower clock frequency
Review the Creating Constraints section of the Design
Creation Chapter in UG949 & checklist
Include IP Constraints
Many cores have their own constraints / exceptions
– PCIe, MIG, RAM-based asynchronous FIFOs…
Non-native IP: Be careful!
– It is very easy to drop the IP constraints, especially if the IP is provided as .ngc files
Native IP: Constraints included
– Sources window in IDE: Compile Order → Constraints
– Use report_compile_order -constraints to identify constraint file sources
Method to Create Good Constraints
Create clocks and define clock interactions
– Four-step guideline
Set input and output delays
– Beware of creating incorrect HOLD violations
Set timing exceptions
– Less is more!
– Beware of creating incorrect HOLD violations
Use report commands to validate each step
Clock Ground Rules
For SDC-based timers, clocks only exist if you create them
– Use create_clock for primary clocks
Clocks propagate automatically through clocking modules
– MMCM and PLL output clocks are automatically generated
– Gigabit transceiver output clocks are not derived automatically: create them manually
[Figure: create_clock on the primary clock input port; do not use create_clock on the MMCM/PLL outputs]
Use create_generated_clock for internal clocks (if needed)
All inter-clock paths are evaluated by default
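The ground rules above translate into only a few XDC lines. A sketch in which every name is hypothetical: the port sys_clk, the transceiver hierarchy gt_wrapper_i/gt0_i/gtxe2_i, the register clk_div_reg, and the periods.

```tcl
# Primary clock on a top-level port (hypothetical name and period).
create_clock -name sys_clk -period 10.000 [get_ports sys_clk]

# No constraint for MMCM/PLL outputs: Vivado derives them automatically.

# Transceiver output clock created manually. The hierarchy and the 6.4 ns
# period (156.25 MHz) are hypothetical placeholders.
create_clock -name gt0_rxoutclk -period 6.400 \
    [get_pins gt_wrapper_i/gt0_i/gtxe2_i/RXOUTCLK]

# Internal clock from a divide-by-2 register, only if really needed.
create_generated_clock -name clk_div2 \
    -source [get_pins clk_div_reg/C] -divide_by 2 \
    [get_pins clk_div_reg/Q]
```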
Four Steps for Creating Clocks
Run report_timing_summary before starting constraint capture
– View report_clocks section to see all signals driving clock pins
Step 1
– Use create_clock for all primary clocks on top level ports
– Synthesize the design or open the synthesized netlist design
Step 2
– Run report_clocks
– Study the report to verify period, phase and propagation
– Apply corrections to your constraints (if needed)
Output of report_clocks (excerpt)
Clock            Period   Waveform        Attributes   Sources
sys_clk          10.000   {0.000 5.000}   P            {sys_clk}
pll0/clkfbout    10.000   {0.000 5.000}   P,G          {pll0/plle2_adv_inst/CLKFBOUT}
pll0/clkout0      2.500   {0.000 1.250}   P,G          {pll0/plle2_adv_inst/CLKOUT0}
pll0/clkout1     10.000   {0.000 5.000}   P,G          {pll0/plle2_adv_inst/CLKOUT1}
Attributes: P = Propagated, G = Generated
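Steps 1 and 2 can be run from the Tcl Console on an opened synthesized design. A sketch; the clock name, port and period are hypothetical.

```tcl
# Before constraining: list clock pins not reached by any clock.
check_timing -verbose -file check_timing_before.rpt

# Step 1: primary clock on a top-level port (hypothetical name/period).
create_clock -name sys_clk -period 10.000 [get_ports sys_clk]

# Step 2: verify period, waveform and propagation of every clock,
# including the generated MMCM/PLL clocks.
report_clocks -file clocks.rpt
```

Constraints typed in the Tcl Console only change the in-memory design; they still have to be saved to an XDC file.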
Four Steps for Creating Clocks (continued)
Step 3
– Evaluate the clock interaction using report_clock_interaction
BEWARE: All inter-clock paths are constrained by default!
– Mark inter-clock paths (Clock Domain Crossing) as asynchronous
• Make sure you designed proper CDC synchronizers
• Use set_clock_groups (preferred over set_false_path)
BEWARE: This overrides any set_max_delay constraints!
– Do you have unconstrained objects?
• Find out with check_timing
Step 4
– Run report_clock_networks
– The clock networks should be clean, with no logic on the clock lines
• Tip: Use the clock gating option in synthesis to remove LUTs on the clock line
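Steps 3 and 4 are mostly report commands. A sketch on an opened synthesized design with clocks already defined; the file names are arbitrary.

```tcl
# Step 3: review every clock pair with timed paths between them.
report_clock_interaction -delay_type min_max -file clock_interaction.rpt

# Still in step 3: look for unconstrained objects.
check_timing -verbose -file check_timing.rpt

# Step 4: inspect clock roots and any logic on the clock networks.
report_clock_networks -file clock_networks.rpt
```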
Defining & Validating Clock Interactions
Constraining Cross Clock Domains
Use appropriate synchronizing techniques
– 2 or more register stages for single-bit signals
– FIFOs for buses
Maximize MTBF
– ASYNC_REG to place synchronizing flops in
the same slice for best Mean Time Between
Failures (MTBF)
set_property ASYNC_REG TRUE \
[get_cells [list sync0_reg sync1_reg]]
Constraints for Asynchronous CDC
Ignoring timing paths between individual clocks
set_clock_groups -asynchronous -group {clk1} -group {clk2}
This is equivalent to:
set_false_path -from [get_clocks clk1] -to [get_clocks clk2]
set_false_path -from [get_clocks clk2] -to [get_clocks clk1]
BEWARE: This overrides any set_max_delay constraints!
Ignoring timing paths between groups of clocks
# SDC create_clock for the two primary clocks
create_clock -name clk_oxo -period 10 [get_ports clk_oxo]
create_clock -name clk_core -period 10 [get_ports clk_core]
# Set Asynchronous Clock Groups
set_clock_groups -asynchronous \
    -group [get_clocks -include_generated_clocks clk_oxo] \
    -group [get_clocks -include_generated_clocks clk_core]
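When a crossing still needs a bounded delay (for example gray-coded FIFO pointers), a common alternative is a datapath-only maximum delay on that bus instead of clock groups. Because set_clock_groups overrides set_max_delay, the two clocks involved must then not also be declared asynchronous. A sketch; the cell names and the 5.000 ns value are hypothetical.

```tcl
# Sketch: bound the delay of one CDC bus instead of ignoring it.
# Cell names and the 5.000 ns value are hypothetical.
# -datapath_only removes clock skew from the setup check and skips hold.
set_max_delay -datapath_only 5.000 \
    -from [get_cells {wr_ptr_gray_reg[*]}] \
    -to   [get_cells {rd_ptr_sync0_reg[*]}]
```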
Setting Input / Output Delays
Start with no IO constraints
– Focus on finding and fixing core timing issues
– Vivado does not time from IOs without IO constraints
• No need for set_false_path -from/-to ports to ignore I/O timing
Specify realistic I/O delays once core timing is reasonable
– Use set_input_delay and set_output_delay
– A wrong delay value (e.g. < 0 ns) can cause invalid analysis
The delay value specified is the external delay
– UCF, by default, specified the internal delay
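A system-synchronous interface might be constrained as follows, with -max values for setup and -min values for hold analysis. This is a sketch only: ports, clock and delay values are hypothetical and would come from the external device datasheet and board trace delays.

```tcl
# Sketch: I/O delays relative to a board clock (all values hypothetical).
create_clock -name sys_clk -period 10.000 [get_ports sys_clk]

# Input: external Tco + board delay (max for setup, min for hold).
set_input_delay -clock [get_clocks sys_clk] -max 4.000 [get_ports {din[*]}]
set_input_delay -clock [get_clocks sys_clk] -min 1.000 [get_ports {din[*]}]

# Output: external setup/hold requirements combined with board delays.
set_output_delay -clock [get_clocks sys_clk] -max 2.500 [get_ports {dout[*]}]
set_output_delay -clock [get_clocks sys_clk] -min 0.300 [get_ports {dout[*]}]

# Validate both directions with setup and hold analysis.
report_timing -from [get_ports {din[*]}] -delay_type min_max
report_timing -to [get_ports {dout[*]}] -delay_type min_max
```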
Multicycle Paths
set_multicycle_path N implies a HOLD check at N-1
– E.g., a setup multicycle path of 10 implies a HOLD requirement of 9 cycles!
Whenever the setup check is changed, the hold check is also changed
Guidelines for proper multicycle path constraints
– set_multicycle_path constraints should always come in pairs
• One for -setup and one for -hold
– Bring the HOLD requirement back to 0 (reduce by N-1) to avoid incorrect HOLD violations
[Figure: regA drives regB, both with clock enables; multicycle path = 3T. The waveform shows the SETUP check at regB/CLK edge 3 and the HOLD check moved back to edge 0]
set_multicycle_path 3 -setup -from [get_cells regA] -to [get_cells regB]
set_multicycle_path 2 -hold -from [get_cells regA] -to [get_cells regB]
Hold is checked at edge 3 - 1 - 2 = 0
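The pair can be checked with a combined setup/hold report, and the edge arithmetic itself is simple enough to write down. A sketch assuming regA and regB share one clock; the proc name mcp_hold_edge is hypothetical and only restates the rule above.

```tcl
# Check both the moved setup check and the restored hold check.
report_timing -from [get_cells regA] -to [get_cells regB] -delay_type min_max

# Edge arithmetic (same clock for regA and regB): "-setup N" moves the
# setup check to edge N; the hold check sits one cycle before the setup
# edge, minus the "-hold M" multiplier.
proc mcp_hold_edge {setup_n hold_m} {
    return [expr {$setup_n - 1 - $hold_m}]
}
puts [mcp_hold_edge 3 2]   ;# prints 0: hold back at the launch edge
puts [mcp_hold_edge 10 0]  ;# prints 9: setup-only multicycle of 10
```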
Using Vivado Language Templates
XDC Template
Accessing templates in IDE
– Windows → Language Templates
SDR & DDR Templates
– Inputs and outputs
– Source / System synchronous
– Center / Edge aligned
Reading the Reports
Reading the report_timing_summary
– Intra-clock report
– Inter-clock report
Use report_timing for interactivity and advanced options
– You would typically use it in the Tcl Console
• report_timing -through [get_nets {cpu_top/crit_net_name}]
• report_timing -setup -max_paths 10 ;# For 10 worst setup paths
• report_timing -hold -to [get_cells {top/item}] ;# Hold on “item”
– Reuse the object queries from your XDC files to check each constraint
• set_multicycle_path N -from [get_pins regA/C] -to [get_pins regB/D]
• report_timing -from [get_pins regA/C] -to [get_pins regB/D]
Timing Command Summary
Obtain full timing summary of the design
– report_timing_summary: summary subsections for all timing checks
Create and validate clocks
– check_timing: for missing clocks and IO constraints
– report_clocks: check frequency and phase
– report_clock_networks: identify clock roots
Validate clock groups
– report_clock_interaction
Validate I/O delays
– report_timing -from [input_port] -setup/-hold
– report_timing -to [output_port] -setup/-hold
Add exceptions if necessary
– Validate using report_timing
Managing Constraint Files
Using a single XDC file
– The XDC applies to both synthesis & implementation
Using multiple XDC files
– Main XDC with top-level constraints
• Primary clocks and I/O delays
• Exceptions on clocks and RTL objects
– Implementation-specific XDC
• Physical constraints
• Exceptions based on the physical netlist
[Figure: main.xdc is used for elaboration, synthesis and implementation; impl.xdc only for implementation]
The order of constraint files matters!
– To report the order of XDC files:
report_compile_order -constraints
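In project mode, keeping the implementation-specific file out of synthesis is a matter of two file properties. A sketch; the file name impl.xdc follows the figure and is otherwise hypothetical.

```tcl
# Sketch (project mode): use impl.xdc only during implementation.
set_property USED_IN_SYNTHESIS false [get_files impl.xdc]
set_property USED_IN_IMPLEMENTATION true [get_files impl.xdc]

# Confirm the resulting constraint file order.
report_compile_order -constraints
```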
Managing IP Constraint Files
Some IP cores come with their own XDC constraints
– Example: The clocking wizard
The clocking wizard XDC is read before the user XDC by default
(so user constraints can override IP-defined clocks)
The order of constraint files matters!
– To report the order of XDC files: report_compile_order -constraints
– Always verify the clocks using report_clocks (step 2 of the 4-step process)
– To change the default processing order:
set_property PROCESSING_ORDER EARLY|LATE [get_files IP_XDC_File]
– If necessary, IP XDC files can be enabled/disabled
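A sketch of inspecting and adjusting IP constraint processing. The file name clk_wiz_0.xdc is a hypothetical example; report_compile_order shows the real names in a given project.

```tcl
# Sketch: inspect and adjust IP constraint processing (file name hypothetical).
report_compile_order -constraints

# Read this IP XDC after the user XDC instead of before it.
set_property PROCESSING_ORDER LATE [get_files clk_wiz_0.xdc]

# Alternatively, disable it entirely, but only if its constraints are
# re-created in a user XDC:
# set_property IS_ENABLED false [get_files clk_wiz_0.xdc]

# Re-verify the clocks (step 2 of the 4-step process).
report_clocks
```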
Clock Planning, Pin Planning and Floorplanning
Vivado Design Suite
Clock and Pin Planning
Pin and Clock Planning often happens early in the Project
– Decisions here can have far-reaching effects throughout the design
• Excessive clock skew
• Poor I/O timing
• Timing-hazardous clock domain crossings
• Less flexible logic placement
• Fewer clocking resource choices
• Excessive routing delays
• Reduced device utilization
Pin and Clock Planning should be considered together
– Choices made for clock pins affect clock timing and clocking resource choices
– Choices made for data pins affect clock pin placement decisions
Review the Board & Device Planning Chapter in UG949
Review the Board and FPGA Planning tab in the Design
Methodology Checklist
Clock and Pin Planning
Considerations for clock pin planning
– Generate all I/O interface and clocking IP prior to pin assignment
– Consolidate clocking where possible and consolidate MMCMs
• Fewer clocks and MMCMs mean fewer clocking resources and fewer crossings
– Consider all CDCs when assigning clocking resources and pins
Considerations for data pin planning
– Group related data pins in the same bank, or in adjacent banks if a single bank is not possible
• Place the associated I/O clock in the same bank when possible
– Consider associated control signal placement along with data paths
– Consider data flow when planning the pinout
• Choose a pinout that has clean passage through the device
– Place high fanout signals towards the middle of the chip
• Consider CCIO pins with BUFG resources for very high fanout signals
– Evaluate all pin attributes (I/O Standard, Slew, etc.) during placement
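Pin locations and attributes end up as XDC properties on ports. A sketch in which pin locations, port names and I/O standards are all hypothetical; real values come from the board schematic and the device package files.

```tcl
# Clock input, assumed to be on a clock-capable (CCIO) pin
set_property PACKAGE_PIN AD12 [get_ports sys_clk]
set_property IOSTANDARD LVCMOS25 [get_ports sys_clk]

# Slow, low-drive output
set_property PACKAGE_PIN AB8 [get_ports {led[0]}]
set_property IOSTANDARD LVCMOS15 [get_ports {led[0]}]
set_property SLEW SLOW [get_ports {led[0]}]
set_property DRIVE 8 [get_ports {led[0]}]
```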
Clock and Pin Planning
Use Vivado Pin Planning capabilities
– Import pin & clocking assignments from generated IP
– Visualization of I/O resource placement on package and in device
– DRC, SSN and other checks available to validate choices
– Configuration pin assignments & possible device migration considerations
Re-evaluate in Vivado any
subsequent pin changes
– Understand how PCB pin swaps
affect timing & resources
Vivado I/O & Clock Planning
Tutorial UG935
– Available in DocNav and Vivado
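The same checks are available as Tcl reports, which makes it easy to re-run them after every PCB pin swap. A sketch for an opened I/O planning or synthesized design; the file names are arbitrary.

```tcl
# I/O placement and attributes per port
report_io -file io.rpt
# Design rule checks, including I/O and clocking rules
report_drc -file drc.rpt
# Simultaneous switching noise analysis of the pinout
report_ssn -file ssn.rpt
# Clocking resource usage
report_clock_utilization -file clock_util.rpt
```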
Additional Considerations for SSI Devices
Clocking
– High fanout clocks should be placed in center SLRs
– Place regional clocks in the center clock region within an SLR
– Place clock pin / MMCMs in same SLR as timing critical I/O interfaces
(avoid driving timing critical I/O interfaces from a different SLR)
– Clock pin choices should be balanced across upper & lower SLR:
• 2 upper SLR clock domains have 8 BUFG x 2
• 4 lower SLR clock domains have 4 BUFG x 4
Pinout
– High fanout signals feeding all SLRs placed in center SLRs
– I/O interfaces should not span across SLRs
– Pay attention to data flow across SLRs
• Avoid the need for multiple SLR crossings due to pinout decisions
For more details, consult UG872: Large FPGA Methodology Guide
Improving Placement Through Floorplanning
First improve HDL, synthesis & constraints
– Not floorplanning, when it can be avoided, is easier and more repeatable
Start design without any floorplanning
– See what P&R algorithms can do without restrictions
Using Vivado IDE
– Highlight placement per module as a guideline
– Visualize placement of critical timing paths
• Understand data flow in & out of Pblocks
• Understand effects of Pblocks inside & out
• Resources around placement can affect data flow
– Create Pblocks minding resource utilization
Be careful not to over-floorplan: less is best
– Only floorplan the critical areas of the design
– Do not create Pblocks with very high utilization
• Can create routing congestion or new timing problems
– Avoid overlapping Pblocks
• Creates more complex placement and clock scenarios
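Floorplanning one critical module takes only a few commands. A sketch in which the module instance u_core and the clock region range are hypothetical; a clock region range is used so the Pblock covers all resource types (slices, block RAM, DSP) in that area.

```tcl
# Sketch: constrain one critical module, then check Pblock utilization.
create_pblock pblock_core
add_cells_to_pblock [get_pblocks pblock_core] [get_cells u_core]
resize_pblock [get_pblocks pblock_core] -add {CLOCKREGION_X0Y0:CLOCKREGION_X0Y1}
report_utilization -pblocks [get_pblocks pblock_core] -file pblock_util.rpt
```

A very high utilization in this report is the warning sign mentioned above: the Pblock may create congestion or new timing problems.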
[Figure: baseline run with highlighted regions]
Summary
Vivado Design Suite
UltraFast™ Methodology Review
For optimal results, adapt your HDL style to the FPGA
– Be mindful of BRAM, LUTRAM, DSP, SRL inference needs
– Avoid asynchronous resets, and routed (wired) resets in general
– Minimize control signals
– For large FPGAs, design with the dataflow and floorplanning in mind
Baseline your constraints to converge rapidly
Provide clean timing constraints
– Bad constraints result in long runtimes, poor performance and HW failures
– Learn the essentials of timing constraint creation & validation methods
Follow pin/clock planning guidelines
– The pinout must follow the dataflow
– Place high fanout clocks and pins in the center of SSIT devices
Follow Xilinx
facebook.com/XilinxInc
twitter.com/XilinxInc
youtube.com/XilinxInc
Thank You