UltraFast™ Design Methodology
Vivado Design Suite Guidelines for Predictable Success
© Copyright 2013 Xilinx.

Xilinx Delivers an ASIC-Class Advantage Through Silicon, Tools, and Methodology

Agenda
UltraFast Methodology Introduction
Write HDL code that best fits the hardware
Timing constraints creation and validation
Clock planning, Pin planning, Floorplanning

UltraFast™ Methodology Benefits
Fast Compile Times and Predictable Results
  – Require good methodology
Project Schedules Drive Time To Market
  – Manage risk effectively
  – Minimize iterations, especially late-stage changes
  – Explore options early with estimation and progressive analysis
Proven Recommendations from Successful Customers
  – Best Practices with Checklists and Links to Documentation
  – Verification Tools and Reports
  – Linting and DRC

UltraFast User Guide: UG949
PCB planning: Avoid board re-spins
  – Use XPE to validate power against budget
  – Use Vivado I/O planning & DRC on a top level including all I/F
Design Creation: Coding style for best QoR
  – Use HDL language templates in Vivado
  – New Linting capability: Methodology DRC ruledeck
Implementation: Rapid convergence & signoff timing
  – Rapid convergence technique: Closure with the simplest constraints
  – Signoff convergence: Closure with pristine constraints
  – Use XDC language templates & Timing DRC ruledeck

Overall Strategy for Accelerated Design Cycle
Earlier Iterations
Start closure at the front-end of the design flow
  – Engage UltraFast early
  – Faster iterations than in the back-end
  – Greater impact on Quality of Results (QoR)
[Figure: Impact on QoR by design stage, in flow order: PCB / Planning, Device/IP selection; IP Integration, RTL Design, Verification; Implementation Closure; Config., Bring-up, Debug. Impact labels: 100x, 10x, 1.2x, 1.1x. Caption: Reduce Design Cycle Time & Cost]

UltraFast Design Methodology Guide – UG949
Project Planning & Kickoff
Board Planning & Schematic Creation
Design Creation & IP Integration
Implementation & Design Closure
Configuration Programming & Hardware Debug

Design Methodology Checklist in DocNav
Sample section

Checklist
Spreadsheet-based checklist to be used by designer and FAE to review key portions of board schematic for FPGA/SoC
  – Power Distribution System, Configuration, Transceivers, XADC, I/O Interfaces

UltraFast Design Methodology
Vivado™ Design Suite Guidelines for Predictable Success

Vivado Enables Design Methodology
Key Technology: Shared, Scalable Data Model
Progressive estimation accuracy across the entire flow
Reduce iterations late in the cycle
[Figure: flow stages Estimation, IP Integration, RTL Design, Synthesis, Place & Route]

Shared, Scalable Data Model
Shares design information between implementation steps
  – Ensures fast convergence and timing closure
Enables use of the same commands & reports to analyze design at every step
Highly efficient memory utilization
  – Scalable for next decade of designs
Enables cross-probing
[Figure: cross-probing between schematics, RTL (entity FIR), placement, timing report and timing paths #1 to #3; other labels: Code Changes, Tool Settings, Placement Edits, Reports]

Technique for Rapid Timing Closure
Baselining
Prioritize and close 1 step at a time
  – Converge first at Synthesis (faster, higher impact), then in back-end
  – Start with the simplest (baseline) constraint:
    • Internal Fmax (flop-to-flop constraints), which is the problem 9 times out of 10
    • Define proper clock dependencies
  – Make sure the design & constraints are reasonable
  – Analyze, get to root cause, then decide how to fix it
    • Clock path vs. data path vs. interconnect delay vs. logic delay…
  – Add I/O constraints (with Vivado XDC templates) and redo…
Do not confuse with “Signoff” Constraints
  – You still want complete constraints
View QuickTime Video for UltraFast Design Methodology for Timing Closure

Progressive Approach to Design Closure
[Figure: three successive passes, each running Synthesis, Place and Route with analysis after every step.
  Constraints per pass: Baseline Constraints; Add I/O Constraints; Add Timing Exceptions and/or Floorplan (an "If needed" label also appears in this row)
  Goal per pass: Optimize Internal Paths; Optimize Entire Chip; Fine-tune (each labeled Fmax)
  Resulting XDC: Baseline XDC; Complete XDC; Final XDC]

Critical Path could be a Moving Target
Example from a Real Design
Post-synthesis estimates (the real problem)
  – Worst path: 13 levels of logic (worst path: 4.3ns)
Post-place
  – Worst path: 7 levels (worst path: 4.2ns)
  – Paths with 7-13 levels got placed locally
Post-route (the side-effect of the real problem)
  – Worst path: 4 levels of logic (worst path: 4.1ns)
  – Paths with 5-13 levels got preferred routing
Analyze & fix timing issues at early stages for faster timing convergence

Writing HDL Code that Best Fits the Hardware
Vivado Design Suite

Impact of HDL Coding Style
Block inference
  – Follow recommended templates for RAM, DSP, LUTRAM, SRL inference
Pipeline your design to reduce levels of logic
Think about Reset
  – Taxes routing and is not always needed: Xilinx devices boot in a known state
  – Dedicated shifters (SRLs) and RAM memory arrays don’t use resets
Synchronous resets are preferred
  – Allows packing of registers into dedicated RAM and DSP blocks
  – Tools have the option to implement reset in datapath (LUT)
Give more freedom to Synthesis
  – Revisit attributes needed by other synthesis engines or older releases
  – Avoid KEEP, dont_touch, syn_preserve, max_fanout attributes…
Review Design Creation Chapter in UG949
Review Design Creation tab in the Design Methodology Checklist

Using HDL Language Templates
Accessing templates in IDE
  – Window → Language Templates
Synthesis Templates
  – BRAM, LUTRAM, ROM, SRL
  – Counter, MULT
  – FSM, Decoder, Encoder
  – …

Coding to Match the Hardware
DSP48 Blocks and BRAM Blocks
Leverage DSP block cascading capabilities
  – Pipelined adder chain delivers optimal performance
  – Adder tree becomes a performance bottleneck
[Figure: "in" feeding a cascade of four DSP48 blocks to "out"]
Avoid Block RAM collision avoidance logic(*)
  – Synthesis assumes collision
[Figure: RAMB with rdaddr, wraddr, din and dout, shown with an address comparator ("=") and, for comparison, "Inference with collision check disabled"]
(*): logic added by default by Synplify (attribute syn_no_rw_check removes the logic)

The Impact of Resets
Increase performance with the right reset choice
  – Think Local, not Global with resets
  – No reset at all (if possible) is best
  – Synchronous rather than asynchronous reset
  – Active HIGH rather than active low reset
  – Default register value can be controlled via the INIT property or at signal declaration in RTL
From: UG949 Chapter 4 Design Creation – Control Signals and Control Sets

Reset Routing
Resets compete for the same resources as the rest of the active signals of the design
  – Including the critical datapaths
Designs that minimize or eliminate resets have
  – About 18% fewer timing paths on average
  – About 15% less runtime on average
  – 10% fewer registers and 7% fewer LUTs
  – 20% lower timing scores
  – Lower memory usage
Be selective with where you code resets
Initialize all registers in the VHDL / Verilog code

More on Resets
Many designs need some resets
  – Very few designs require resets on all registers
    • Most ASICs require a described reset on every register for testability
    • But the FPGA has a built-in Global Set/Reset (GSR)
Guideline: Be selective with where you code resets
  – Only place resets that have impact on functionality
    • I/O, State-machines, critical control logic, etc.
  – Omit resets that do not
Initialize all registers in the VHDL / Verilog code
  – This should be done whether using a reset or not
    VHDL:    signal my_register : std_logic_vector(7 downto 0) := "01010101";
    Verilog: reg [7:0] my_register = 8'h55;

Gauging Other Design Metrics
report_high_fanout_nets
  – To reduce fanout on a net use…
    • max_fanout (Vivado synthesis and XST)
    • syn_maxfan (Synplify)
  – Use phys_opt_design for timing driven replication
From: Design Methodology Checklist – Design Creation tab

Gauging Other Design Metrics (continued)
report_control_sets
  – Indicator of possible packing fragmentation and fitting issues
  – Run with the -verbose option to generate a full list
  – Use Synplify’s syn_reduce_controlset_size attribute for control
    • Default is 2; set it to 8 to eliminate most of the lowest-fanout control sets
From: Design Methodology Checklist – Design Creation tab

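The two reports above can be run back to back from the Vivado Tcl console. This is a minimal sketch, assuming a synthesized (or later) design is already open and that phys_opt_design is called only once the design has been placed. It uses no options beyond those named on the slides.

```tcl
# Assumes a synthesized (or later) design is open in Vivado.

# List the highest-fanout nets: candidates for max_fanout or replication.
report_high_fanout_nets

# Full list of control sets (unique clock / enable / set-reset combinations).
report_control_sets -verbose

# Placed designs only: timing-driven physical optimization, which includes
# replication of high-fanout drivers on critical paths.
phys_opt_design
```

Re-running both reports after phys_opt_design shows whether the replication changed the fanout picture.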
Methodology DRCs
Two new rule decks in 2013.3
  – methodology_checks
  – timing_checks
Usage:
  – report_drc -ruledeck methodology_checks
  – report_drc -ruledeck timing_checks
  – Specific “methodology_checks” available only for the elaborated design
Tools → Report → Report DRCs

Review and Resolve Critical Warnings
Vivado does not stop for Critical Warnings
  – Enables fixing many issues at once
  – Bitstream generation will error with unresolved critical warnings
From: UG949 Chapter 5 Implementation – Moving past Synthesis

Review and Resolve Critical Warnings (continued)
Critical warnings are serious design issues
  – Invalid constraints or XDC syntax errors
  – Path segmentation
  – Netlist or target objects not found or invalid
Address these warnings before moving forward
  – Results of design analysis may be inaccurate
  – Critical Warnings may prevent design success

Timing Constraints Creation and Validation
Vivado Design Suite

Timing Constraints Need to Be "Clean"
When constraints (clock, IO) are missing
  – The corresponding paths are timed optimistically
  – No violation will be reported but design may not work on HW
When paths are incorrectly constrained
  – Runtime and optimization efforts will be spent on the wrong paths
  – Reported timing violations may not result in any issues on HW
When constraints create wrong HOLD violations
  – May result in long runtime and SETUP violations
  – P&R fixes HOLD violations as #1 priority, because:
    • Designs with HOLD violations won’t work on HW
    • Designs with SETUP violations will work, but slower
Review the Creating Constraints section of the Design Creation Chapter in UG949 & checklist

Include IP Constraints
Many cores have their own constraints / exceptions
  – PCIE, MIG, RAM-based asynchronous FIFOs…
Non-native IP: Be careful!
  – Very easy to drop the IP constraints, especially if provided as .ngc files
Native IP: Constraints included
  – Sources window in IDE: Compile Order Constraints
  – Use report_compile_order -constraints to identify constraint file sources

Method to Create Good Constraints
Create clocks and define clock interactions
  – Four-step guideline
Set input and output delays
  – Beware of creating incorrect HOLD violations
Set timing exceptions
  – Less is more!
  – Beware of creating incorrect HOLD violations
Use report commands to validate each step

Clock Ground Rules
For SDC-based timers, clocks only exist if you create them
  – Use create_clock for primary clocks
Clocks propagate automatically through clocking modules
  – MMCM and PLL output clocks are automatically generated
  – Gigabit transceiver clocks are not generated automatically. Create them manually.
[Figure: clock network annotated "create_clock here" and "don’t create_clock here"]
Use create_generated_clock for internal clocks (if needed)
All inter-clock paths are evaluated by default

Four Steps for Creating Clocks
Run report_timing_summary before starting constraint capture
  – View report_clocks section to see all signals driving clock pins
Step 1
  – Use create_clock for all primary clocks on top level ports
  – Run the design (synthesis) or open netlist design
Step 2
  – Run report_clocks
  – Study the report to verify period, phase and propagation
  – Apply corrections to your constraints (if needed)

Output of report_clocks (excerpt)
Attributes  P: Propagated  G: Generated
Clock          Period  Waveform       Attributes  Sources
sys_clk        10.000  {0.000 5.000}  P           {sys_clk}
pll0/clkfbout  10.000  {0.000 5.000}  P,G         {pll0/plle2_adv_inst/CLKFBOUT}
pll0/clkout0   2.500   {0.000 1.250}  P,G         {pll0/plle2_adv_inst/CLKOUT0}
pll0/clkout1   10.000  {0.000 5.000}  P,G         {pll0/plle2_adv_inst/CLKOUT1}

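Steps 1 and 2 can be sketched in XDC/Tcl as follows. The clock name sys_clk, the 10 ns period and the {0.000 5.000} waveform are taken from the report_clocks excerpt above. That the clock enters the device on a top-level port of the same name is an assumption.

```tcl
# Step 1: define the primary clock on its top-level port (port name assumed).
create_clock -name sys_clk -period 10.000 -waveform {0.000 5.000} \
    [get_ports sys_clk]

# Step 2: after synthesis (or with the netlist design open), list all clocks.
# The PLL outputs (pll0/clkout0, ...) should show up as generated (G) clocks
# without any create_clock of their own.
report_clocks
```

If a PLL output is missing from the report or shows an unexpected period, the correction belongs in the primary clock definition or the PLL configuration, not in an extra create_clock on the PLL output.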
Four Steps for Creating Clocks (continued)
Step 3
  – Evaluate the clock interaction using report_clock_interaction
    BEWARE: All inter-clock paths are constrained by default!
  – Mark inter-clock paths (Clock Domain Crossing) as asynchronous
    • Make sure you designed proper CDC synchronizers
    • Use set_clock_groups (preferred over set_false_path)
    BEWARE: This overrides any set_max_delay constraints!
  – Do you have unconstrained objects?
    • Find out with check_timing
Step 4
  – Run report_clock_networks
  – You want the design to have clean clock lines without logic
    • Tip: Use clock gating option in synthesis to remove LUTs on the clock line

Defining & Validating Clock Interactions

Constraining Cross Clock Domains
Use appropriate synchronizing techniques
  – 2 or more register stages, for single bit
  – FIFO for buses
Maximize MTBF
  – ASYNC_REG to place synchronizing flops in the same slice for best Mean Time Between Failures (MTBF)
    set_property ASYNC_REG TRUE \
        [get_cells [list sync0_reg sync1_reg]]

Constraints for Asynchronous CDC
Ignoring timing paths between individual clocks
    set_clock_groups -asynchronous -group {clk1} -group {clk2}
This is equivalent to:
    set_false_path -from [get_clocks clk1] -to [get_clocks clk2]
    set_false_path -from [get_clocks clk2] -to [get_clocks clk1]
BEWARE: This overrides any set_max_delay constraints!
Ignoring timing paths between groups of clocks
    # SDC create_clock for the two primary clocks
    create_clock -name clk_oxo -period 10 [get_ports clk_oxo]
    create_clock -name clk_core -period 10 [get_ports clk_core]
    # Set Asynchronous Clock Groups
    set_clock_groups -asynchronous \
        -group [get_clocks -include_generated_clocks clk_oxo] \
        -group [get_clocks -include_generated_clocks clk_core]
BEWARE: This overrides any set_max_delay constraints!

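Because set_clock_groups overrides set_max_delay, a crossing that still needs a bounded delay can be constrained path by path instead of through clock groups. This is a hedged sketch, not a recommendation from the slides. The cell names src_ptr_reg* and dst_sync0_reg* and the 10 ns value are hypothetical placeholders, and the two clocks involved must not be placed in an asynchronous clock group for the constraint to take effect.

```tcl
# Hypothetical cell names: replace with the actual launch and capture registers.
# -datapath_only removes clock skew from the slack computation, disables the
# hold check on these paths, and requires -from.
set_max_delay -datapath_only \
    -from [get_cells src_ptr_reg*] \
    -to   [get_cells dst_sync0_reg*] \
    10.000

# Confirm that the constraint covers the intended paths.
report_timing -from [get_cells src_ptr_reg*] -to [get_cells dst_sync0_reg*]
```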
Setting Input / Output Delays
Start with no IO constraints
  – Focus on finding and fixing core timing issues
  – Vivado does not time from IOs without IO constraints
    • No need for set_false_path -from or -to [get_ports] to ignore IO timing
Specify realistic IO delays once core timing is reasonable
  – Use set_input_delay and set_output_delay
  – Wrong delay value (e.g. <0 ns) can cause invalid analysis
The delay value specified is the external delay
  – Default in UCF: internal delay

Multicycle Paths
set_multicycle_path N implies a HOLD check at N-1
  – E.g.: a multicycle_path of 10 implies a HOLD requirement of 9 cycles!
Whenever setup check is changed, hold check is also changed
Guidelines for proper multicycle path constraints
  – Should always be pairs of set_multicycle_path constraints
    • One for -setup and one for -hold
  – Bring the HOLD requirement back to 0 (reduce by N-1) to avoid incorrect HOLD violations
[Figure: regA (D, CE, Q) driving regB (D, CE, Q) with Multicycle Path = 3T; waveforms for CLK, regA/CLK, regB/CLK and REGB/D marking the HOLD and SETUP edges]
    set_multicycle_path -from [get_cells regA] -to [get_cells regB] 3 -setup
    set_multicycle_path -from [get_cells regA] -to [get_cells regB] 2 -hold
hold checked at edge 3-1-2 = 0

Using Vivado Language Templates
XDC Template
Accessing templates in IDE
  – Window → Language Templates
SDR & DDR Templates
  – Inputs and outputs
  – Source / System synchronous
  – Center / Edge aligned

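For the input and output delay guidance earlier in this section, here is a minimal system-synchronous sketch. The port names din and dout, the clock name sys_clk and every delay value are hypothetical placeholders. Real values come from the external device's datasheet and the board trace delays.

```tcl
# Assumes a primary clock named sys_clk was already defined with create_clock.
# All delays are external delays (outside the FPGA), relative to sys_clk.

# Input: latest (-max) and earliest (-min) data arrival after the clock edge.
set_input_delay -clock sys_clk -max 3.000 [get_ports din]
set_input_delay -clock sys_clk -min 1.000 [get_ports din]

# Output: -max covers board delay plus the external device's setup time,
# -min covers board delay minus the external device's hold time.
set_output_delay -clock sys_clk -max 2.000 [get_ports dout]
set_output_delay -clock sys_clk -min 0.500 [get_ports dout]

# Validate both checks on each interface.
report_timing -from [get_ports din] -setup
report_timing -from [get_ports din] -hold
report_timing -to [get_ports dout] -setup
report_timing -to [get_ports dout] -hold
```

Specifying both -max and -min matters because a missing -min leaves the hold side of the interface unchecked, which is one way incorrect or absent HOLD analysis creeps in.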
Reading the Reports
Reading the report_timing_summary
  – Intra-clock report
  – Inter-clock report
Use report_timing for interactivity and advanced options
  – You would typically use it in the Tcl window
    • report_timing -through [get_nets {/cpu_top/crit_net_name}]
    • report_timing -setup -max_paths 10   # For 10 worst setup paths
    • report_timing -hold -to [get_cells {/top/item}]   # Hold on “item”
  – Use filters from your XDC files to check each expression
    • set_multicycle_path -from [get_pins regA/C] -to [get_pins regB/D]
    • report_timing -from [get_pins regA/C] -to [get_pins regB/D]

Timing Command Summary
Obtain full timing summary of the design
  – report_timing_summary: summary subsections for all timing checks
Create and validate clocks
  – check_timing: for missing clocks and IO constraints
  – report_clocks: check frequency and phase
  – report_clock_networks: possible clock root
Validate clock groups
  – report_clock_interaction
Validate I/O delays
  – report_timing -from [input_port] -setup/-hold
  – report_timing -to [output_port] -setup/-hold
Add exceptions if necessary
  – Validate using report_timing

Managing Constraint Files
Using a single XDC file
  – XDC applies to both synthesis & implementation
Using multiple XDC files
  – Main XDC with top level constraints
    • Primary clocks and I/O delays
    • Exceptions on clocks and RTL objects
  – Implementation specific XDC
    • Physical constraints
    • Exceptions based on physical netlist
[Figure: main.xdc and impl.xdc mapped onto the Elaboration, Synthesis and Implementation steps]
The order of constraint files matters!
  – To report the order of XDC files: report_compile_order -constraints

Managing IP Constraint Files
Some IP come with their own XDC constraints
  – Example: The clocking wizard
The clocking wizard XDC will be read before the user XDC by default (user constraints can override IP defined clocks by default)
The order of constraint files matters!
  – To report the order of XDC files: report_compile_order -constraints
  – Always verify the clocks using report_clocks (step 2 of 4-step process)
  – To change the default processing order: set_property PROCESSING_ORDER EARLY|LATE [get_files IP_XDC_File]
  – If necessary, IP_XDC_files can be enabled/disabled

Clock Planning, Pin Planning and Floorplanning
Vivado Design Suite

Clock and Pin Planning
Pin and Clock Planning often happens early in the project
  – Decisions here can have far-reaching effects throughout the design
    • Excessive clock skew
    • Poor I/O timing
    • Timing hazardous clock domain crossing
    • Less flexible logic placement
    • Fewer clocking resource choices
    • Excessive routing delays
    • Reduced device utilization
Pin and Clock Planning should be considered together
  – Choices made for clock pins affect clocking timing and resource choices
  – Choices made for data pins affect clock pin placement decisions
Review the Board & Device Planning Chapter in UG949
Review the Board and FPGA Planning tab in the Design Methodology Checklist

Clock and Pin Planning (continued)
Considerations for clock pin planning
  – Generate all I/O interface and clocking IP prior to pin assignment
  – Consolidate clocking where possible and consolidate MMCMs
    • Fewer clocks and MMCMs mean fewer clock resources and crossings
  – Consider all CDC when assigning clocking resources and pins
Considerations for data pin planning
  – Group related data pins in the same bank, or adjacent banks if a single bank is not possible
    • Place associated I/O clock in same bank when possible
  – Consider associated control signal placement along with data paths
  – Consider data flow when planning the pinout
    • Choose a pinout that has clean passage through the device
  – Place high fanout signals towards the middle of the chip
    • Really high fanout signals considered for CCIO pins with BUFG resources
  – Evaluate all pin attributes (I/O Standard, Slew, etc.) during placement

Clock and Pin Planning (continued)
Use Vivado Pin Planning capabilities
  – Import pin & clocking assignments from generated IP
  – Visualization of I/O resource placement on package and in device
  – DRC, SSN and other checks available to validate choices
  – Configuration pin assignments & possible device migration considerations
Re-evaluate in Vivado any subsequent pin changes
  – Understand how PCB pin swaps affect timing & resources
Vivado I/O & Clock Planning Tutorial UG935
  – Available in DocNav and Vivado

Additional Considerations for SSI Devices
Clocking
  – High fanout clocks should be placed in center SLRs
  – Place regional clocks on center clock region within an SLR
  – Place clock pin / MMCMs in same SLR as timing critical I/O interfaces (avoid driving timing critical I/O interfaces from a different SLR)
  – Clock pin choices should be balanced across upper & lower SLR:
    • 2 upper SLR clock domains have 8 BUFG x 2
    • 4 lower SLR clock domains have 4 BUFG x 4
Pinout
  – High fanout signals feeding all SLRs placed in center SLRs
  – I/O interfaces should not span across SLRs
  – Pay attention to data flow across SLRs
    • Avoid the need for multiple SLR crossings due to pinout decisions
For more details, consult UG872: Large FPGA Methodology Guide

Improving Placement Through Floorplanning
First improve HDL, synthesis & constraints
  – Easier and more repeatable not to floorplan when avoidable
Start design without any floorplanning
  – See what P&R algorithms can do without restrictions
Using Vivado IDE
  – Highlight placement per module as guideline
  – Visualize placement of critical timing paths
    • Understand data flow in & out of Pblocks
    • Understand effects of Pblock inside & out
    • Resources around placement can affect data flow
  – Create Pblocks minding resource utilization
Careful not to over floorplan
  – Less is best
  – Only floorplan the critical areas of the design
  – Do not create Pblocks with very high utilization
    • Can create routing congestion or new timing problems
  – Avoid overlapping Pblocks
    • Creates more complex placement and clock scenarios
[Figure: Baseline run with highlighted regions]

Summary
Vivado Design Suite

UltraFast™ Methodology Review
For optimal results, adapt your HDL style to the FPGA
  – Be mindful of BRAM, LUTRAM, DSP, SRL inference needs
  – Avoid asynchronous reset and wired resets in general
  – Minimize control signals
  – For large FPGAs, design with the dataflow and floorplanning in mind
Baseline your constraints to converge rapidly
Provide clean timing constraints
  – Bad constraints result in bad runtime, performance and HW failures
  – Learn the essentials of timing creation & validation methods
Follow pin/clock planning guidelines
  – Must follow dataflow
  – Place large fanout clocks and pins in the center of SSIT devices

Follow Xilinx
  facebook.com/XilinxInc
  twitter.com/XilinxInc
  youtube.com/XilinxInc

Thank You