Integrated Tool Suite for Post Synthesis FPGA Power Consumption Analysis
Matthew French, Li Wang
University of Southern California, Information Sciences Institute
Tyler Anderson, Michael Wirthlin
Brigham Young University
French 207 · Slide 1 · MAPLD 2005

FPGA Power Trends & Needs
• Number of logic blocks and maximum operating frequency track Moore's Law
• Voltage reduction is slower
• The resulting power increase is exponential
• Power needs to be a first-class design constraint
• Limited power tools available
  – Spreadsheets
    • Manual entry
    • Prone to guess-timation
  – XPower (post-routing)
    • At end of design cycle
    • Profiled after timing simulation
    • Time intensive
    • Unwieldy file sizes
    • Limited reporting
    • Only total power consumed
    • No ability to capture power transients
    • Limited design path if specifications not met
    • Routing tools optimize only throughput
[Figure: Internal Power Consumption by Xilinx family (Virtex, Virtex-E, Virtex-II, Virtex-II Pro, Virtex 4 LX), log-scale axis, plotting number of flip-flops, power (mW), clocking frequency (MHz), and voltage (V). Power calculated assuming 80% device utilization, 80% peak clock frequency, 12.5% toggling rate; internal logic only, no I/O.]

Power Tools: Goals
• Push power analysis, visualization, and optimization to the front of the tool chain:
  – Analyze power consumption at logic simulation with two levels of accuracy
    • Pre-place-and-route, using heuristic estimates based on fanout
    • Back-annotated with precise post-place-and-route RC data
  – Visualize by providing intuitive views that help the designer rapidly find and correct inefficient circuits, operating modes, data patterns, etc.
  – Optimize systems by automatically identifying problem paths and suggesting improvements

FPGA Tool Flow
• Benefits
  – Closer to the logical level and design entry
  – Power profiling during functional simulation
  – Early estimation before place and route
  – Automatic power details for specific resource utilization
  – Facilitates exploration of high-level design alternatives
[Figure: FPGA tool flow, marking the current power tool entry point and the proposed power tool entry point]

Tool Backbone: JHDL & EDIF Parser
• Leverage the JHDL simulation environment, with EDIF Parser circuit manipulation
• JHDL
  – Java-based structural design tool for FPGAs
  – Circuits described by creating Java classes
  – Design libraries provided for several FPGA families
  – http://www.jhdl.org
• JHDL design aids
  – Logic simulator & waveform viewer
  – Circuit schematic & hierarchy browser
  – Module generators
• Circuit designer does not need to know Java!
• EDIF Parser
  – Supports multiple EDIF files
  – Virtex2 libraries and memory initialization
  – Support for “black boxes”
  – No JHDL wrapper required
  – http://splish.ee.byu.edu/reliability/edif/
  – Verified: Synplicity, Synplicity Pro, Coregen, System Generator, Chipscope
[Figure: 3rd-party tools, EDIF netlist, EDIF Parser, EDIF data structure, manipulation tools; JHDL environment and JHDL data structure]

Power Tool Flow: Timing-Level
[Figure: timing-level power tool flow. Xilinx tool flow: source code (VHDL, Verilog, JHDL), Synthesis, EDIF, Map, Place & Route (.ncd), Bitgen, XPower (.pwr), to target. Power tools: EDIF Parser, routed circuit model, power analysis & visualization]
• Event model restructured in JHDL
  – Tool interoperability
  – Cross-probing enabled
• Support dynamic insertion of 3rd-party (power) tools
  – Circuit APIs in place
  – Graphical user interface (GUI) support

Power Visualization Tool
• Two views:
  – Instantaneous vs. cumulative power consumption over time
  – Sorted tree view of “worst offenders”
• Integrated “cross-probing” with existing JHDL tools
  – Unified environment allows experimentation
  – Smart re-use of CPU memory
  – Helps rapidly identify inefficient circuits and operating modes
• Per-cell / per-bit granularity
• Simulation trigger on power specification
[Figure: Cross Probing]

Post Synthesis Level Power Modeling
• Power modeling
  – Quiescent power based on total circuit size
  – Dynamic power: $P_{\text{dynamic}} \propto (\%\text{toggle})\,(f_{\text{clock}})\,(C_{\text{component}} + C_{\text{wire}})$
    • Toggle rates (data dependent)
    • Components used
    • Routing interconnect
  – Actual quiescent and dynamic power not known until the circuit is placed and routed
• Leverage existing JHDL tool environment
  – Toggling rates derived from simulator
    • Will lose glitching information
  – Components known from EDIF or JHDL primitives
    • Component capacitance imported from XPower
  – How to model routing interconnect?
    • Do not have exact routing information at synthesis
    • Routing tools can pick a different route each iteration
      – Interconnect length and combinations vary

XPower Component Capacitance
Component   Cap (pF)    Component   Cap (pF)
FF          1.21        LUT         1.0
SRL         3.0         LD          1.0
INV         1.0         AND         1.0
RAM         1.0         MULT        17.2
DLL         40.0        IBUF        1.0
BUFG        6.0         BRAM        59.0

XPower Interconnect Capacitance
Interconnect     Cap (pF)
Long Line        11.8
Hex Line         0.59
Double Line      0.44
Direct Connect   0.29

Wire Power Model Analysis
• Developed power tools to analyze relationships
• Can plot capacitance vs.
  – Fanout
  – Programmable interconnect points
  – Wire length
  – Total number of nets
  – Total number of components
• Which relationships maintain correlation from synthesis to place and route?
  – Optimizer removes components, nets
• Can also use tools to judge routing quality
  – Identify outliers
  – Information available to do power-weighted placement and routing
    • Use placement macros in JHDL
    • Use UCF placement and/or timing constraints
[Figure: Optimization Candidates]

Low Fanout Capacitance Variance
• Not all routes are created equal
• Up to 60% variance on the “same” route length
• East-West vs. North-South bias
• Switches sometimes use doubles instead of direct connects
[Figure: switch-logic route examples (YQ -> F2 via omux-B3 and omux-A7, YQ -> G4 via omux-B4, YQ -> F4 via omux-A2) with capacitances of 2.45 pF (#2727), 2.37 pF (#4791), 1.46 pF (#2768), and 0.75 pF (#131), contrasting a direct connect with a double wire]

Capacitance vs Fanout
• Fanout model well correlated
• Secondary fit line corresponds to placement macros
• High variance at low fanout
• Achieving 4.3% average error, 16% variance
• Explored device utilization models as well
[Figure: capacitance vs. fanout, with the secondary fit line labeled Placement Macros]

Resulting Power Tool Flow
[Figure: resulting power tool flow. Xilinx tool flow: source code (VHDL, Verilog, JHDL), Synthesis, EDIF, Map, Place & Route (.ncd), Bitgen, XPower (.pwr), to target. Power tools: EDIF Parser, JHDL, Virtex II power model, routed circuit model, power analysis & visualization]

Power Optimization Approach
• Influence Xilinx place-and-route tools for power efficiency
  – Minimize clock/wire lengths of high-power nets
• Use power analysis tools to identify hot spots and generate constraints
  – Timing constraints on non-clock signals
  – Location constraints on sink flip-flops of clock signals
• Verify power optimization approaches
  – Use final circuit timing model to verify power savings
[Figure: optimization flow. Power tools (via the EDIF Parser on EDIF) produce a .ucf with timing constraints (ns) and placement constraints (X,Y) for the Xilinx tool flow (Ngdbuild & Map, Place & Route, .ncd, bitgen); verification uses ModelSim (.vhd, .vcd) and XPower]

Timing Constraint Power Optimization
• Wire power is optimized by reducing wire length
  – A MAXDELAY constraint in the UCF file sets the maximum delay allowed on a net
• Power tools contain a Wire Table database
  – Sortable by: average power, toggling rate, fanout, load
  – Apply constraints
[Figure: Wire Table. Default constraints (constraint freq 50 MHz, operating freq 50 MHz): poor power efficiency. Power timing constraints (constraint freq 100 MHz, operating freq 50 MHz): better power efficiency]

Timing Constraint Power Optimization: Preliminary Results
Configuration              % of total nets constrained   Clock (mW)   Signal (mW)   Total Power (mW), Clock + Signal
Baseline, no constraints   N/A                           442.5        19.9          462.4
All nets constrained       12.5%                         439.3        29.4          468.7 (-1.4%)
Fanout < 10 constrained    11.1%                         394.2        23.7          417.9 (9.6%)
Fanout < 4 constrained     10.6%                         400.6        23.1          423.7 (8.4%)
Top 25% constrained        4.1%                          384.5        23.4          407.9 (11.8%)
(Percentages in parentheses are total power savings relative to the baseline; a negative value is an increase.)
• Total power change ranges from a 1.4% increase to an 11.8% reduction
• More constraints are not necessarily better
• How tightly nets are constrained can also be varied
• Circuits still meet original timing specification requirements

Location Constraint Power Optimization
• Power optimization guidelines
  – Minimize clock zone utilization
  – Group flip-flops as tightly as possible
  – Group flip-flops closer to clock trunks
• Reduce clock paths by constraining flip-flop locations, thus reducing clock capacitance and power
[Figure: less power-efficient vs. more power-efficient flip-flop placement]

Location Constraint Power Optimization Interface
• Clock table can be sorted by power, number of flip-flops, etc.
• Users can select locations of flip-flops
  – Users can select how tightly flip-flops are placed
  – Users can define the area where flip-flops are placed; the tool checks the validity of constraint areas
  – Users can select which flip-flop groups the constraints are applied to
[Figure: Clock Table]

Location Constraint Power Optimization: Preliminary Results
Configuration                       Clock (mW)        Signal (mW)      Logic (mW)       Total Power (mW), Clock + Signal + Logic
Baseline, no constraints            442.5             19.9             285.8            748.2
All FFs placed                      293.7 (33.6%)     27.6 (-38.8%)    255.4 (10.6%)    576.7 (22.9%)
IOs in IOBs, all other FFs placed   356.251 (19.5%)   21.909 (-10%)    285.787 (0%)     663.947 (11.3%)
(Percentages in parentheses are savings relative to the baseline; a negative value is an increase.)
• Individual clock net improvement ranged from -4% to 57%
• Achieves up to 22.9% total power improvement
• Circuits still meet timing requirements if IO buffer flip-flops are left in IOBs
• Power could be reduced further if IO buffer flip-flops were not constrained to be within IOBs
[Figure: unconstrained vs. constrained placement]

Conclusions
• Post-synthesis level power modeling is feasible
  – Some accuracy trade-offs are inevitable
  – Quicker power results enable
    • Capability to determine power specifications early in the design flow
    • Feedback on design-level circuit power ramifications
    • A tighter feedback loop to the designer, allowing more design iterations
• Optimization
  – Preliminary results encouraging
  – Tools do not alter original circuit functionality & use COTS inputs
  – Developing optimization algorithms & routines
• Tools are open source: http://rhino.east.isi.edu
• This research made possible by a grant from the NASA Earth-Sun System Technology Office
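The dynamic power model on the Post Synthesis Level Power Modeling slide can be sketched in a few lines of Python. This is a minimal illustration, not the tool suite's implementation. The function names (`toggle_rate`, `dynamic_power_uw`) are hypothetical, and the toggle rate is assumed to be the fraction of simulated clock cycles in which a signal changes value. The V² supply-voltage factor, with an assumed 1.5 V default, is added here so the result comes out in microwatts; the slide's model itself lists only toggle rate, clock frequency, and capacitance. Capacitances are the XPower values from the slide's tables.

```python
# Minimal sketch of the slide's post-synthesis dynamic power model.
# Function names are hypothetical; capacitances come from the XPower tables.

COMPONENT_CAP_PF = {
    "FF": 1.21, "LUT": 1.0, "SRL": 3.0, "LD": 1.0, "INV": 1.0, "AND": 1.0,
    "RAM": 1.0, "MULT": 17.2, "DLL": 40.0, "IBUF": 1.0, "BUFG": 6.0, "BRAM": 59.0,
}

INTERCONNECT_CAP_PF = {
    "long": 11.8, "hex": 0.59, "double": 0.44, "direct": 0.29,
}


def toggle_rate(samples):
    """Fraction of cycle-to-cycle transitions in which the signal changed.

    One sample per clock cycle, so glitches within a cycle are not seen.
    """
    if len(samples) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)
    return changes / (len(samples) - 1)


def dynamic_power_uw(toggle, freq_mhz, cap_component_pf, cap_wire_pf, vdd=1.5):
    """toggle * f_clock * (C_component + C_wire) * Vdd**2, in microwatts.

    The Vdd**2 factor and the 1.5 V default are assumptions added here;
    1 pF * 1 MHz * 1 V**2 is exactly 1 microwatt.
    """
    return toggle * freq_mhz * (cap_component_pf + cap_wire_pf) * vdd ** 2


if __name__ == "__main__":
    trace = [0, 1, 1, 0, 1, 0, 0, 0, 1]  # simulated value of one net per cycle
    alpha = toggle_rate(trace)
    p = dynamic_power_uw(alpha, 50.0, COMPONENT_CAP_PF["FF"], INTERCONNECT_CAP_PF["hex"])
    print(f"toggle rate {alpha:.3f}, estimated {p:.1f} uW")
```

Working in pF and MHz keeps the arithmetic free of scale factors. Because the trace is sampled once per cycle, glitches are invisible, which matches the slide's note that glitching information is lost.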
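The Power Visualization Tool slide describes instantaneous vs. cumulative views, a sorted list of "worst offenders", and a simulation trigger on a power specification. Each of these reduces to a simple operation over per-cell power samples. The sketch below assumes a hypothetical dict that maps cell names to one power sample per clock cycle. It is not the JHDL API, and the function names are illustrative only.

```python
from itertools import accumulate

# Hypothetical data layout: cell name -> list of power samples, one per clock cycle.


def cumulative(series):
    """Running total of a cell's per-cycle power samples (the cumulative view)."""
    return list(accumulate(series))


def worst_offenders(per_cell, top_n=3):
    """(cell, total) pairs sorted from highest to lowest total power."""
    totals = {cell: sum(samples) for cell, samples in per_cell.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top_n]


def first_violation(per_cell, budget):
    """First cycle whose summed instantaneous power exceeds `budget`, else None.

    This mirrors the idea of a simulation trigger on a power specification.
    zip() stops at the shortest series, so all series should have equal length.
    """
    for cycle, samples in enumerate(zip(*per_cell.values())):
        if sum(samples) > budget:
            return cycle
    return None


if __name__ == "__main__":
    power = {
        "ff_a": [1.0, 2.0, 0.0],
        "mult_0": [5.0, 5.0, 5.0],
        "lut_b": [0.5, 0.5, 4.0],
    }
    print(cumulative(power["ff_a"]))
    print(worst_offenders(power, top_n=2))
    print(first_violation(power, budget=8.0))
```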
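The Capacitance vs Fanout slide reports that a fanout-based wire model correlates well with post-route capacitance. A least-squares line is one plausible way to build such a model, but the slide does not give the fitting method. This sketch assumes a single linear fit, C_wire = a + b · fanout, and uses mean absolute percentage error as the error metric, because the slide does not say how its 4.3% average error was computed. Function names and sample data are hypothetical.

```python
# Hypothetical sketch: fit post-route wire capacitance against fanout.


def fit_line(fanouts, caps_pf):
    """Ordinary least-squares fit caps = a + b * fanout; returns (a, b)."""
    n = len(fanouts)
    if n < 2 or n != len(caps_pf):
        raise ValueError("need two or more paired samples")
    mean_x = sum(fanouts) / n
    mean_y = sum(caps_pf) / n
    sxx = sum((x - mean_x) ** 2 for x in fanouts)
    if sxx == 0:
        raise ValueError("fanout values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(fanouts, caps_pf))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_abs_pct_error(fanouts, caps_pf, intercept, slope):
    """Average of |predicted - measured| / measured, in percent."""
    errors = [abs(intercept + slope * x - y) / y for x, y in zip(fanouts, caps_pf)]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    # Made-up (fanout, capacitance in pF) pairs standing in for post-route data.
    fanout = [1, 2, 3, 4, 6, 8]
    cap = [0.9, 1.6, 2.2, 2.7, 3.9, 5.0]
    a, b = fit_line(fanout, cap)
    print(f"C_wire ~ {a:.2f} + {b:.2f} * fanout pF, "
          f"MAPE {mean_abs_pct_error(fanout, cap, a, b):.1f}%")
```

The slide also mentions a secondary fit line for placement macros. Fitting the nets inside macros separately is one way such a line could arise.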
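The Timing Constraint Power Optimization slides tighten high-power nets with UCF MAXDELAY constraints, for example by constraining to a 100 MHz period while operating at 50 MHz. The sketch below is a hypothetical generator for those lines. Net power values, the `fraction` threshold, and the function names are assumptions, and the slides do not state which population of nets the "Top 25%" row is drawn from. `percent_saving` reproduces how the results table's percentages follow from the totals.

```python
import math

# Hypothetical helpers for power-driven MAXDELAY constraints.


def select_top_nets(avg_power, fraction=0.25):
    """Names of the highest-average-power nets (at least one if any exist)."""
    ranked = sorted(avg_power, key=avg_power.get, reverse=True)
    if not ranked:
        return []
    return ranked[:max(1, math.ceil(len(ranked) * fraction))]


def maxdelay_lines(net_names, constraint_freq_mhz):
    """One UCF MAXDELAY line per net, using the period of the constraint frequency."""
    delay_ns = 1000.0 / constraint_freq_mhz
    return [f'NET "{name}" MAXDELAY = {delay_ns:g} ns;' for name in net_names]


def percent_saving(baseline_mw, new_mw):
    """Positive when power dropped relative to the baseline, negative when it rose."""
    return 100.0 * (baseline_mw - new_mw) / baseline_mw


if __name__ == "__main__":
    nets = {"addr_3": 2.4, "data_valid": 0.3, "acc_0": 5.1, "ctrl_en": 1.2}
    for line in maxdelay_lines(select_top_nets(nets), constraint_freq_mhz=100.0):
        print(line)
    # Totals from the "Baseline" and "Top 25% constrained" rows.
    print(f"{percent_saving(462.4, 407.9):.1f}%")
```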
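For the Location Constraint Power Optimization slides, one way to express "group flip-flops as tightly as possible" in a UCF file is an AREA_GROUP with a RANGE of slices. The sketch assumes two flip-flops per Virtex-II slice. It checks only that the rectangle is non-empty and has enough capacity, which is much weaker than whatever validity checks the actual tool performs. Instance names, group names, and coordinates are hypothetical, and the user is assumed to choose the rectangle, for example near a clock trunk.

```python
# Hypothetical AREA_GROUP generator for a clock's sink flip-flops.
FFS_PER_SLICE = 2  # a Virtex-II slice holds two storage elements


def smallest_square(num_ffs):
    """Side, in slices, of the smallest square region that can hold num_ffs."""
    side = 1
    while side * side * FFS_PER_SLICE < num_ffs:
        side += 1
    return side


def area_group_lines(group, ff_instances, x0, y0, x1, y1):
    """UCF lines placing ff_instances inside SLICE_X{x0}Y{y0}:SLICE_X{x1}Y{y1}.

    Only checks that the rectangle is non-empty and large enough; it does not
    check device bounds or resources already used by other logic.
    """
    if x1 < x0 or y1 < y0:
        raise ValueError("empty slice range")
    capacity = (x1 - x0 + 1) * (y1 - y0 + 1) * FFS_PER_SLICE
    if len(ff_instances) > capacity:
        raise ValueError(f"{len(ff_instances)} flip-flops exceed capacity {capacity}")
    lines = [f'INST "{name}" AREA_GROUP = "{group}";' for name in ff_instances]
    lines.append(f'AREA_GROUP "{group}" RANGE = SLICE_X{x0}Y{y0}:SLICE_X{x1}Y{y1};')
    return lines


if __name__ == "__main__":
    ffs = [f"count_reg_{i}" for i in range(8)]
    side = smallest_square(len(ffs))  # 2 x 2 slices hold 8 flip-flops
    for line in area_group_lines("AG_clk_fast", ffs, 10, 20, 10 + side - 1, 20 + side - 1):
        print(line)
```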