POWER — USAGE SHIFT LEADS TO METHODOLOGY SHIFT
VERIFICATION WHITE PAPER
VIJAY CHOBISA, PRODUCT MARKETING MANAGER, MENTOR GRAPHICS
GAURAV SAHARAWAT, STAFF ENGINEER RUNTIME R&D, MENTOR GRAPHICS
www.mentor.com

PROBLEM
Why has power suddenly become a high-visibility topic? One reason is that power exploration and accurate power calculation of SoCs in the target application environment are getting executive attention, because companies are missing market windows due to power issues. These power issues stem from a usage shift in mobile computing devices (phones, tablets, etc.). The devices are now used for playing games and for watching movies or NFL or NBA games, in addition to typical cell phone usage. This usage shift warrants a methodology shift in the power analysis flow.

Why are existing power analysis flows and methodologies falling short? First, FinFET is a technology that helps reduce static power, but dynamic power is still the largest power consumer as chips get faster and the screens in today’s mobile devices are produced with much higher resolution. Second, adapted functional testbenches, traditionally used for power number generation, are not adequate for analyzing power problems. Finally, power analysis tools are not designed to do power measurements at the full chip/system level while running live software applications. Indeed, power analysis is becoming as important as functional verification, or more so, for today’s power-hungry SoCs.

Figure 1: Dynamic power is still a big focus.

FUNCTIONAL TESTBENCH VS. LIVE APPLICATION
If you look at existing flows for power measurement, you quickly realize that chip designers make many assumptions when generating power numbers for SoCs.
They run simulations of functional tests at the block or subsystem level, generating switching activity in the form of Switching Activity Interchange Format (SAIF), FSDB or VCD files for a limited number of cycles (tens of thousands). Then they supply this data, together with a technology library, to power analysis tools for power number generation. Once they have the power numbers from these adapted functional tests, they use extrapolation techniques to come up with a power number for the full SoC. However, these extrapolated power numbers are often far from the real power numbers measured once the chips are back in the lab.

Over the last year, Mentor Graphics has worked with leading fabless chip design companies to establish an emulation flow that generates accurate power numbers. We do this by measuring power in a targeted application environment while running actual software applications. This includes booting an OS and then running hundreds of millions of cycles to locate areas of concern for power. The Veloce emulation system not only has the capacity to handle very large SoCs (up to 2 billion ASIC gates), but also has the performance required to boot an OS, run real applications and generate switching data. In addition, Veloce provides complete visibility of every design node, a must-have capability for accurate power analysis.

We believe it is clear that, to generate the most accurate power number, it is best to use a platform and methodology that allow the power of an SoC to be analyzed in the targeted application environment at the system level. Figure 2 illustrates the need for analyzing power while running live software applications.

Figure 2: Benefit of running the live application.

TRADITIONAL FILE-BASED FLOW
For a traditional file-based flow, Veloce is used to generate switching activity (SAIF) over a long emulation run.
The data is then used as input to power analysis tools for generating average power numbers. SAIF-based flows are quite common among customers for average power estimation; however, such flows carry no temporal dependency information, because they do not store the full, time-based waveform for all design states. This gap can reduce the accuracy of average power for memories or IP, where the calculation is generally more complicated than just counting cumulative switching. To improve the accuracy of average power, it is often important to know what portion of the activity occurred while a block’s key ENABLE signals were asserted.

Figure 3: File-based power analysis flow.

To facilitate the capture of this conditional and segmented switching, Veloce supports the more elaborate version of SAIF known as “Forward SAIF,” which can capture all the interesting conditions for switching activity. In principle this is a State-Dependent and Path-Dependent (SDPD) SAIF file for all library cell ports, in addition to normal SAIF activity for all design nodes, which improves the accuracy of the average power. However, the accuracy still depends on the user-supplied Forward SAIF, which is time consuming and difficult to prepare because it requires in-depth knowledge of the design.

The best, most accurate results come from waveforms that give full timing information. FSDB or VCD files are also required at times for a variety of other reasons, such as for certain power tools that guide users toward possible power optimizations. However, using FSDB or VCD files with emulation has not been an effective solution, given the files’ structure, long read/write times, disk footprint and lengthy generation times. The main reason the FSDB-based flow is slow is the mismatch between the way FSDB organizes the data (signal-based storage) and the way power tools access the data (time-based access). This mismatch in data orientation causes performance loss in FSDB-based flows both when writing and when reading the files. Veloce inherently processes and generates data in a manner suited to time-based access, which leaves more room for performance and efficiency gains and also allows a more direct, tighter integration with power tools.

Ultimately, there are several reasons the traditional file-based power analysis flow is very limited and restrictive for exploring and analyzing power at the SoC level. First, power tools are not designed to handle the large files generated by emulation. Second (and related), it takes an unacceptably long time to generate meaningful power numbers with this flow. The time it would take to create and read the files for a detailed power analysis makes the file-based power analysis flow impractical at the SoC level.

What is the best way to avoid these big files and focus a detailed power analysis on the design regions, logic blocks and applications that cause high switching while running real software applications? Veloce’s Power Application software delivers an advanced methodology that enables chip designers to identify power concerns while running system-level tests and then seamlessly capture detailed information for focused power analysis. The next sections give more information on the complete flow, including the ‘Veloce Activity Plot’ and the ‘Dynamic Read Waveform API’ based customized integration with a power analysis tool from Ansys.

Figure 4: Veloce forward SAIF flow.

VELOCE ACTIVITY PLOT
Veloce Activity Plot, shown in Figure 5, is a distinctive capability that allows a power analysis team to run long test sequences and quickly isolate high-switching regions over long emulation runs; these regions represent actual power concerns. This enables customers to run real software applications, identify areas of interest for power and then narrow down the applications and logic blocks causing peak switching.
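The idea of isolating high-switching regions over a long run can be illustrated with a small sketch. This is not how Veloce computes an Activity Plot; it is a minimal Python illustration that assumes a hypothetical list of per-cycle toggle counts, a made-up window size, and an arbitrary threshold (1.5 times the mean window activity) for flagging a window as "hot".

```python
from typing import List, Tuple


def windowed_activity(toggles_per_cycle: List[int], window: int) -> List[int]:
    """Sum toggle counts over consecutive, non-overlapping windows of cycles."""
    return [
        sum(toggles_per_cycle[i:i + window])
        for i in range(0, len(toggles_per_cycle), window)
    ]


def high_activity_windows(
    activity: List[int], window: int, factor: float = 1.5
) -> List[Tuple[int, int]]:
    """Return (start_cycle, end_cycle) for windows above factor * mean activity.

    The threshold rule is an assumption made for this sketch only.
    """
    if not activity:
        return []
    mean = sum(activity) / len(activity)
    return [
        (i * window, (i + 1) * window - 1)
        for i, a in enumerate(activity)
        if a > factor * mean
    ]


# Hypothetical toggle counts for 12 design clock cycles.
toggles = [2, 3, 2, 1, 40, 45, 50, 38, 3, 2, 1, 2]
activity = windowed_activity(toggles, window=4)
hot = high_activity_windows(activity, window=4)
print(activity, hot)
```

In this toy data only the middle window stands out, so a detailed capture would be limited to those cycles rather than the whole run.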
It’s possible to view an Activity Plot of the full or partial design, and thus to analyze the activity trends of the design that are directly proportional to the power consumption pattern. Veloce can produce an Activity Plot an order of magnitude faster than file-based power charts. For example, Veloce takes 15 minutes to generate an Activity Plot of a 100 million gate design for 75 million design clock cycles. Power analysis tools will probably take more than a week to generate similar information and might not even be able to handle such a large volume of data. Veloce Activity Plot provides an activity view for the entire design scope, including IPs and sub-hierarchies, all within targeted time windows of interest.

Figure 5: The Veloce Activity Plot identifies focus areas over long runs.

Once you identify high switching activity regions at the top level of your design, you can analyze the various sub-blocks or applications that are the main source of this high switching. You can then capture this time zone information in a Time Zone File (tzf) and input it to Veloce to generate complete data for the selected time windows for detailed power analysis.

Figure 6: The activity plot is the basis for extracting and generating detailed power numbers.

DYNAMIC READ WAVEFORM API FLOW
Mentor Graphics has worked on a unique and customized integration with an industry-leading power tool, PowerArtist®. The result is a flow where the power analysis tool is fed with the switching data live while emulation is running. The Dynamic Read Waveform API (DRW-API) approach enables accurate power calculation at the system level, where booting an OS and running software applications is required.
This makes it practical to perform power exploration at RTL for power budgeting and tradeoffs, as well as more accurate power analysis and signoff at the gate level in a targeted application environment. The dynamic, API-based live streaming of switching data between the emulator and the power analysis tool allows all the operations to run in parallel: emulating the SoC, capturing switching data, reading the switching data into the power analysis tool and generating power numbers. This greatly shortens the time to power number generation and also delivers improved accuracy compared to SAIF-based average flows, because conditional controls are incorporated automatically for switching. Dynamic API streaming makes power analysis and exploration possible at the SoC level with long tests and scenarios that are not possible with a file-based flow.

Figure 7: Veloce accelerated power analysis flow.

Designers can now meet and verify their power goals in the most reliable and efficient manner by combining two of the industry’s best-in-class tools. Veloce is the leading emulation platform and PowerArtist is the best-in-class power estimation solution. There is a natural synergy between the products. Veloce can boot the OS, run live software applications and execute verification cycles to collect design activity over very long runs compared to simulation. PowerArtist can estimate power numbers using design activity captured over long runs. This delivers more accurate power numbers compared to simulation or other probabilistic static methods. Veloce runtime performance and the streaming integration with PowerArtist make it possible to collect power numbers for a variety of test scenarios and functional modes in a reasonably short span of time, and thus enable data-driven decisions about power.

The new, tighter integration improves time-to-power productivity.
Both tools work on the same data model, transfer switching data in the most optimized way, avoid file write/read penalties and deliver higher accuracy. In other words, this new approach eliminates network dependency and the write/read time spent waiting for a huge volume of transient files. In addition, it aligns the compile times of both tools and incorporates the native CRITICAL SIGNAL LIST, which further improves performance and ease of use. Typically, the critical signal list is about 10-20% of the full design signals, so using it significantly improves time-to-power performance by reducing the data exchanged between the tools.

SPEED IMPROVEMENTS: FILE-BASED VS. DYNAMIC READ WAVEFORM API FLOW
Figure 8 illustrates the four-step file-based flow, the two-step Dynamic API-based flow and initial performance numbers on real customer designs for the Dynamic Read Waveform API flow vs. the file-based flow. With the Dynamic Read Waveform API flow, the file write and read steps are eliminated, which yields significant performance improvements.

Figure 8: Speed improvements: file-based vs. dynamic read waveform API flow.

A COMPLETE RTL EXPLORATION AND GATE-LEVEL POWER SIGNOFF FLOW
By eliminating the file-based flow and providing the unique Dynamic Read Waveform API integration with power analysis tools, Veloce offers a complete RTL power exploration and accurate gate-level power analysis flow. Customers can start RTL-level power exploration very early, which allows them to make power tradeoffs and architectural adjustments far upstream in the design cycle. They can then continue to use the flow as the design is frozen and gate-level representations are prepared for tapeout. At that point, users can focus on more accurate power measurements and do additional fine tuning before tapeout and power signoff.

Figure 9: Veloce provides comprehensive RTL and gate-level power analysis.
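The streaming idea from the Dynamic Read Waveform API sections above, together with the critical signal list filter, can be sketched in a few lines of Python. This is only an illustration of the pattern, not the DRW-API or PowerArtist interface: the function names, the per-window toggle dictionaries and the per-toggle energy weights (in arbitrary units) are all hypothetical. Generators stand in for the live connection, so each window is filtered and consumed as soon as it is produced and no intermediate file is written or read.

```python
from typing import Dict, Iterable, Iterator, Set

Window = Dict[str, int]  # signal name -> toggle count in one time window


def emulate(windows: Iterable[Window]) -> Iterator[Window]:
    """Stand-in for an emulator: yields each window's toggles as it is produced."""
    for w in windows:
        yield w


def keep_critical(stream: Iterable[Window], critical: Set[str]) -> Iterator[Window]:
    """Drop signals that are not on the (hypothetical) critical signal list."""
    for w in stream:
        yield {sig: n for sig, n in w.items() if sig in critical}


def estimate(stream: Iterable[Window], energy: Dict[str, float]) -> Iterator[float]:
    """Stand-in for a power tool: weighted toggle sum per window, arbitrary units."""
    for w in stream:
        yield sum(n * energy[sig] for sig, n in w.items())


# Hypothetical data: two time windows, three signals, two of them critical.
windows = [
    {"clk": 100, "bus": 40, "dbg": 5},
    {"clk": 100, "bus": 10, "dbg": 0},
]
critical = {"clk", "bus"}
energy = {"clk": 2.0, "bus": 1.0}

results = list(estimate(keep_critical(emulate(windows), critical), energy))
print(results)
```

Because the three stages are chained lazily, the estimator starts working on the first window before the second one exists, which mirrors the overlap of emulation and power calculation described in the text; the filter shows how restricting the exchange to critical signals shrinks the data passed between stages.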
CONCLUSION
Veloce offers a unique and customized flow for SoC power exploration and analysis, with a fully integrated solution with PowerArtist from Ansys for both the RTL and gate level. The Veloce Power Application enables a methodology shift in the way power measurements are done, to address the new requirements created by the usage shift. Chip designers no longer need to rely on functional testbenches and extrapolation techniques to come up with a power number. The new flow enables booting an OS, running live applications and running different functional use modes and scenarios to generate accurate power data.

For the latest product information, call us or visit: www.mentor.com

©2015 Mentor Graphics Corporation, all rights reserved. This document contains information that is proprietary to Mentor Graphics Corporation and may be duplicated in whole or in part by the original recipient for internal business purposes only, provided that this entire notice appears in all copies. In accepting this document, the recipient agrees to make every reasonable effort to prevent unauthorized use of this information. All trademarks mentioned in this document are the trademarks of their respective owners.

MGC 5-15 TECH12950-w