Jianfeng Liu, Seokhoon Kim, Kyung-Tae Do, Chungki Oh, JungYun Choi, Hyo-Sig Won, Kee Sup Kim
Design Technology Team, System LSI Division, Samsung Electronics

Jeongwon Kang, Kamlesh Madheshiya, Arti Dwivedi
Ansys Apache

- Mobile SoC Design Trend
- Challenges in SoC Power Analysis
- Power Critical Signal Flow in RTL/Gate power analysis
- Summary

Mobile SoC Design Trend

- The design size of mobile SoCs has been increasing rapidly.
- Fierce competition in the mobile market has driven SoC designs to provide high performance and numerous functions that were previously available only in PCs and laptops.
- To meet the power wall of mobile design and leverage the additional capacity from silicon process scaling, multiple cores and parallelism are popular in current SoC designs.

[Figure: SoC consumer portable design complexity trends (ITRS, 2011 edition)]

Challenges in SoC Power Analysis

- The era of billion-gate SoC design poses significant challenges for power analysis.
- Simulation is needed to analyze dynamic power accurately. However, for a billion-gate SoC, the simulation runtime is becoming too long for a reasonable design cycle.
- The waveform generated by simulation can occupy hundreds of gigabytes or more, which places a significant burden on power analysis tools.

[Figure labels: 10's of modes (video streaming, GPS + voice call, web + email, ...); millions of clocks]

Power Critical Signal Flow in RTL/Gate power analysis

Basic concept of RTL power estimation

- Inputs: RTL-coded design, power library, capacitance model, activity file

1. Elaborate: The RTL design is compiled and elaborated into an interconnection of primitive gates.
2. Calculate Power: The design is mapped to the target technology, and average or time-based power analysis is performed based on switching activity.

[Figure: RTL power estimation flow. Inputs: capacitance model, power library (.lib), RTL (Verilog/VHDL), and an activity file (.vcd/.fsdb/.saif) from Verilog simulation. PowerArtist steps: Elaborate (micro-architectural inferred netlist), then Calculate Power. Output: RTL power report.]

- To obtain reasonable accuracy, simulation is needed for vector-based power estimation.

Generate a significantly smaller power-critical-signals-only FSDB from the emulator/simulator

[Figure: two flows. Full dump: the RTL and test bench go through Verilog simulation to produce a full FSDB. Critical-signal dump: PowerArtist power-critical signal extraction produces a critical signal list from the RTL, and Verilog simulation with the test bench then produces a partial FSDB.]

Dump commands in the test bench:

    initial begin
      $fsdbDumpfile("pa_extracted.fsdb");
      $fsdbDumpvarsByFile("sig_file_name");
    end

Critical signal list:

    testbench.top_inst.temp_out
    testbench.top_inst.temp
    testbench.top_inst.en
    testbench.top_inst.out
    testbench.top_inst.clk
    testbench.top_inst.inC
    testbench.top_inst.inB
    testbench.top_inst.inA

[Figure: two reduced-FSDB flows from the simulator/emulator. L1: Apache PowerArtist identifies power-critical signals; the reduced FSDB is optimized for power analysis over the entire simulation duration and feeds power analysis + debug. L2: functional debug tools identify function-critical signals; the reduced FSDB is optimized for functional debug over limited clock cycles and feeds functional debug.]

Power-critical signals

- Activity for only a subset of signals is necessary for accurate power estimation.
- Critical signals consist of signals such as sequential elements and module in/out ports.

Non-critical signals

- Activity propagation can be performed for the remaining signals, based on the activity propagation formulae of the various cell types.

[Figure labels: PI & PO, IO cells, latches, ICGCs, flip-flops, MUX]

Application

- Power-critical signals can be extracted for both RTL and gate-level designs.
- Critical signals can be used in simulation as well as emulation flows.

Impact

- Dumping the activity file only for power-critical signals saves simulator/emulator and power analysis runtime and memory, at the cost of a small error in power analysis.
- The power-critical signal flow enables power analysis of very large designs for which power estimation used to be infeasible.

[Figure: power-critical signal flow. Inputs: wire load model, power library, RT/gate-level design, test bench. PowerArtist steps: Elaborate (micro-architecturally inferred netlist), critical signal extraction (critical signal list), simulation/emulation (partially dumped activity file), then Calculate Power. Output: RTL power report.]

Experimental result with Design-A in RTL

- The first experiment was done with a multimedia codec IP design.
- Design size is about 8 million gates, with a 32nm library.

[Charts: impact on CPU time; impact on memory resource and power result]

Experimental result with Design-B in RTL

- The second experiment was done with a quad-core CPU block.
- Design size is tens of millions of gates, with a 32nm library.

[Charts: impact on memory resource and power result; impact on CPU time, with CPU time in hours. Values shown: 117, 12, 24, 14 (bar labels not recoverable)]

Experimental result with Design-A in Gate-level

- The third experiment was done with the same design as the first one, but at gate level.
- Design size is about 8 million gates, with a 32nm library.

[Charts: impact on CPU time; impact on memory resource and power result]

Summary

- In the era of billion-gate SoC design, runtime and generated waveform database size are challenging issues for accurate power estimation.
- To address this challenge, we have proposed dumping the waveform for only a subset of the full signal list in the design.
- We have introduced a methodology for choosing this signal subset so that it gives good power correlation while staying small.
- The PowerArtist power-critical signal flow has been verified by extensive experiments covering both RTL and gate-level power estimation flows.
- Our experimental results show that the critical signal flow cuts runtime by 70-80% and simulation waveform size by 60-97%, while keeping the power mismatch below 10%.
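The idea of dumping activity only for critical signals and propagating it to the rest can be illustrated with a small Python sketch. This is not PowerArtist's extraction or propagation algorithm, which the slides do not describe. It assumes that ports, flip-flops, and latches are the critical kinds, uses textbook signal-probability formulae that treat gate inputs as independent, and invents the nets `n1` and `n2`, the cell kinds assigned to each signal, and the 0.5 probabilities purely for illustration.

```python
from dataclasses import dataclass

# Assumption: which signal kinds count as power-critical in this sketch.
CRITICAL_KINDS = {"port", "flop", "latch"}


@dataclass(frozen=True)
class Signal:
    name: str           # hierarchical signal name
    kind: str           # "port", "flop", "latch", or a gate type such as "and"
    fanin: tuple = ()   # input signal names (combinational gates only)


def select_critical(signals):
    """Return the names of signals whose activity must be dumped."""
    return [s.name for s in signals if s.kind in CRITICAL_KINDS]


def propagate_probability(signals, dumped):
    """Estimate P(signal == 1) for signals that were not dumped.

    Uses textbook formulae that assume independent gate inputs.
    `signals` must be listed in topological order.
    """
    prob = dict(dumped)
    for s in signals:
        if s.name in prob:
            continue
        p = [prob[name] for name in s.fanin]
        if s.kind == "and":
            prob[s.name] = p[0] * p[1]
        elif s.kind == "or":
            prob[s.name] = p[0] + p[1] - p[0] * p[1]
        elif s.kind == "not":
            prob[s.name] = 1.0 - p[0]
        else:
            raise ValueError(f"no formula for cell kind {s.kind!r}")
    return prob


TOP = "testbench.top_inst"
design = [
    Signal(f"{TOP}.inA", "port"),
    Signal(f"{TOP}.inB", "port"),
    Signal(f"{TOP}.inC", "port"),
    Signal(f"{TOP}.n1", "and", (f"{TOP}.inA", f"{TOP}.inB")),  # hypothetical net
    Signal(f"{TOP}.n2", "or", (f"{TOP}.n1", f"{TOP}.inC")),    # hypothetical net
    Signal(f"{TOP}.temp", "flop"),
]

# Names to write, one per line, into the signal list file used for dumping.
critical = select_critical(design)
signal_file_text = "\n".join(critical) + "\n"

# Made-up probabilities standing in for values measured from a partial dump.
measured = {name: 0.5 for name in critical}
estimated = propagate_probability(design, measured)
```

In this toy design only four of the six signals would be dumped, and the two combinational nets get their probabilities from the propagation step. A real flow would propagate toggle rates as well as static probabilities and would cover many more cell types.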