Critical Signal Flow for Power Estimation: The Road to Billion Gate

Jianfeng Liu, Seokhoon Kim, Kyung-Tae Do, Chungki Oh,
JungYun Choi, Hyo-Sig Won, Kee Sup Kim
Design Technology Team
System LSI Division
Samsung Electronics
Jeongwon Kang, Kamlesh Madheshiya, Arti Dwivedi
Ansys Apache
• Mobile SoC Design Trend
• Challenges in SoC Power Analysis
• Power Critical Signal Flow in RTL/Gate Power Analysis
• Summary
• The design size of mobile SoCs has been increasing rapidly
• Fierce competition in the mobile market has driven SoC designs to provide high
performance and numerous functions that were previously available only in PCs
and laptops
• To cope with the power wall of mobile design and leverage the additional capacity from
silicon process scaling, multiple cores and parallelism have become popular in current SoC
designs
[Figure: SoC consumer portable design complexity trends (ITRS, 2011 edition)]
• The era of billion-gate SoC design poses significant challenges for power analysis
• Simulation is needed to analyze dynamic power accurately. However, for billion-gate
SoCs, simulation runtime is becoming too long for a reasonable design cycle
• The waveform generated by simulation can occupy hundreds of gigabytes or more,
which places a significant burden on power analysis tools.
[Figure: tens of operating modes (video streaming, GPS + voice call, web + email, ...) and millions of clock cycles]
• Basic concept of RTL power estimation
Inputs: RTL-coded design, power library, capacitance model, activity file
1. Elaborate: the RTL design is compiled and elaborated into an interconnection of
primitive gates
2. Calculate Power: the design is mapped to the target technology, and average or time-based power analysis is performed based on switching activity
[Figure: RTL power estimation flow in PowerArtist. RTL (Verilog/VHDL) → Elaborate → micro-architecturally inferred netlist; Verilog simulation → activity file (.vcd/.fsdb/.saif); inferred netlist + activity file + power library (.lib) + capacitance model → Calculate Power → RTL power report]
For reasonable accuracy, vector-based power estimation requires simulation
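The "Calculate Power" step rests on the familiar switching-power relation: each toggle of a net with load capacitance C at supply voltage V dissipates on average ½·C·V² of energy. A minimal Python sketch, assuming per-net capacitances and toggle counts have already been read from an activity file; the net names, values, and the `switching_power` helper are hypothetical, and it ignores internal and leakage power, which a real tool also models.

```python
from dataclasses import dataclass


@dataclass
class Net:
    name: str
    cap_farads: float  # total load capacitance on the net (hypothetical value)
    toggles: int       # 0->1 plus 1->0 transitions counted in the activity file


def switching_power(nets, vdd, sim_time_s):
    """Average switching power in watts over the simulated interval.

    A full charge/discharge cycle (two toggles) dissipates C*V^2,
    so each toggle contributes 0.5*C*V^2 of energy.
    """
    energy = sum(0.5 * n.cap_farads * vdd ** 2 * n.toggles for n in nets)
    return energy / sim_time_s


if __name__ == "__main__":
    nets = [
        Net("clk", 20e-15, 2000),  # hypothetical capacitances and toggle counts
        Net("en", 5e-15, 40),
        Net("out", 10e-15, 300),
    ]
    # 1000 cycles of a 1 GHz clock = 1 us of simulated time
    p = switching_power(nets, vdd=1.0, sim_time_s=1e-6)
    print(f"{p * 1e6:.3f} uW")  # prints 21.600 uW
```

Because power is a sum over nets weighted by their toggle counts, the quality of the activity data directly bounds the accuracy of the estimate, which is why simulation vectors matter.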
• Generate a significantly smaller FSDB containing only power-critical signals
from the emulator/simulator
[Figure: conventional flow: RTL + test bench → Verilog simulation → full FSDB. Critical signal flow: RTL → PowerArtist power-critical signal extraction → critical signal list; RTL + test bench + critical signal list → Verilog simulation → partial FSDB]
initial begin
  // Name the output FSDB file
  $fsdbDumpfile("pa_extracted.fsdb");
  // Dump only the signals listed in the critical signal list file
  $fsdbDumpvarsByFile("sig_file_name");
end
Example critical signal list (contents of sig_file_name):
testbench.top_inst.temp_out
testbench.top_inst.temp
testbench.top_inst.en
testbench.top_inst.out
testbench.top_inst.clk
testbench.top_inst.inC
testbench.top_inst.inB
testbench.top_inst.inA
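As a rough illustration of how a list like the one above could be produced, here is a Python sketch that walks a toy netlist and keeps primary inputs/outputs plus nets driven by the cell types the slides name as power-critical (flip-flops, latches, ICGCs, IO cells, MUXes). The netlist representation, cell-type strings, and function names are hypothetical, and treating every output of those cells as critical is an assumption; this is not PowerArtist's extraction algorithm.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical cell-type names standing in for the critical categories
# in the slides: flip-flops, latches, ICGCs, IO cells, and MUXes.
CRITICAL_CELL_TYPES = {"DFF", "LATCH", "ICG", "IO", "MUX"}

# Toy netlist (assumption): instance name -> (cell type, output net name).
Netlist = Dict[str, Tuple[str, str]]


def extract_critical_signals(netlist: Netlist,
                             primary_inputs: Set[str],
                             primary_outputs: Set[str],
                             scope: str) -> List[str]:
    """Return sorted hierarchical names of power-critical nets."""
    critical = set(primary_inputs) | set(primary_outputs)
    for cell_type, out_net in netlist.values():
        if cell_type in CRITICAL_CELL_TYPES:
            critical.add(out_net)
    # Nets driven only by ordinary combinational cells are left out;
    # their activity is expected to be propagated instead of dumped.
    return sorted(f"{scope}.{net}" for net in critical)


def write_signal_file(signals: List[str], path: str) -> None:
    """Write one hierarchical signal name per line, mirroring the example list."""
    with open(path, "w") as f:
        f.write("\n".join(signals) + "\n")


if __name__ == "__main__":
    netlist = {
        "u_ff": ("DFF", "temp"),
        "u_and": ("AND2", "temp_out"),  # combinational, so not dumped
        "u_icg": ("ICG", "gclk"),
    }
    sigs = extract_critical_signals(netlist, {"clk", "en", "inA"}, {"out"},
                                    scope="testbench.top_inst")
    print("\n".join(sigs))
```

Only the sorted names are printed here; `write_signal_file` shows how the same list would be saved for the simulator.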
[Figure: two reduced-FSDB flows.
L1 (power): Apache PowerArtist identifies power-critical signals; the simulator/emulator dumps a reduced FSDB for power analysis + debug. Optimized for power analysis over the entire simulation duration.
L2 (functional): functional debug tools identify function-critical signals; the simulator/emulator dumps a reduced FSDB for functional debug. Optimized for functional debug over limited clock cycles.]
• Power-critical signals
  - Activity of only a subset of signals is needed for accurate power estimation
  - Critical signals include outputs of sequential elements and module input/output ports
• Non-critical signals
  - Activity for the remaining signals can be obtained by propagation, based on the activity
propagation formulae of the various cell types
[Figure: power-critical signal types: PI & PO, IO cells, latches, ICGCs (integrated clock-gating cells), flip-flops, MUX]
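The slides do not spell out PowerArtist's propagation formulae. As a stand-in, here is the textbook probabilistic approach in Python: propagate static probability (the fraction of time a net is 1) through combinational gates assuming spatially independent inputs, then estimate toggles per cycle as 2p(1 − p), which additionally assumes the value in one cycle is independent of the previous one. The gate set, net names, and function names are illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple


# Static-probability rules, assuming statistically independent inputs.
def p_and(ps: List[float]) -> float:
    out = 1.0
    for p in ps:
        out *= p  # output is 1 only if every input is 1
    return out


def p_or(ps: List[float]) -> float:
    all_low = 1.0
    for p in ps:
        all_low *= 1.0 - p  # output is 0 only if every input is 0
    return 1.0 - all_low


def p_not(ps: List[float]) -> float:
    return 1.0 - ps[0]


def p_xor2(ps: List[float]) -> float:
    a, b = ps
    return a * (1.0 - b) + b * (1.0 - a)


RULES: Dict[str, Callable[[List[float]], float]] = {
    "AND": p_and, "OR": p_or, "NOT": p_not, "XOR2": p_xor2,
}


def propagate(known: Dict[str, float],
              gates: List[Tuple[str, List[str], str]]) -> Dict[str, float]:
    """Fill in static probabilities for non-critical nets.

    known: probabilities derived from the dumped critical nets.
    gates: (gate type, input nets, output net), in topological order.
    """
    probs = dict(known)
    for gate_type, inputs, output in gates:
        probs[output] = RULES[gate_type]([probs[n] for n in inputs])
    return probs


def toggle_rate(p: float) -> float:
    """Expected toggles per cycle: P(0->1) + P(1->0) = 2p(1 - p)."""
    return 2.0 * p * (1.0 - p)


if __name__ == "__main__":
    # temp and en are critical nets with hypothetical measured probabilities
    probs = propagate({"temp": 0.5, "en": 0.25},
                      [("AND", ["temp", "en"], "temp_out")])
    print(probs["temp_out"], toggle_rate(probs["temp_out"]))  # 0.125 0.21875
```

The independence assumptions break down on reconvergent fanout and correlated signals, which is one source of the small power mismatch a critical-signal flow accepts in exchange for a much smaller dump.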
• Application
  - Power-critical signals can be extracted for both RTL and gate-level designs
  - Critical signals can be used in simulation as well as emulation flows
• Impact
  - Dumping the activity file only for power-critical signals saves simulator/emulator and power
analysis runtime and memory, at the cost of a small error in the power results
  - The power-critical signal flow enables power analysis of huge designs for which power estimation
was previously impractical
[Figure: critical signal flow in PowerArtist. Inputs: RT/gate-level design, power library, wire load model, test bench. Elaborate → micro-architecturally inferred netlist → critical signal extraction → critical signal list → simulation/emulation with the test bench → partially dumped activity file → Calculate Power → RTL power report]
• Experimental result with Design-A in RTL
  - The first experiment used a multimedia codec IP design
  - Design size is about 8 million gates, with a 32 nm library
[Figures: impact on CPU time; impact on memory resource & power result]
• Experimental result with Design-B in RTL
  - The second experiment used a quad-core CPU block
  - Design size is tens of millions of gates, with a 32 nm library
[Figures: impact on memory resource & power result; impact on CPU time (CPU time in hours; chart values: 117, 12, 24, 14)]
• Experimental result with Design-A at gate level
  - The third experiment used the same design as the first, but at gate level
  - Design size is about 8 million gates, with a 32 nm library
[Figures: impact on CPU time; impact on memory resource & power result]
• In the era of billion-gate SoC design, runtime and waveform database size are
challenging issues for accurate power estimation.
• To address this, we proposed dumping the waveform for only a subset of the design's
signals, and introduced a methodology for choosing this subset so that it gives good
power correlation while remaining small.
• The PowerArtist power-critical signal flow has been verified by extensive experiments
covering both RTL and gate-level power estimation flows.
• Our experimental results show that the critical signal flow cut runtime by 70-80% and
simulation waveform size by 60-97%, while keeping the power mismatch below 10%.
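The summary figures are relative reductions (runtime, waveform size) and a relative power mismatch against the full-dump result. A small Python sketch of how such metrics are typically computed; the function names are hypothetical, and the sample values below are invented for illustration, not taken from the experiments.

```python
def reduction_pct(full: float, reduced: float) -> float:
    """Percentage saved by the critical-signal flow relative to the full dump."""
    return 100.0 * (full - reduced) / full


def mismatch_pct(p_full: float, p_crit: float) -> float:
    """Relative power error of the critical-signal result vs. the full-FSDB result."""
    return 100.0 * abs(p_crit - p_full) / p_full


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(reduction_pct(full=100.0, reduced=25.0))  # 75.0
    print(mismatch_pct(p_full=10.0, p_crit=10.5))   # 5.0
```

The same `reduction_pct` applies to runtime, memory, and waveform size alike, as long as the full-dump run is used as the baseline.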