Microelectronics Journal 146 (2024) 106143
Contents lists available at ScienceDirect
Microelectronics Journal
journal homepage: www.elsevier.com/locate/mejo
An efficient algorithm for estimating gate-level power consumption in large-scale integrated circuits
Zejia Lyu, Jizhong Shen ∗
College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China
ARTICLE INFO
Keywords:
Large-scale integrated circuits
Power estimation
Complex topology
Strong–weak loop processing
ABSTRACT
Estimating power dissipation in Very Large Scale Integrated (VLSI) circuits, particularly large-scale sequential circuits, is a significant challenge in Electronic Design Automation (EDA). Benchmarked against PrimeTime PX, the proposed algorithm efficiently analyzes large-scale combinational and sequential circuits. This research begins with a power analysis algorithm for combinational circuits based on signal probability (SP) and transition count (TC). It then extends the algorithm to sequential circuits, introducing methods for handling feedback loops and a validity-checking mechanism for spatially dependent topologies. Developed on the OpenTimer open-source framework using the SMIC 40-nm process library, the algorithm was validated on academic and industrial datasets. Experimental evaluations show that our algorithm outperforms the compared methods in both speed and accuracy. For power analysis with vector annotations, it achieves an average error within 0.16% on large-scale sequential circuits. It also performs gate-level vectorless power analysis on large-scale integrated circuits with complex topologies, demonstrating computational robustness.
1. Introduction
Continuous advances in integrated circuits and semiconductors have enabled their widespread application in sectors including manufacturing, agriculture, and daily life. Driven by Moore’s Law, the scale and complexity of digital integrated circuits have grown exponentially, making power consumption a pivotal factor constraining chip design and performance. Accurately estimating a chip’s power during the design phase is therefore crucial, as it provides essential references for optimizing the design [1].
In the chip industry, designers predominantly rely on automated EDA tools to estimate circuit power consumption and to analyze the power overhead within circuits comprehensively. In contemporary industrial EDA flows, the average power consumption of a circuit is a key metric of its power characteristics. This metric is primarily obtained by applying random test vectors to combinational and sequential circuits. In this study, we adopt average power as the representative measure of a circuit’s power usage. Several factors influence a circuit’s power consumption, including activity, parasitic parameters, operating voltage, and timing parameters. Notably, the circuit’s activity, which reflects its operational status, directly and significantly influences its power consumption, and this activity is closely tied to the random vectors applied during testing. The algorithm presented in this paper targets the same average power analysis performed by commercial power analysis software, which computes a circuit’s power from its activity. Other factors, such as parasitic parameters, operating voltage, and timing, are predetermined using standard EDA tools.
Notable industry-standard EDA power analysis tools include Cadence’s Voltus and Synopsys’s PrimeTime PX (PTPX). Although these commercial tools are comprehensive, their algorithms for estimating power in large-scale circuits often become performance bottlenecks. Some open-source power analysis tools, such as OpenSTA [2], exist; however, they are limited to specific power analysis scenarios and offer limited functionality.
Current commercial EDA software relies on activity files for effective power estimation. These activity files are produced by simulating the circuit with its input data in tools such as VCS or ModelSim. For large-scale circuits, this requires long simulation runs and carefully crafted input waveforms. Moreover, the activity files of large integrated circuits are very large, which strains computational resources and increases chip design overhead [3,4]. Vectorless power estimation instead uses probabilistic methods to infer circuit activity directly, characterized by the transition behavior of the interconnections within the circuit. This approach removes the need to select input waveforms and to use activity files, markedly improving estimation efficiency [5].
∗ Corresponding author.
E-mail addresses: zejialv@zju.edu.cn (Z. Lyu), jzshen@zju.edu.cn (J. Shen).
https://doi.org/10.1016/j.mejo.2024.106143
Received 20 November 2023; Received in revised form 25 January 2024; Accepted 23 February 2024
Available online 28 February 2024
1879-2391/© 2024 Elsevier Ltd. All rights reserved.
2. Estimation of average power
Vectorless power estimation algorithms have attracted significant attention in academia. These algorithms aim to estimate average circuit activity and were initially grounded in probabilistic analysis. Before placement and routing, they determine the transition probabilities within combinational circuits, from which power consumption is estimated. Notably, Najm et al. [6] introduced the notion of transition density as a measure of circuit activity. Similarly, Monteiro et al. [7] proposed symbolic reasoning, Marculescu et al. [8] modeled transition probabilities probabilistically, Wu et al. [9] employed a multi-valued logic approach for behavior probability inference, and Czajkowski et al. [10,11] advocated decomposition for faster vectorless power estimation. SAT-based methods [12–14] have also shown potential for power estimation. Other works have proposed diverse methods grounded in statistical probability [15–18], some of which incorporate circuit characteristics, accounting for delay effects [19–21] and for the effects of statistical timing [22]. However, these methods predominantly address combinational circuits and have limited scalability, making them inadequate for sequential circuits with intricate loops. The work in [23] extended this line of research to large-scale circuits but remained limited to combinational circuits.
A few works [24,25] have considered sequential circuits, but they apply only to simple or small-scale circuits and lack experiments on large-scale circuits. Such models, based on traditional probabilistic frameworks, incur significant computational overhead, which limits their precision and industrial applicability and makes them unsuitable for large-scale circuits.
Subsequent research has proposed methods to improve computational efficiency and accuracy, such as the Bayesian network method of Bhanja et al. [26,27]. Meanwhile, the advent of machine learning has spurred data-driven studies in power estimation [28–33]. However, these learning-based methods are often sensitive to specific circuit structures, which limits their generalizability. Furthermore, they mostly depend on circuit simulators that use predefined vectors to emulate circuit activity. Such approaches are poorly suited to analyzing average power consumption in industrial applications, where the complexity and variability of circuit designs demand more adaptable and general methods.
In response to these research and industrial challenges, this paper presents a novel vectorless power estimation method for circuits with complex topologies, based on strong–weak loop processing. The proposed algorithm has the following innovative features:
Power consumption estimation in CMOS circuits is generally divided into transistor-level and gate-level analyses, corresponding to different levels of the circuit hierarchy. Transistor-level analysis, though accurate, is time-consuming, which makes gate-level analysis a more practical way to assess the power consumption of large-scale integrated circuits. For integrated circuits, gate-level power analysis predominantly refers to the evaluation of average power. In the semiconductor industry, power calculations for gate-level large-scale digital integrated circuits rely on the data in the liberty files of the Process Design Kit (PDK) supplied by foundries. Average power consumption is typically divided into three components: leakage power, internal power, and switching power [1]. The overall average power of a circuit is the sum of these three components. The following methods, based on the power-related information in the liberty file, are used to calculate each of them.
2.1. Leakage power
Leakage power, often called static power in industry, is the power dissipated by gate units while the signal voltages at circuit nodes remain stable. Several mechanisms contribute to static power dissipation, the dominant one being subthreshold leakage current flowing from source to drain in the transistor, which produces significant leakage power. From a manufacturing standpoint, static power also arises from current leakage between the diffusion and substrate layers. Although negligible in older processes, static power has become significant in advanced processes, where it constitutes a notable fraction of a circuit’s total power consumption. The industry-standard calculation of leakage power, $P_{leakage}$, sums each gate’s $P_{gate\_leakage}$, as expressed below:
$$P_{leakage} = \sum_{\text{each gate}} P_{gate\_leakage} \tag{1}$$

$$P_{gate\_leakage} = \sum_{\text{each condition}} \mathit{leakage\_power} \times \mathit{state\_probability} \tag{2}$$
Here, $\mathit{leakage\_power}$ is the leakage power of each gate under a given condition, obtained from the library, while $\mathit{state\_probability}$ differs from signal probability: it is a composite of signal probabilities that describes a combination of input states. For example, a two-input AND gate with input ports A and B has four possible states, $AB$, $A\neg B$, $\neg A B$, and $\neg A \neg B$, and the $\mathit{state\_probability}$ of each can be computed from the signal probabilities of A and B.
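For illustration, the following minimal Python sketch applies Eq. (2) to the two-input AND gate above and then sums gates as in Eq. (1). It assumes the two inputs are statistically independent, so each state probability is a product of signal probabilities; the per-state leakage numbers are invented placeholders, not values from the SMIC library or any real liberty file.

```python
def and2_state_probabilities(sp_a: float, sp_b: float) -> dict[str, float]:
    """Probabilities of the four input states of a two-input gate (independent inputs)."""
    return {
        "AB": sp_a * sp_b,
        "A!B": sp_a * (1.0 - sp_b),
        "!AB": (1.0 - sp_a) * sp_b,
        "!A!B": (1.0 - sp_a) * (1.0 - sp_b),
    }


def gate_leakage(leakage_power: dict[str, float], state_prob: dict[str, float]) -> float:
    """Eq. (2): sum of leakage_power x state_probability over every condition."""
    return sum(power * state_prob[cond] for cond, power in leakage_power.items())


# Invented per-state leakage values (nW) for an AND2 cell; real ones come from liberty.
AND2_LEAKAGE_NW = {"AB": 4.0, "A!B": 2.5, "!AB": 2.0, "!A!B": 1.0}

probs = and2_state_probabilities(sp_a=0.5, sp_b=0.25)
p_gate = gate_leakage(AND2_LEAKAGE_NW, probs)

# Eq. (1): circuit leakage sums P_gate_leakage over all gates (two AND2 instances here).
p_leakage = p_gate + gate_leakage(AND2_LEAKAGE_NW, and2_state_probabilities(0.5, 0.5))
```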
1. It can process the topological structures of industrial-scale sequential circuits. The overall method consists of two parts: inference and propagation.
2. The algorithm’s inference mechanism for signal probability and transition probability is designed around information extracted from liberty process library files, making full use of the library’s power information.
3. To cope with the complex topologies of large-scale integrated circuits, the proposed method introduces strong–weak processing mechanisms together with the concept of power arcs, which enable traversal of complex topologies and the subsequent propagation of transition probability and signal probability.
4. While maintaining computational efficiency, the algorithm accounts for spatial dependencies in the circuit. A validity-checking mechanism reduces the impact of these dependencies, ensuring accurate and valid results.
2.2. Internal power
Traditionally, the short-circuit current is regarded as the primary contributor to internal power consumption. Short-circuit power is the power consumed when a gate-level component momentarily enters a conductive, or short-circuited, state in which its PMOS and NMOS transistors conduct simultaneously. This simultaneous conduction, caused by delay effects in real circuits, lets a short-circuit current $I_{sc}$ flow directly from the supply voltage $V_{DD}$ to ground, resulting in short-circuit power consumption, as illustrated in Fig. 1. In industry, short-circuit power is usually treated as internal power.
Short-circuit power depends on the signal transitions in the circuit: it is relatively low for fast transitions and increases for slow transitions. It is also affected by the circuit’s load capacitance and by transistor characteristics.
We have integrated the proposed algorithm into the OpenTimer framework [34]. Experimental evaluations demonstrate that it outperforms the open-source tool OpenSTA and, in specific scenarios, matches the performance of PTPX, the industry’s gold-standard software.
Here, $TR_n$ denotes the toggle rate of net n, C represents the parasitic load capacitance, and $V_{DD}$ is the supply voltage. Similarly, the stronger the logic activity of the circuit, the more intensely the output load capacitances of gate-level components are charged and discharged, and the greater the switching power. In several prevalent processes, switching power accounts for roughly 70%–90% of total circuit power. The total switching power of the circuit, $P_{switching}$, can be expressed as:
$$P_{switching} = \sum_{\text{each net}} P_{net\_switching} \tag{9}$$
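A short numeric sketch of Eqs. (8) and (9): each net contributes $0.5\,C\,V_{DD}^2\,TR_n$, and the circuit total is the sum over nets. The net names, capacitances, supply voltage, and toggle rates are hypothetical values chosen only to show the arithmetic, with the toggle rate expressed in toggles per second so that the result is in watts.

```python
VDD = 1.1  # supply voltage in volts (hypothetical)

# Hypothetical nets: name -> (load capacitance in farads, toggle rate TR_n in toggles/s).
nets = {
    "n1": (2.0e-15, 1.0e8),
    "n2": (5.0e-15, 2.5e7),
}


def net_switching_power(cap_f: float, vdd_v: float, tr_per_s: float) -> float:
    """Eq. (8): P_net_switching = 0.5 * C * VDD^2 * TR_n, in watts."""
    return 0.5 * cap_f * vdd_v ** 2 * tr_per_s


# Eq. (9): the circuit switching power is the sum over all nets.
p_switching = sum(net_switching_power(c, VDD, tr) for c, tr in nets.values())
```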
2.4. Power calculation space
Fig. 1. Schematic diagram of short-circuit power consumption. The relevant parameters are $V_{DD}$ (supply voltage), $C_L$ (load capacitance), $V_{out}$ (output voltage), $I_{sc}$ (short-circuit current), and $V_{in}$ (input voltage).
For a given circuit design, technology, and set of parasitic parameters, calculating power requires determining the specific conditions and transition scenarios under which the three power types occur, from which the average power is derived. For leakage power, the power calculation space consists of the conditions that cause leakage power, which depend on signal probability. For internal power, it consists of the conditions that cause internal power at the input and output terminals, which depend on both signal and transition probabilities. For switching power, the power calculation space is the transition probability of each interconnection.
More precisely, signal probability is defined as the fraction of the total test duration during which the signal remains high:
$$SP = \frac{\text{Signal high level duration}}{\text{Total test duration}} \tag{10}$$
Transition probability is defined as the number of signal toggles divided by the total test duration; in this article it is equivalent to the toggle rate:
$$TR = \frac{\text{Signal toggle counts}}{\text{Total test duration}} \tag{11}$$
Transition count refers to the cumulative number of signal toggles over the total test duration (Eq. (12)).
The total internal power consumption of the circuit, $P_{internal}$, is computed by aggregating the internal power consumption of all pins:

$$P_{internal} = \sum_{\text{each pin}} P_{pin\_internal} \tag{3}$$

For each pin, the internal power consumption, $P_{pin\_internal}$, is determined by summing the internal power over each respective condition:

$$P_{pin\_internal} = \sum_{\text{each condition}} \mathit{internal\_energy} \times TR_{pin} \tag{4}$$

Both $\mathit{internal\_energy}$ and the conditions for internal power consumption are taken from the liberty file referenced above. The pin toggle rate, $TR_{pin}$, is obtained from an activity file (e.g., VCD or SAIF) or by inference.
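The following sketch evaluates Eqs. (4) and (3) on hypothetical data. It assumes each liberty condition of a pin has already been paired with an internal energy per toggle (normally interpolated from the LUTs described next) and with the pin’s toggle rate under that condition; the pin names and numbers are illustrative only.

```python
# Hypothetical data: pin -> list of (internal energy in joules per toggle,
# toggle rate of the pin under that condition in toggles per second).
pin_conditions = {
    "U1/A": [(1.0e-15, 4.0e7), (0.5e-15, 2.0e7)],
    "U1/Y": [(3.0e-15, 3.0e7)],
}


def pin_internal_power(conditions: list[tuple[float, float]]) -> float:
    """Eq. (4): sum of internal_energy * TR_pin over the pin's conditions (watts)."""
    return sum(energy * rate for energy, rate in conditions)


# Eq. (3): the circuit internal power is the sum over all pins.
p_internal = sum(pin_internal_power(c) for c in pin_conditions.values())
```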
The liberty file stores $\mathit{internal\_energy}$ in a lookup table (LUT); for an input pin this is a one-dimensional LUT, evaluated with one-dimensional interpolation. Let y be the value to be determined at x, with known points $(x_1, y_1)$ and $(x_2, y_2)$ such that $x_1 < x < x_2$. The interpolation result is expressed as:
$$y = \frac{y_1 - y_2}{x_1 - x_2}\,x + \frac{x_1 y_2 - x_2 y_1}{x_1 - x_2} \tag{5}$$
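Eq. (5) is ordinary linear interpolation. The sketch below implements it directly and wraps it in a lookup over a one-dimensional table. The table values, the use of input transition time as the index, and the choice to extrapolate from the end segments outside the table range are assumptions for illustration, not documented behavior of the paper’s tool or of any commercial tool.

```python
from bisect import bisect_left


def interpolate_1d(x: float, x1: float, y1: float, x2: float, y2: float) -> float:
    """Eq. (5): the line through (x1, y1) and (x2, y2), evaluated at x."""
    if x1 == x2:
        raise ValueError("x1 and x2 must differ")
    slope = (y1 - y2) / (x1 - x2)
    intercept = (x1 * y2 - x2 * y1) / (x1 - x2)
    return slope * x + intercept


def lookup_1d(index: list[float], values: list[float], x: float) -> float:
    """Interpolate a 1-D LUT at x using its bracketing entries.

    index must be strictly increasing with at least two entries; outside the
    table range the end segment is extrapolated (an assumed policy).
    """
    k = bisect_left(index, x) - 1       # segment [index[k], index[k + 1]]
    k = min(max(k, 0), len(index) - 2)  # clamp to the first/last segment
    return interpolate_1d(x, index[k], values[k], index[k + 1], values[k + 1])


# Hypothetical LUT: input transition time (ns) -> internal energy (arbitrary units).
energy = lookup_1d([0.01, 0.05, 0.10], [1.0, 1.4, 2.0], 0.075)
```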
$$TC = \text{Signal toggle counts} \tag{12}$$
At its core, average power calculation hinges on accurately estimating both signal and transition probabilities.
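To make Eqs. (10) to (12) concrete, the sketch below measures SP, TR, and TC on a sampled 0/1 waveform. The uniform sampling period and the example waveform are illustrative assumptions; a vectorless flow infers these quantities instead of measuring them from simulation.

```python
def waveform_statistics(samples: list[int], period_ns: float) -> tuple[float, float, int]:
    """Return (SP, TR, TC) for a 0/1 waveform sampled every period_ns nanoseconds.

    SP (Eq. 10): fraction of the test duration spent at logic high.
    TC (Eq. 12): number of signal toggles.
    TR (Eq. 11): toggles per nanosecond over the total test duration.
    """
    total_duration = len(samples) * period_ns
    high_duration = sum(samples) * period_ns
    tc = sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)
    return high_duration / total_duration, tc / total_duration, tc


sp, tr, tc = waveform_statistics([0, 1, 1, 0, 1, 1, 1, 0], period_ns=2.0)
```

Here the signal is high for 5 of 8 samples and toggles 4 times over 16 ns.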
3. Signal probability and transition probability estimation algorithm
A two-dimensional LUT is used for the output pin, and its power is likewise computed by interpolation. Let the value to be determined be z, with known values $x_1 < x < x_2$ and $y_1 < y < y_2$ and known points $(x_1, y_1, z_{11})$, $(x_1, y_2, z_{12})$, $(x_2, y_1, z_{21})$, and $(x_2, y_2, z_{22})$. The interpolation result is then defined as:
The vectorless power calculation method performs inference on the power calculation space to estimate power consumption. Inferring the power calculation space essentially means inferring the transition probability and the signal probability; for convenience, we estimate the TC over a unified duration to represent the TR. Existing research on average power mostly focuses on inferring TR while considering the spatial and temporal correlations of signals. These inference algorithms are generally verified on small-scale circuits, and corresponding power computation methods are lacking for circuits exceeding 10M gate units. Moreover, the computational efficiency of existing academic algorithms is low, and their run time on large-scale and very large-scale circuits is unacceptable for industrial deployment. As process technology advances, the power section of process libraries is recorded in the form of conditional power, which current academic algorithms do not support. Existing academic inference algorithms can only handle pure combinational circuits or sequential circuits that form Directed Acyclic Graphs (DAGs). Such traditional algorithms no longer meet the requirements of current large-scale designs, which are multi-clock sequential circuits forming Directed Cyclic Graphs (DCGs).
This work proposes a TC- and SP-based algorithm built on industrial process library power information. This algorithm can handle
$$z = A + Bx + Cy + Dxy \tag{6}$$

where the coefficients A, B, C, and D satisfy the following conditions:

$$\begin{bmatrix} z_{11} \\ z_{12} \\ z_{21} \\ z_{22} \end{bmatrix} = \begin{bmatrix} 1 & x_1 & y_1 & x_1 y_1 \\ 1 & x_1 & y_2 & x_1 y_2 \\ 1 & x_2 & y_1 & x_2 y_1 \\ 1 & x_2 & y_2 & x_2 y_2 \end{bmatrix} \times \begin{bmatrix} A \\ B \\ C \\ D \end{bmatrix} \tag{7}$$
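Equations (6) and (7) describe bilinear interpolation. Rather than inverting the 4×4 matrix of Eq. (7), the sketch below uses the equivalent closed form, which is linear in x and in y and passes through the same four corner values, so it defines the same surface $z = A + Bx + Cy + Dxy$. Here $z_{ij}$ is the table value at $(x_i, y_j)$, matching the point definitions above; the grid in the example is hypothetical.

```python
def interpolate_2d(x: float, y: float,
                   x1: float, x2: float, y1: float, y2: float,
                   z11: float, z12: float, z21: float, z22: float) -> float:
    """Bilinear interpolation; z_ij is the table value at (x_i, y_j)."""
    if x1 == x2 or y1 == y2:
        raise ValueError("grid points must be distinct")
    tx = (x - x1) / (x2 - x1)  # normalized position along x
    ty = (y - y1) / (y2 - y1)  # normalized position along y
    return (z11 * (1 - tx) * (1 - ty)
            + z21 * tx * (1 - ty)
            + z12 * (1 - tx) * ty
            + z22 * tx * ty)


# Hypothetical unit grid; the center value is the mean of the four corners.
z = interpolate_2d(0.5, 0.5, 0.0, 1.0, 0.0, 1.0, 1.0, 2.0, 3.0, 4.0)
```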
2.3. Switching power
Switching power, often termed capacitive power, arises from signal transitions at the output terminals of gate-level components, which charge and discharge the load capacitances. The load capacitance includes the capacitance of all interconnect lines and of the connected gate-level components. The switching power of each net, $P_{net\_switching}$, can be expressed as:
$$P_{net\_switching} = 0.5\,C\,V_{DD}^{2}\,TR_n \tag{8}$$
Fig. 2. Schematic diagram of power arc, where the CELL type power arc is marked with green dashed lines, and the NET type is marked with blue dashed lines.
Table 1
Relationship between logical operations and signal probability, where A and B are input ports and Y is the output port.
DCG circuits and is fast, efficient, and highly robust, which gives it good industrial application value. The algorithm is divided into two parts: inference and propagation. Inference derives the output pin’s TC and SP from the input pin information according to the characteristics of gate-level components. Propagation determines the signal probability and transition count of each node in the circuit and calculates the conditional power, in particular for DCG circuits. The key issue for the propagation mechanism is how to break the complex loops in the circuit reasonably. In terms of scope, the inference mechanism focuses on information changes within a gate-level unit, while the propagation mechanism focuses on the circuit topology.
Function     Signal probability
INV (¬)      $SP_Y = 1 - SP_A$
BUFF         $SP_Y = SP_A$
AND (&)      $SP_Y = SP_A \times SP_B$
OR (∥)       $SP_Y = SP_A + SP_B - SP_A \times SP_B$
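The rules of Table 1 compose recursively: a more complex cell function can be decomposed into INV, BUFF, AND, and OR and evaluated bottom-up, with registers handled by Eq. (13). The sketch below assumes, like the table, that the operands of every operator are statistically independent, which does not hold for reconvergent signals; the tuple-based expression format is a hypothetical choice, not the liberty function syntax.

```python
from typing import Union

# An expression is a pin name (str) or a tuple (op, operand, ...) with op in
# {"INV", "BUFF", "AND", "OR"}. This format is illustrative only.
Expr = Union[str, tuple]


def signal_probability(expr: Expr, sp: dict[str, float]) -> float:
    """Evaluate SP bottom-up with the Table 1 rules, assuming independent operands."""
    if isinstance(expr, str):
        return sp[expr]
    op, *args = expr
    vals = [signal_probability(a, sp) for a in args]
    if op == "INV":
        return 1.0 - vals[0]
    if op == "BUFF":
        return vals[0]
    if op == "AND":
        result = 1.0
        for v in vals:
            result *= v  # SP_A x SP_B, applied pairwise
        return result
    if op == "OR":
        result = 0.0
        for v in vals:
            result = result + v - result * v  # SP_A + SP_B - SP_A x SP_B, pairwise
        return result
    raise ValueError(f"unsupported operation: {op}")


def register_sp(sp_d: float, sp_en: float) -> float:
    """Eq. (13): SP(Q) = SP(D) x SP(EN)."""
    return sp_d * sp_en


# Example function: Y = (not (A and B)) or C, decomposed into basic operations.
sp_y = signal_probability(("OR", ("INV", ("AND", "A", "B")), "C"),
                          {"A": 0.5, "B": 0.5, "C": 0.5})
```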
Once all signal probabilities have been marked, including those at the D and enable ports of registers, the conditional probabilities of the various leakage power conditions in the circuit are computed. For a gate-level unit with input pin set $I_P = \{A_0, A_1, A_2, \ldots, A_{n-1}\}$, where n denotes the number of input pins, the leakage power condition space has dimension $2^n$. The set of conditions is expressed as:
3.1. Power arc
This section introduces the notion of the power arc, a concept similar to the timing arc used in timing analysis [34]. A power arc describes the power inference relationship between two pins and is categorized as either CELL or NET type. The CELL type is derived from the tags in the process library that record the direction of power occurrence in the internal power field of output pins, that is, the internal power of an output pin triggered by a particular input pin. The NET type describes propagation across interconnections, as depicted in Fig. 2.
The concept of power arcs facilitates reasoning about the power calculation space of circuits and underpins the inference mechanism for TC. Pins mark the start and end of each power arc. In the algorithm description that follows, the starting pin of a power arc is called the from pin and the ending pin the to pin.
$$COND = \{A_0 A_1 A_2 \cdots A_{n-1},\ \neg A_0 A_1 A_2 \cdots A_{n-1},\ \ldots,\ \neg A_0 \neg A_1 \neg A_2 \cdots \neg A_{n-1}\} \tag{14}$$
The conditional probability of each leakage power condition is then determined as follows:
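$$\mathcal{P} = \{\mathcal{P}(A_0)\mathcal{P}(A_1)\mathcal{P}(A_2)\cdots\mathcal{P}(A_{n-1}),\ \mathcal{P}(\neg A_0)\mathcal{P}(A_1)\mathcal{P}(A_2)\cdots\mathcal{P}(A_{n-1}),\ \ldots,\ \mathcal{P}(\neg A_0)\mathcal{P}(\neg A_1)\mathcal{P}(\neg A_2)\cdots\mathcal{P}(\neg A_{n-1})\} \tag{15}$$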
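where $\mathcal{P}(A_k)$ is the signal probability of pin $A_k$, $\mathcal{P}(\neg A_k) = 1 - \mathcal{P}(A_k)$, and $\mathcal{P}_i$ denotes the i-th element of this set. Suppose the process library records the power of condition i as $Val(COND_i)$. The leakage power of the gate-level unit, $\mathit{LeakagePower}_{cell}$, is then computed as: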
$$\mathit{LeakagePower}_{cell} = \sum_{i} \mathcal{P}_i \times Val(COND_i) \tag{16}$$
Applying this approach to each gate-level unit yields the leakage power of every component in the circuit; their sum gives the total circuit leakage power.
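Equations (14) to (16) generalize to any number of inputs. The sketch below enumerates the $2^n$ input states with `itertools.product`, forms each state probability as a product of SP or 1 − SP per pin (again assuming independent pins), and weights the library value of each condition. Conditions are keyed by tuples of pin values, so the enumeration order need not match Eq. (14); the 3-input cell and its power values are hypothetical, and `cond_power` stands in for $Val(COND_i)$. If a library lists only some conditions, only those terms contribute.

```python
from itertools import product


def condition_probabilities(pin_sp: list[float]) -> dict[tuple[bool, ...], float]:
    """Eqs. (14)-(15): probability of each of the 2**n input states of a cell.

    Keys are tuples of pin values (True means A_k is high, False means not A_k).
    Pins are treated as statistically independent.
    """
    probs: dict[tuple[bool, ...], float] = {}
    for state in product((True, False), repeat=len(pin_sp)):
        p = 1.0
        for high, sp in zip(state, pin_sp):
            p *= sp if high else 1.0 - sp
        probs[state] = p
    return probs


def cell_leakage(pin_sp: list[float], cond_power: dict[tuple[bool, ...], float]) -> float:
    """Eq. (16): sum over conditions of P_i * Val(COND_i)."""
    probs = condition_probabilities(pin_sp)
    return sum(probs[state] * power for state, power in cond_power.items())


# Hypothetical 3-input cell: every state leaks 1.0 except the all-high state (9.0).
cond_power = {state: 1.0 for state in product((True, False), repeat=3)}
cond_power[(True, True, True)] = 9.0
p_cell = cell_leakage([0.5, 0.5, 0.5], cond_power)
# Summing cell_leakage over all cells gives the circuit total of Eq. (1).
```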
3.2. Inference mechanism
The inference mechanism is divided into the inference of SP and the inference of TC.
3.2.2. Inference of transition count
TC inference uses a conditional inference algorithm based on the process library, that is, on the conditions under which internal power consumption occurs. The TC values of the fan-in interconnect lines are allocated to the various internal power conditions, from which the TC under each condition is computed as follows.
For a specific input pin $A_i$, the internal power conditions depend on the signal probabilities of the gate-level unit’s input pins. Once the signal probability of every input pin has been marked, the TC under each condition is computed by distribution, expressed as:
3.2.1. Inference of signal probability
This subsection describes the inference of signal probability. Once the signal probabilities of all input pins are marked, the output pin’s signal probability is inferred from the logic function recorded in the function field of the output pin in the process library. The relationship between logical operations and signal probability, shown in Table 1, follows existing literature [24].
More intricate logic functions are decomposed into these fundamental logical operations to compute their signal probabilities. Register cells, whose function fields cannot be inferred directly as logic, are defined as:
$$SP(Q) = SP(D) \times SP(EN) \tag{13}$$

This equation states that the signal probability at the output Q terminal equals the product of the signal probabilities at the input D terminal and the enable port.

$$TC_{i,j} = P(COND_j) \times P(TR^{r}_{A_i}) \times TC(NET_i) \tag{17}$$

where i denotes the pin index, $P(COND_j)$ is the occurrence probability of condition j, and $P(TR^{r}_{A_i})$ represents the current signal transition probability with $r \in \{FALL, RISE\}$. $TC(NET_i)$ denotes the TC value of the fan-in interconnection line.
Because signal probability and transition count propagate differently, the strong propagation mechanism is split into two variants: the signal probability strong propagation algorithm and the transition count strong propagation algorithm. The weak propagation mechanism is likewise divided into these two types. The INFER and VIR_INFER operations deal with combinational logic elements and registers, respectively, and are further divided into SP_INFER, TC_INFER, SP_VIR_INFER, and TC_VIR_INFER according to whether they act on signal probability or transition count.
To clarify the propagation algorithms that follow, the symbols used are introduced in Fig. 4; they are named after the power arc. The from and to endpoints of each arc record the SP and TC values, on the understanding that $[fromNet]_L$ and $[toCell]_{L-1}$ are the same endpoint. The figure highlights such corresponding endpoints in the same color, where L denotes the inference level. A single $[toNet]_L$ may correspond to multiple $[fromCell]_{L+1}$ endpoints, whose values may differ slightly. The combination of $[fromNet]_L$ and $[toNet]_L$ is represented as $[net]_L$.
Each of these propagation algorithms is discussed in detail below.
3.3.1. Signal probability weak propagation algorithm
This subsection describes how signal probability propagation is computed on the DAG portions of the circuit, as outlined in Algorithm 1.
Fig. 3. Sequential circuit structure.
For output pins, internal power consumption is inferred from the input pins based on the power arc concept. The TC values under the various power conditions are computed as:

$$TC_{i,j} = P(COND_j) \times P(TR^{r}_{A_i}) \times TC(ARC_{FROM}) \tag{18}$$

Again, $P(COND_j)$ is the occurrence probability of this power condition, and $P(TR^{r}_{A_i})$ represents the current signal transition probability. Here the TC originates from the fan-in pin of the power arc, denoted $TC(ARC_{FROM})$.
Consequently, the internal power of each pin is inferred from the TC under each power condition together with the power information in the process library, which in turn gives the total internal power of the circuit. The TC values accumulated over all power conditions of an output pin are taken as the total TC of that output pin, which is also the TC of the corresponding fan-out interconnection line.
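A sketch of the TC distribution step in Eq. (18), followed by the accumulation described above. The same product form applies to an input pin in Eq. (17), with $TC(NET_i)$ in place of $TC(ARC_{FROM})$. The `PowerArc` structure, the sensitizing conditions chosen for the AND2 example, and the even 50/50 rise/fall split are hypothetical; in the method these probabilities come from the marked signal probabilities and the liberty conditions.

```python
from dataclasses import dataclass


@dataclass
class PowerArc:
    """A CELL-type power arc into one output pin (illustrative structure only)."""
    from_pin: str
    cond_probs: dict[str, float]  # P(COND_j) for each library condition on this arc
    dir_probs: dict[str, float]   # P(TR^r) for r in {"RISE", "FALL"}


def arc_condition_tc(arc: PowerArc, tc_from: float) -> dict[tuple[str, str], float]:
    """Eq. (18): TC_{i,j} = P(COND_j) * P(TR^r) * TC(ARC_FROM), per condition and direction."""
    return {
        (cond, r): p_cond * p_r * tc_from
        for cond, p_cond in arc.cond_probs.items()
        for r, p_r in arc.dir_probs.items()
    }


def output_pin_tc(arcs: list[PowerArc], pin_tc: dict[str, float]) -> float:
    """Total TC of an output pin: sum of the per-condition TCs over all of its arcs."""
    return sum(sum(arc_condition_tc(arc, pin_tc[arc.from_pin]).values()) for arc in arcs)


# Hypothetical AND2: arc A->Y is taken as active when B is high, B->Y when A is high.
sp = {"A": 0.5, "B": 0.4}
rise_fall = {"RISE": 0.5, "FALL": 0.5}
arcs = [
    PowerArc("A", {"B": sp["B"]}, rise_fall),
    PowerArc("B", {"A": sp["A"]}, rise_fall),
]
tc_y = output_pin_tc(arcs, {"A": 100.0, "B": 50.0})
```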
3.3. Propagation mechanism
The propagation mechanism defines how the circuit is traversed; together with the inference mechanism described above, it aims to resolve the complete power computation space. Its key task is handling the DCG structure of sequential circuits; Fig. 3 shows a typical sequential circuit loop structure.
Beyond the logical loops introduced by latches, industrial integrated circuits contain loop structures formed by logic gate units. This characteristic defeats traditional power analysis algorithms designed for DAG structures and calls for a new approach. Multiple clocks in large-scale circuits introduce additional spatial dependencies between clock signals, further complicating the propagation mechanism.
Weak Propagation Mechanism: This mechanism targets computation on circuits with a DAG structure. It focuses on determining the signal probability and transition count of as many directly computable elements as possible. It is based on the principle that a gate-level element is processed only when all of its fan-in ports are marked, meaning that the reasoning techniques of Section 3.2 can be applied; this operation is named INFER. Signal probability and transition count are defined differently in this reasoning context, and the related algorithms are described later in this paper. When loops are encountered, some gate-level cells never reach the required inference state, and propagation stops automatically.
Strong Propagation Mechanism: This mechanism aims to break the DCG structure. To overcome the weak propagation mechanism’s inability to handle loops, it concentrates on loop processing, prioritizing gate-level cells according to their signal probability or transition count inference states. The algorithm identifies critical entry points within loops and builds a priority queue of gate-level elements. Each element in this queue is assigned virtual default markings if the signal states of its fan-in pins are missing. The weak propagation mechanism is then applied for refined reasoning, and the virtual markings are updated through successive reasoning. This combination of inference (Section 3.2) and virtual marking is termed VIR_INFER in this paper.

Algorithm 1: Signal Probability Weak Propagation Algorithm
Data: the to pin of the prior-level cell arc, $[toCell]_{L-1}$, and its SP, $[SP_{toCell}]_{L-1}$
Result: $[SP_{net}]_L$, $[SP_{fromCell}]_{L+1}$, $[SP_{toCell}]_{L+1}$
if $[SP_{fromNet}]_L$ is not assigned then
    $[SP_{net}]_L \leftarrow [SP_{toCell}]_{L-1}$;
end
foreach $[SP_{fromCell}]_{L+1}$ corresponding to $[SP_{net}]_L$ do
    $[SP_{fromCell}]_{L+1} \leftarrow [SP_{net}]_L$;
    Label this pin as visited;
    if $[fromCell]_{L+1}$ is a primary output then
        continue;
    end
    if the Gate corresponding to the current pin is in an SP inferable state then
        SP_INFER($[SP_{toCell}]_{L+1}$);
        Label $[toCell]_{L+1}$ as visited;
        Recursively apply this algorithm to $[toCell]_{L+1}$;
    end
end
The essence of this algorithm is to propagate and infer signal probability through the circuit as far as the current signal probability state allows, which depends on the inferable states of gate-level units. A gate-level unit is in the SP inferable state if it has not yet been inferred and the SPs of all its input pins have been assigned during propagation.
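A compact Python rendering of the weak propagation idea behind Algorithm 1, under simplifying assumptions: the netlist is a hypothetical dictionary mapping each gate to its input nets, output net, and an SP_INFER function, and an explicit worklist replaces the recursion. A gate is inferred only in the SP inferable state (not yet inferred, all input SPs assigned), so gates on a loop never qualify and propagation stops there, as described for the weak mechanism.

```python
from typing import Callable

# Hypothetical netlist format: gate name -> (input nets, output net, SP_INFER function).
Gate = tuple[list[str], str, Callable[[list[float]], float]]


def weak_propagate(gates: dict[str, Gate], sp: dict[str, float], inferred: set[str]) -> None:
    """Infer every gate reachable from the marked nets; updates sp and inferred in place."""
    fanout: dict[str, list[str]] = {}
    for name, (inputs, _, _) in gates.items():
        for net in inputs:
            fanout.setdefault(net, []).append(name)
    worklist = list(sp)  # start from every net whose SP is already known
    while worklist:
        net = worklist.pop()
        for name in fanout.get(net, []):
            inputs, output, infer = gates[name]
            # SP inferable state: gate not yet inferred and every input SP assigned.
            if name in inferred or any(n not in sp for n in inputs):
                continue
            sp[output] = infer([sp[n] for n in inputs])  # SP_INFER
            inferred.add(name)
            worklist.append(output)  # continue from the newly marked net


gates = {
    "g1": (["a", "b"], "n1", lambda v: v[0] * v[1]),  # AND
    "g2": (["n1"], "n2", lambda v: 1.0 - v[0]),       # INV
    "g3": (["n2", "q"], "q", lambda v: v[0] * v[1]),  # AND fed back through net q
}
sp = {"a": 0.5, "b": 0.5}
inferred: set[str] = set()
weak_propagate(gates, sp, inferred)
```

After the call, g1 and g2 are inferred, while g3 still waits on its loop net q; this is the situation the strong mechanism is meant to resolve.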
3.3.2. Signal probability strong propagation algorithm
The signal probability strong propagation algorithm primarily serves to break apart circuits with DCG topology. The first step is to identify the loop’s entrance, which is done by constructing a priority queue. The priority used to build the queue is the ratio of visited input ports to the total number of input ports:

$$priority = \frac{\text{visited input number}}{\text{input size}} \tag{19}$$
Fig. 4. Symbols used in Inference Algorithm.
is defined similarly to the SP inferable state. The transition count weak propagation algorithm is given in Algorithm 3.
Gates with a priority of less than one are pushed into the priority queue, and the element at the head of the queue is taken as the entrance element of the loop and dequeued. The signal probability strong propagation algorithm is detailed in Algorithm 2.
Algorithm 3: Transition Count Weak Propagation Algorithm
Data: Given Cell Arc π‘π pin of Prior level: [π‘ππΆπππ ]πΏ−1 , and the
corresponding ππ is [πππ‘ππΆπππ ]πΏ−1 , the corresponding π πΆ
is [π πΆπ‘ππΆπππ ]πΏ−1
Result: [π πΆ πππ‘ ]πΏ ,[π πΆππΆπππ
] ,[π πΆπ‘ππΆπππ ]πΏ+1
πππ πΏ+1
Algorithm 2: Signal Probability Strong Propagation Algorithm
Data: Circuit netlist
Result: The complete circuit ππ coverage and conditional
probability inference for leakage power consumption
1 foreach gate in all gates do
2
Calculate priority;
3
if priority of this gate < 1 then
4
push gate into queue;
5
end
6 end
7 while queue is not empty do
8
Gate ← pop out the head gate of queue ;
9
if Gate has not been inferred then
10
continue;
11
end
12
if the ππ s of fan-in pins are missing then
13
assign the default ππ s to these pins;
14
end
15
SP_VIR_INFER([πππ‘ππΆπππ ]) ;
16
foreach fan-out pin of the Gate do
17
Propagate this pin using Algorithm 1;
18
end
19 end
20 Calculate the conditional probability for all gates;
if [π πΆππππ‘
] is not assigned then
πππ πΏ
[π πΆ πππ‘ ]πΏ ← [π πΆπ‘ππΆπππ ]πΏ−1 ;
3 end
πΆπππ ]
4 foreach [π πΆ
corresponds to [π πΆ πππ‘ ]πΏ do
π πππ πΏ+1
πππ‘ ] ;
5
[π πΆππΆπππ
]
←
[π
πΆ
πΏ
πππ πΏ+1
1
2
6
7
8
9
10
Label the [π ππππΆπππ ]πΏ+1 as visited ;
if [π ππππΆπππ ]πΏ+1 is primary output then
continue;
end
Calculate the π πΆ under each condition based on the
[π πΆππΆπππ
]
;
πππ πΏ+1
if Gate corresponds to current Pin is in a π πΆ inferable state
then
12
foreach pin of Gate do
13
if pin is [π‘ππΆπππ ]πΏ+1 then
14
Label the [π‘ππΆπππ ]πΏ+1 and Gate as visited ;
15
TC_INFER([π πΆπ‘ππΆπππ ]πΏ+1 ) ;
16
Recursively apply this algorithm to [π‘ππΆπππ ]πΏ+1 ;
17
end
18
end
19
end
20 end
11
Strong propagation pertains to the direct virtual assignment of ππ
to the input pins by SP _VIR_INFER and the computation of the signal
probability of the output pins based on the power arc.
3.3.4. Transition count strong propagation algorithm
Analogously, the transition count strong propagation algorithm is
fundamentally similar to the signal probability strong propagation algorithm. Owing to the construction of the priority queue being dependent
on the access path and the consistent access paths of signal probability
and transition count, this consistency also furnishes potential methods for parallel computing in future research. The exposition of the
algorithm is elucidated in Algorithm 4.
3.3.3. Transition count weak propagation algorithm
The transition count weak propagation algorithm is fundamentally
similar to the weak propagation mechanism of signal probability, with
the distinction residing in the probability type. As the propagation of
π πΆ is contingent upon the conditions of the power arc, π πΆ must be
calculated concurrently during the propagation process under various
conditions of the power arc. The inference of π πΆ of different internal
conditions employs the previously described treatment of diverse pins
direction, which is input and output two direction pins. For input pins,
solely allocation is necessitated, whereas inference is executed based
on conditionality for output pins. A gate-level unit’s π πΆ inferable state
Algorithm 4: Transition Count Strong Propagation Algorithm
Data: Circuit netlist
Result: The complete circuit TC coverage and conditional probability inference for internal power consumption
foreach gate in all gates do
    Calculate priority;
    if priority of this gate < 1 then
        push gate into queue;
    end
end
while queue is not empty do
    Gate ← pop the head gate of the queue;
    if Gate has already been inferred then
        continue;
    end
    Label the Gate as inferred;
    foreach pin in Gate do
        Infer the internal power conditional probability;
    end
    foreach toCPin of Gate do
        Label the toCPin as visited;
        TC_VIR_INFER(TC_toCPin);
        Propagate this pin using Algorithm 3;
    end
end
3.3.5. Validity check and default value selection
Propagation requires certain initial values from which inference can start. Without test vectors, primary input ports need default signal probability and transition count values; likewise, some ports encountered during loop processing need default values. The magnitudes of these defaults strongly affect the precision of the final power estimate. Contemporary industrial tools let users customize these defaults; this work adopts the following default value scheme:
• For signal probability, in both combinational and sequential circuits, the default value is 0.5, reflecting a 50% duty cycle.
• For transition count:
– In combinational circuits, the transition count depends only on the simulation duration. With a preset simulation duration Duration (in ns), the default transition count is 5% × Duration × 2.
– In sequential circuits, the transition count depends on both the simulation duration and the clock period. The simulation duration is preset as above, while the clock period Period is taken from the constraint file or entered manually. The transition count of the clock signal is
$$TC_{clock} = \frac{Duration}{Period} \times 2 \tag{20}$$
and the transition count of every other (non-clock) signal is
$$TC_{non\text{-}clock} = \frac{Duration}{Period} \times 2 \times 5\% \tag{21}$$
Spatial correlation exists among nodes during SP and TC inference, and it can produce implausible values. Handling spatial correlation with conventional algorithms is complex, slow, and resource-intensive. In addition, the labeled transition counts may themselves be unrealistic, for example because of simulation granularity. It is therefore necessary to check the validity of the SP and TC values.
The validity criteria for the transition count are:
• If a TC value exceeds a specified threshold, it is scaled down to a quarter of its original value.
• The TC of an output pin should match the TC of the fan-out net it drives. If they differ, the TC values of the output pin under the different power-arc conditions are adjusted so that their sum matches the TC of the net.
The TC threshold is determined as follows: in combinational circuits, it equals the number of simulation cycles (with a default signal step of 1 ns); in sequential circuits, it is TC_{non-clock}/2.
For signal probability, any SP outside the valid range [0, 1] is reset to its default value.
3.4. Overview of the algorithm flow
Algorithm 5 outlines the overall procedure. The algorithm first computes the SP and TC of every node in the circuit. It initializes the SP and TC of the primary inputs and then builds a candidate queue with the Build_Candidates_Queue operation, which places the primary inputs first and the clock input port at the end of the queue. The ports in this queue then go through two propagation phases. The first, weak propagation, aims to cover as many circuit states as possible. The second, strong propagation, resolves the complex loops in the circuit and is unnecessary if the circuit has no loops. Once SP and TC cover the whole circuit, the leakage, internal, and switching power of each node are computed as described in Section 2, taking circuit parasitic parameters and timing information into account. Summing these three power components over all nodes gives the circuit’s average power consumption.
Combining Build_Candidates_Queue with the two propagation mechanisms is simple yet effective for circuits with multiple clocks. Weak propagation stops when it meets a loop, and because Build_Candidates_Queue places the clock signal at the end of the queue, the initial weak propagation already covers a large share of the circuit states. In the subsequent strong propagation, this broad coverage helps prioritize critical nodes on the clock path in the priority queue, identifying as many critical nodes of the clock tree as possible.
Algorithm 5: Overall Algorithm Flow
Data: Circuit netlist
Result: Average power analysis result
begin // Propagate SP and TC across circuit nodes
    Initialize the SP and TC of the primary inputs;
    Q ← Build_Candidates_Queue(Clocks);
    foreach p in Q do // Initial propagation of SP
        Propagate p using Algorithm 1;
    end
    Propagate SP through loops using Algorithm 2;
    foreach p in Q do // Initial propagation of TC
        Propagate p using Algorithm 3;
    end
    Propagate TC through loops using Algorithm 4;
end
begin
    Estimate the average power using the method of Section 2;
end
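The default values and validity checks of Section 3.3.5, which Algorithm 5 relies on when initializing the primary inputs, reduce to a few arithmetic rules. The Python sketch below is illustrative only: all function names are hypothetical, the proportional rescaling in `align_output_tc` is our assumption (the text only requires the per-condition TCs to sum to the TC of the net), and the TC threshold is passed in as a parameter rather than derived. Durations and periods are in ns, counted in the 1 ns signal steps used in the text.

```python
DEFAULT_SP = 0.5   # 50% duty cycle
ACTIVITY = 0.05    # the 5% factor in the default TC formulas


def default_tc_combinational(duration_ns):
    """Combinational default: 5% x Duration x 2 (Duration counted in 1 ns steps)."""
    return ACTIVITY * duration_ns * 2


def tc_clock(duration_ns, period_ns):
    """Eq. (20): a clock makes two transitions per period."""
    return duration_ns / period_ns * 2


def tc_non_clock(duration_ns, period_ns):
    """Eq. (21): non-clock signals default to 5% of the clock's transition count."""
    return duration_ns / period_ns * 2 * ACTIVITY


def check_sp(sp, default=DEFAULT_SP):
    """Reset an SP outside [0, 1] to its default value."""
    return sp if 0.0 <= sp <= 1.0 else default


def check_tc(tc, threshold):
    """Scale a TC above the threshold down to a quarter of its value."""
    return tc / 4 if tc > threshold else tc


def align_output_tc(arc_tcs, net_tc):
    """Make an output pin's per-condition TCs sum to the TC of the net it drives.

    Proportional rescaling is an assumption; an all-zero input is split evenly.
    """
    total = sum(arc_tcs.values())
    if total == 0:
        share = net_tc / len(arc_tcs)
        return {cond: share for cond in arc_tcs}
    scale = net_tc / total
    return {cond: tc * scale for cond, tc in arc_tcs.items()}


duration, period = 10_000, 10  # example values only
print(tc_clock(duration, period), tc_non_clock(duration, period))
print(check_tc(12_000, threshold=duration))  # 3000.0 (combinational threshold = number of 1 ns steps)
print(align_output_tc({"A=1": 30, "A=0": 10}, 20))  # {'A=1': 15.0, 'A=0': 5.0}
```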
Table 2
Combinational circuit signal probability test results.
| Case (size) | PTPX | OpenSTA | Proposed | PTPX/std | OpenSTA/std | Proposed/std |
|---|---|---|---|---|---|---|
| c17 (13) | 0.10% | 1.47% | 1.48% | 0.0016 | 0.0279 | 0.0279 |
| c2670 (1018) | 6.34% | 4.94% | 4.82% | 0.1146 | 0.0884 | 0.0776 |
| c1908 (540) | 1.86% | 5.78% | 5.59% | 0.0392 | 0.0840 | 0.0806 |
| tlu_part_0 (186639) | 1.54% | 2.60% | 2.09% | 0.0600 | 0.0978 | 0.0769 |
| fgu_part_3 (40980) | 3.34% | 4.97% | 4.98% | 0.0505 | 0.0612 | 0.0612 |
| mcu_fbdic_ctl_part_0 (19920) | 2.25% | 7.93% | 2.63% | 0.0716 | 0.1706 | 0.0744 |
| fgu_part_4 (255903) | 0.38% | 10.76% | 0.11% | 0.0334 | 0.1592 | 0.0103 |
| mmu_part_0 (137822) | 0.81% | 2.12% | 1.30% | 0.0362 | 0.0855 | 0.0564 |
| ccx_part_0 (159199) | 1.08% | 3.50% | 3.47% | 0.0333 | 0.0639 | 0.0637 |
| ccx_part_1 (20866) | 0.15% | 0.05% | 0.03% | 0.0017 | 0.0006 | 0.0004 |
| Mean | 1.79% | 4.41% | 2.65% | 0.0442 | 0.0839 | 0.0529 |
Table 3
Combinational circuit transition probability test results.
| Case (size) | PTPX | OpenSTA | Proposed | PTPX/std | OpenSTA/std | Proposed/std |
|---|---|---|---|---|---|---|
| c17 (13) | 5.72% | 11.49% | 11.49% | 0.0658 | 0.1336 | 0.1336 |
| c2670 (1018) | 10.65% | 24.21% | 21.06% | 0.1386 | 0.2503 | 0.2136 |
| c1908 (540) | 21.96% | 41.41% | 33.47% | 0.1800 | 0.3121 | 0.2804 |
| tlu_part_0 (186639) | 12.94% | 26.20% | 23.72% | 0.1603 | 0.2827 | 0.2645 |
| fgu_part_3 (40980) | 26.30% | 46.89% | 34.87% | 0.1517 | 0.2503 | 0.1931 |
| mcu_fbdic_ctl_part_0 (19920) | 20.17% | 37.76% | 30.67% | 0.2663 | 0.3405 | 0.3072 |
| fgu_part_4 (255903) | 21.10% | 43.45% | 35.66% | 0.2312 | 0.3999 | 0.3682 |
| mmu_part_0 (137822) | 13.83% | 25.15% | 24.14% | 0.1954 | 0.2857 | 0.2772 |
| ccx_part_0 (159199) | 13.86% | 27.75% | 27.56% | 0.1157 | 0.2262 | 0.2252 |
| ccx_part_1 (20866) | 7.80% | 15.80% | 15.80% | 0.1167 | 0.2020 | 0.2020 |
| Mean | 15.43% | 30.01% | 25.84% | 0.1622 | 0.2683 | 0.2465 |
4. Experimental results
Experiments are conducted on the open-source RTL code of OpenSPARC [35] and the academic ISCAS’85 [36] dataset, using the Liberty files of the SMIC 40 nm PDK. The designs range from a few to hundreds of thousands of gates, where a gate is a standard logic cell of the process library; such cells are inherently more complex than the simple logic units commonly used in academic benchmarks.
The experimental procedure is automated with CTest and Python scripts. The difference between two power or probability results is measured as
$$\mathrm{Error} = \frac{|\mathrm{Candidate} - \mathrm{Golden}|}{\max(\mathrm{Golden}, \mathrm{Candidate})} \times 100\% \tag{22}$$
where Candidate is the result of the algorithm under test (power or probability in this paper) and Golden is the reference result. PTPX serves as the industrial reference tool, and OpenSTA serves as the academic power analysis baseline.
We implemented the proposed algorithm on OpenTimer in C++17. It was executed on Ubuntu 20.04 with an Intel i7-6950X CPU clocked at 3.00 GHz, and OpenSTA was executed in the same environment. Because of incompatibility issues with C++17, PTPX was executed in a separate environment running RedHat 7 on an Intel Xeon E5-2695 v3 CPU clocked at 2.30 GHz. All power analyses were performed single-threaded on a single CPU core.
4.1. Combinational circuit vectorless signal probability and transition count test
The first test evaluates how well the algorithm estimates SP and TC on the ISCAS’85 dataset, with OpenSTA and PTPX as comparisons. The baseline dataset assigns an SP, and a TC over a specific test duration, to each primary input port of the circuit. Random test vectors are generated and simulated in VCS to obtain a SAIF vector file, and the SP and TC recorded for each node of the circuit netlist serve as the golden result. PTPX, OpenSTA, and the proposed algorithm are then given the same primary-input SP and TC as the random vectors used for the golden result, and each performs vectorless estimation to resolve the SP and TC of every node in the netlist. The SP and TC errors of each node relative to the golden result are computed with Eq. (22), and the mean and variance of these errors over all nodes are taken as the test result for each sample. TC is converted to transition probability (TP) so that all results can be compared on the same scale.
Because the academic dataset is small, the proposed vectorless algorithm is further validated on large-scale combinational circuits, obtained by converting the synthesized gate-level timing circuits of OpenSPARC into combinational circuits. The results of PTPX, OpenSTA, and the proposed algorithm are presented in Tables 2 and 3.
Tables 2 and 3 show that, for large-scale combinational circuits, the proposed algorithm achieves lower mean and standard-deviation errors than OpenSTA for signal probability, and slightly higher errors than PTPX (mean SP error of 2.65% versus 4.41% for OpenSTA and 1.79% for PTPX). Transition probability follows the same trend (25.84% versus 30.01% and 15.43%), although the gap to PTPX is larger.
The errors of all methods are, however, larger for TP than for SP. A possible explanation is that TP captures signal transitions, whereas SP captures how long a signal stays high, so TP can be regarded as a discrete differential approximation of SP. Such a difference-like transformation requires more computation and therefore accumulates more error. Eliminating TP error remains an open challenge for probability-based average power analysis; properly handling the spatial correlation between nodes and correctly computing the probabilities from the inputs to the outputs of gate-level units are effective ways to reduce it.
Regarding the relationship between error and case size, the tables show that the SP and TP errors generally increase for larger cases, but the correlation is weak. Some small cases still exhibit significant errors, which may be related
to the cases’ topology. Small cases may have strong internal spatial correlations and weak independence between signals; if an algorithm cannot handle these correlations effectively, significant errors result. The proposed algorithm, PTPX, and OpenSTA show similar relationships between error and case size.
In summary, the proposed algorithm effectively estimates SP and TC for large-scale combinational circuits, outperforming the open-source framework OpenSTA while keeping a reasonable error range relative to PTPX.
Table 4
Combinational circuit power analysis test results.
| Case (size) | Power | PTPX Annotated | PTPX Vectorless | err1 | Proposed Annotated | err2 | Proposed Vectorless | err3 |
|---|---|---|---|---|---|---|---|---|
| c17 (13) | Total | 8.03E-06 | 8.68E-06 | 4.01% | 8.03E-06 | 0.05% | 9.55E-06 | 18.99% |
| | Leakage | 1.01E-09 | 1.01E-09 | 0.00% | 1.01E-09 | 0.47% | 1.01E-09 | 0.42% |
| | Internal | 5.26E-06 | 5.69E-06 | 8.17% | 5.26E-06 | 0.08% | 6.30E-06 | 19.70% |
| | Switching | 2.78E-06 | 2.98E-06 | 7.19% | 2.78E-06 | 0.11% | 3.26E-06 | 17.19% |
| c2670a (957) | Total | 6.07E-04 | 6.99E-04 | 15.16% | 6.07E-04 | 0.08% | 8.25E-04 | 35.95% |
| | Leakage | 6.95E-08 | 6.93E-08 | 0.29% | 6.95E-08 | 0.01% | 6.92E-08 | 0.43% |
| | Internal | 3.40E-04 | 3.96E-04 | 16.47% | 3.40E-04 | 0.10% | 4.69E-04 | 37.82% |
| | Switching | 2.66E-04 | 3.02E-04 | 13.53% | 2.66E-04 | 0.04% | 3.57E-04 | 34.04% |
| c1908 (540) | Total | 3.04E-04 | 3.86E-04 | 26.97% | 3.04E-04 | 0.03% | 4.06E-04 | 33.52% |
| | Leakage | 3.71E-08 | 3.70E-08 | 0.27% | 3.71E-08 | 0.06% | 3.70E-08 | 0.29% |
| | Internal | 1.58E-04 | 2.04E-04 | 29.11% | 1.58E-04 | 0.27% | 2.13E-04 | 35.01% |
| | Switching | 1.46E-04 | 1.82E-04 | 24.66% | 1.46E-04 | 0.20% | 1.93E-04 | 31.89% |
| tlu_part_0 (186639) | Total | 9.03E-02 | 1.00E-01 | 10.74% | 9.03E-02 | 0.00% | 1.15E-01 | 27.19% |
| | Leakage | 1.35E-05 | 1.35E-05 | 0.00% | 1.35E-05 | 0.27% | 1.35E-05 | 0.36% |
| | Internal | 5.41E-02 | 6.03E-02 | 11.46% | 5.41E-02 | 0.04% | 6.96E-02 | 28.59% |
| | Switching | 3.62E-02 | 4.00E-02 | 10.50% | 3.62E-02 | 0.10% | 4.53E-02 | 25.06% |
| mcu_fbdic_ctl_part_0 (19920) | Total | 9.36E-03 | 1.04E-02 | 11.11% | 9.36E-03 | 0.02% | 1.21E-02 | 29.20% |
| | Leakage | 1.43E-06 | 1.42E-06 | 0.70% | 1.43E-06 | 0.29% | 1.42E-06 | 0.53% |
| | Internal | 5.25E-03 | 5.81E-03 | 10.67% | 5.25E-03 | 0.05% | 6.85E-03 | 30.57% |
| | Switching | 4.11E-03 | 4.55E-03 | 10.71% | 4.11E-03 | 0.08% | 5.24E-03 | 27.41% |
| fgu_part_4 (255903) | Total | 8.29E-02 | 8.40E-02 | 1.33% | 8.29E-02 | 0.00% | 9.00E-02 | 8.62% |
| | Leakage | 1.94E-05 | 1.94E-05 | 0.00% | 1.94E-05 | 0.25% | 1.94E-05 | 0.24% |
| | Internal | 4.91E-02 | 4.96E-02 | 1.02% | 4.90E-02 | 0.17% | 5.38E-02 | 9.48% |
| | Switching | 3.38E-02 | 3.43E-02 | 1.48% | 3.39E-02 | 0.18% | 3.63E-02 | 7.30% |
| mmu_part_0 (137822) | Total | 6.61E-02 | 7.27E-02 | 9.98% | 6.61E-02 | 0.08% | 8.31E-02 | 25.65% |
| | Leakage | 9.91E-06 | 9.91E-06 | 0.00% | 9.91E-06 | 0.05% | 9.90E-06 | 0.13% |
| | Internal | 3.99E-02 | 4.41E-02 | 10.53% | 3.99E-02 | 0.11% | 5.07E-02 | 27.06% |
| | Switching | 2.61E-02 | 2.86E-02 | 9.58% | 2.61E-02 | 0.01% | 3.23E-02 | 23.93% |
| ccx_part_0 (159199) | Total | 8.34E-02 | 9.69E-02 | 16.19% | 8.34E-02 | 0.04% | 1.22E-01 | 45.72% |
| | Leakage | 1.12E-05 | 1.12E-05 | 0.00% | 1.12E-05 | 0.27% | 1.12E-05 | 0.01% |
| | Internal | 4.85E-02 | 5.71E-02 | 17.73% | 4.85E-02 | 0.09% | 7.25E-02 | 49.38% |
| | Switching | 3.50E-02 | 3.98E-02 | 13.71% | 3.50E-02 | 0.10% | 4.91E-02 | 40.19% |
| ccx_part_1 (20866) | Total | 1.08E-02 | 1.14E-02 | 5.56% | 1.08E-02 | 0.11% | 1.25E-02 | 15.58% |
| | Leakage | 1.52E-06 | 1.52E-06 | 0.00% | 1.52E-06 | 0.23% | 1.52E-06 | 0.23% |
| | Internal | 7.07E-03 | 7.43E-03 | 5.09% | 7.07E-03 | 0.02% | 8.04E-03 | 13.74% |
| | Switching | 3.74E-03 | 4.00E-03 | 6.95% | 3.74E-03 | 0.03% | 4.44E-03 | 18.70% |
| Mean | Total | 4.53E-02 | 4.97E-02 | 10.72% | 4.53E-02 | 0.08% | 5.76E-02 | 27.81% |
| | Leakage | 7.46E-06 | 7.46E-06 | 0.09% | 7.46E-06 | 0.23% | 7.46E-06 | 0.22% |
| | Internal | 2.68E-02 | 2.96E-02 | 11.56% | 2.68E-02 | 0.08% | 3.45E-02 | 30.11% |
| | Switching | 1.85E-02 | 2.02E-02 | 10.00% | 1.85E-02 | 0.07% | 2.30E-02 | 25.19% |
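The distinction drawn in Section 4.1 between SP and TP can be made concrete with a sampled waveform. In this illustrative Python sketch, the assumptions are that a signal is sampled once per simulation step and that TP is the fraction of step boundaries at which the value changes (TC normalized by the number of intervals); the function names are hypothetical. SP averages the signal levels, whereas TP averages their first differences, so two waveforms with the same SP can have very different TP.

```python
def signal_probability(wave):
    """SP: fraction of sampled steps at logic 1."""
    return sum(wave) / len(wave)


def transition_probability(wave):
    """TP: fraction of step boundaries where the value changes (a discrete difference)."""
    changes = [abs(b - a) for a, b in zip(wave, wave[1:])]
    return sum(changes) / len(changes)


slow = [1, 1, 1, 1, 0, 0, 0, 0]
fast = [1, 0, 1, 0, 1, 0, 1, 0]
for name, wave in (("slow", slow), ("fast", fast)):
    print(name, signal_probability(wave), round(transition_probability(wave), 3))
# slow 0.5 0.143
# fast 0.5 1.0
```

Because TP depends on how a signal changes rather than on how long it stays high, an estimator has to carry transition information through every gate rather than only levels, which is one way to read the larger TP errors in Table 3.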
4.2. Combinational circuit power test results
Power results are tested on the ISCAS’85 and OpenSPARC combinational circuits. To ensure that all power analysis algorithms use identical parasitic parameters and timing results, they read the same Standard Parasitic Exchange Format (SPEF) file, using Reduced-SPEF in the pre-simulation phase. Because OpenSTA does not support Reduced-SPEF and cannot compute combinational circuit power, comparisons are made only with PTPX. The results are listed in Table 4, where PTPX Annotated denotes the result of PTPX reading the SAIF file, which industrial practice commonly treats as the reference power analysis result. Table 4 details the power values and errors for total, leakage, internal, and switching power.
Table 4 shows that the proposed algorithm is highly accurate on both the standard academic combinational datasets and the large industrial combinational datasets. With the SAIF file annotated, its total, leakage, internal, and switching power differ only minimally from PTPX; the average deviation in total power is 0.08% (err2), which meets the industry gold-standard criterion. The proposed framework can also perform vectorless power analysis: its leakage power errors are nearly identical across both datasets, averaging 0.22% (err3). The internal and switching power show noticeable deviations (mean err3 of 30.11% and 25.19%, respectively), which nevertheless remain within a manageable range and may stem from factors such as signal correlation. In summary, the algorithm accurately assesses the power consumption of large-scale combinational circuits with vectors and can also perform vectorless power analysis using Reduced-SPEF forward simulation.
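Eq. (22) and a per-node summary of the kind used for Tables 2 and 3 can be sketched in Python as follows. The function names are hypothetical, and reporting the population standard deviation of the per-node Eq. (22) errors on the same percentage scale is our assumption; the text does not spell out the exact normalization of the std columns. The printed check reproduces the db0 vectorless total-power error of Table 5 from its rounded entries.

```python
from statistics import mean, pstdev


def eq22_error(candidate, golden):
    """Eq. (22): |Candidate - Golden| / max(Golden, Candidate) x 100%."""
    denom = max(golden, candidate)
    return 0.0 if denom == 0 else abs(candidate - golden) / denom * 100.0


def summarize(candidates, goldens):
    """Mean and population standard deviation of per-node Eq. (22) errors (assumed summary)."""
    errors = [eq22_error(c, g) for c, g in zip(candidates, goldens)]
    return mean(errors), pstdev(errors)


# db0, vectorless total power (Table 5): proposed 3.19E-03 vs. PTPX 2.10E-03
print(round(eq22_error(3.19e-3, 2.10e-3), 2))  # 34.17
```

Dividing by the larger of the two values keeps Eq. (22) symmetric and bounded by 100%, so overestimates and underestimates of the same size yield the same error.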
4.3. Large-scale timing circuit testing
The effectiveness of the proposed algorithm was further assessed on large-scale timing circuits. Table 5 presents the power analysis results of these circuits with and without test vectors. Because vectors for such large circuits are inherently difficult to obtain, PTPX's simulation results are used as a pseudo-SAIF.
Table 5
Power analysis results of large-scale timing circuits.
| Case (size) | Result | Annotated Total | Annotated Internal | Annotated Leakage | Annotated Switching | Annotated Time (s) | Vectorless Total | Vectorless Internal | Vectorless Leakage | Vectorless Switching | Vectorless Time (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| db0 (1667) | Proposed | 2.16E-03 | 1.87E-03 | 3.64E-07 | 2.93E-04 | 0.2 | 3.19E-03 | 2.67E-03 | 3.68E-07 | 5.21E-04 | 0.0 |
| | PTPX | 2.10E-03 | 1.80E-03 | 3.64E-07 | 2.93E-04 | 0.8 | 2.10E-03 | 1.80E-03 | 3.64E-07 | 2.93E-04 | 0.4 |
| | error | 3.05% | 3.57% | 0.01% | 0.02% | | 34.17% | 32.58% | 1.09% | 43.76% | |
| dec (2483) | Proposed | 3.06E-03 | 2.49E-03 | 3.35E-07 | 5.71E-04 | 0.1 | 2.18E-03 | 1.59E-03 | 3.35E-07 | 5.93E-04 | 0.5 |
| | PTPX | 3.02E-03 | 2.45E-03 | 3.35E-07 | 5.71E-04 | 0.6 | 3.02E-03 | 2.45E-03 | 3.35E-07 | 5.71E-04 | 1.5 |
| | error | 1.47% | 1.79% | 0.01% | 0.00% | | 27.81% | 35.10% | 0.00% | 3.71% | |
| gkt (6125) | Proposed | 9.37E-03 | 7.86E-03 | 8.21E-07 | 1.51E-03 | 0.2 | 7.88E-03 | 5.86E-03 | 8.32E-07 | 2.02E-03 | 1.5 |
| | PTPX | 9.30E-03 | 7.79E-03 | 8.21E-07 | 1.51E-03 | 1.3 | 9.30E-03 | 7.79E-03 | 8.21E-07 | 1.51E-03 | 2.8 |
| | error | 0.78% | 0.93% | 0.00% | 0.03% | | 15.27% | 24.78% | 1.32% | 25.25% | |
| l2b (97365) | Proposed | 1.86E-02 | 1.39E-02 | 1.58E-05 | 4.72E-03 | 3.2 | 4.66E-02 | 2.81E-02 | 1.61E-05 | 1.85E-02 | 26.4 |
| | PTPX | 1.83E-02 | 1.36E-02 | 1.58E-05 | 4.72E-03 | 47.5 | 1.83E-02 | 1.36E-02 | 1.58E-05 | 4.72E-03 | 50.2 |
| | error | 1.74% | 2.06% | 0.03% | 0.01% | | 60.73% | 51.60% | 1.86% | 74.49% | |
| ccx (130052) | Proposed | 1.08E-01 | 9.03E-02 | 2.23E-05 | 1.80E-02 | 3.4 | 2.66E-01 | 2.02E-01 | 2.33E-05 | 6.43E-02 | 18.3 |
| | PTPX | 1.06E-01 | 8.80E-02 | 2.23E-05 | 1.80E-02 | 32.0 | 1.06E-01 | 8.80E-02 | 2.23E-05 | 1.80E-02 | 39.2 |
| | error | 2.22% | 2.62% | 0.03% | 0.17% | | 60.15% | 56.44% | 4.29% | 72.01% | |
| fgu (158936) | Proposed | 1.48E-01 | 1.23E-01 | 2.34E-05 | 2.50E-02 | 7.1 | 1.96E-01 | 1.34E-01 | 2.45E-05 | 6.20E-02 | 55.2 |
| | PTPX | 1.48E-01 | 1.22E-01 | 2.35E-05 | 2.50E-02 | 50.7 | 1.48E-01 | 1.22E-01 | 2.35E-05 | 2.50E-02 | 70.6 |
| | error | 0.35% | 0.48% | 0.00% | 0.02% | | 24.49% | 8.96% | 4.08% | 59.68% | |
| exu (207273) | Proposed | 2.95E-01 | 2.50E-01 | 3.94E-05 | 4.49E-02 | 7.6 | 3.26E-01 | 2.40E-01 | 3.88E-05 | 8.58E-02 | 46.8 |
| | PTPX | 2.95E-01 | 2.51E-01 | 3.94E-05 | 4.49E-02 | 71.4 | 2.95E-01 | 2.50E-01 | 3.94E-05 | 4.49E-02 | 84.2 |
| | error | 0.10% | 0.12% | 0.01% | 0.09% | | 9.51% | 4.00% | 1.52% | 47.67% | |
| lsu (433503) | Proposed | 7.47E-01 | 6.31E-01 | 9.19E-05 | 1.16E-01 | 13.5 | 8.27E-01 | 6.57E-01 | 9.12E-05 | 1.71E-01 | 69.6 |
| | PTPX | 7.47E-01 | 6.31E-01 | 9.19E-05 | 1.16E-01 | 116.5 | 7.47E-01 | 6.31E-01 | 9.19E-05 | 1.16E-01 | 199.8 |
| | error | 0.00% | 0.00% | 0.04% | 0.08% | | 9.67% | 3.96% | 0.76% | 32.16% | |
| ifu_ftu (434889) | Proposed | 2.17E-01 | 1.70E-01 | 7.99E-05 | 4.67E-02 | 12.0 | 4.41E-01 | 2.98E-01 | 8.20E-05 | 1.43E-01 | 63.0 |
| | PTPX | 2.17E-01 | 1.71E-01 | 7.99E-05 | 4.68E-02 | 197.1 | 2.17E-01 | 1.70E-01 | 7.99E-05 | 4.68E-02 | 270.7 |
| | error | 0.16% | 0.21% | 0.03% | 0.12% | | 50.79% | 42.95% | 2.56% | 67.27% | |
| Mean | Proposed | 1.72E-01 | 1.43E-01 | 3.05E-05 | 2.86E-02 | 5.2 | 2.35E-01 | 1.74E-01 | 3.08E-05 | 6.09E-02 | 31.3 |
| | PTPX | 1.72E-01 | 1.43E-01 | 3.05E-05 | 2.86E-02 | 57.5 | 1.72E-01 | 1.43E-01 | 3.05E-05 | 2.86E-02 | 80.0 |
| | error | 1.10% | 1.31% | 0.02% | 0.06% | | 32.51% | 28.93% | 1.94% | 47.33% | |
Regarding operational efficiency, the hardware of the two compared systems differs, so a normalized framework is needed for a fair efficiency assessment. Both systems were evaluated on their single-thread processing capability with the PassMark benchmark software [37]: the PassMark CPU Single Thread test was run 100 times on each system, and the mean value was taken as the performance metric. The PTPX platform achieved a single-thread throughput of 1592 MOps/Sec, while the platform running the proposed algorithm achieved 2615 MOps/Sec, equivalent to 1.71 times the throughput of the PTPX platform. Table 5 summarizes the measured single-thread time overhead of the power inference process. Based on the average time overhead, the proposed algorithm is 11.05 times faster than PTPX for annotated power analysis and 2.58 times faster for vectorless power analysis. Both factors exceed 1.71, confirming that the single-thread efficiency of the proposed algorithm surpasses that of PTPX.
In large-scale complex timing circuits with intricate logic loops, the proposed algorithm therefore outperforms the industry-standard PTPX in runtime. For vector-annotated results, its error relative to PTPX is minimal, with an average error of 1.1%, again meeting the industry gold-standard level. In vectorless power analysis, the leakage power results of the two tools differ only slightly (below 5%, with an average error of 0.22%). Despite certain differences in internal and switching power, the proposed algorithm provides additional margin, which is particularly useful in the early stages of circuit design and offers insights for power optimization and further low-power design.
5. Conclusion
This paper introduces a gate-level power estimation algorithm designed both for combinational circuits with a DAG structure and for large sequential circuits with intricate logic loops. It handles complex logic loops and delivers fast, high-precision, and robust results. It surpasses the state-of-the-art open-source tool OpenSTA and performs rapid and efficient inference on large-scale combinational circuits. Compared with the widely used PTPX power estimation, the proposed algorithm consistently meets the industry gold standard for annotated power analysis on large-scale circuits, with an average error of 1.1%. For vectorless power analysis, it computes results quickly and with additional margin, which is especially useful for large-scale timing and combinational circuits, thereby addressing the limitations of traditional academic power analysis algorithms.
The algorithm still leaves room for further optimization, such as decoupling the inference along the SP and TC propagation paths while keeping their traversal paths consistent. This consistency of access paths also opens avenues for parallel computing in future work.
CRediT authorship contribution statement
Zejia Lyu: Methodology, Software, Validation, Formal analysis, Investigation, Writing – original draft, Visualization. Jizhong Shen: Conceptualization, Methodology, Resources, Writing – review & editing,
Supervision, Project administration.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to
influence the work reported in this paper.
Data availability
[19] C.-Y. Tsui, M. Pedram, A. Despain, Efficient estimation of dynamic power
consumption under a real delay model, in: Proceedings of 1993 International
Conference on Computer Aided Design, ICCAD, IEEE Comput. Soc. Press, Santa
Clara, CA, USA, 1993, pp. 224–228, http://dx.doi.org/10.1109/ICCAD.1993.
580061.
[20] G. Theodoridis, S. Theoharis, D. Soudris, C. Goutis, Switching activity estimation
under real-gate delay using timed boolean functions, IEE Proc. - Comput. Digit.
Tech. 147 (6) (2000) 444, http://dx.doi.org/10.1049/ip-cdt:20000891.
[21] O.S. Fadl, M.F. Abu-Elyazeed, M.B. Abdelhalim, H.H. Amer, A.H. Madian,
Accurate dynamic power estimation for CMOS combinational logic circuits with
real gate delay model, J. Adv. Res. 7 (1) (2016) 89–94, http://dx.doi.org/10.
1016/j.jare.2015.02.006.
[22] B. Liu, Signal probability based statistical timing analysis, in: 2008 Design,
Automation and Test in Europe, IEEE, Munich, Germany, 2008, pp. 562–567,
http://dx.doi.org/10.1109/DATE.2008.4484736.
[23] P. Schneider, U. Schlichtmann, B. Wurth, Fast power estimation of large circuits,
IEEE Des. Test Comput. 13 (1) (1996) 70–78, http://dx.doi.org/10.1109/54.
485785.
[24] A. Ghosh, S. Devadas, K. Keutzer, J. White, Estimation of average switching
activity in combinational and sequential circuits, in: [1992] Proceedings 29th
ACM/IEEE Design Automation Conference, 1992, pp. 253–259, http://dx.doi.org/
10.1109/DAC.1992.227826.
[25] R. Marculescu, D. Marculescu, M. Pedram, Sequence compaction for power
estimation: Theory and practice, IEEE Trans. Comput.-Aided Des. Integr. Circuits
Syst. 18 (7) (1999) 973–993, http://dx.doi.org/10.1109/43.771179.
[26] S. Bhanja, N. Ranganathan, Switching activity estimation of VLSI circuits using
Bayesian networks, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 11 (4)
(2003) 558–567, http://dx.doi.org/10.1109/TVLSI.2003.816144.
[27] S. Bhanja, K. Lingasubramanian, N. Ranganathan, Estimation of switching activity in sequential circuits using dynamic Bayesian networks, in: 18th International
Conference on VLSI Design Held Jointly with 4th International Conference
on Embedded Systems Design, IEEE Computer Soc, Kolkata, India, 2005, pp.
586–591, http://dx.doi.org/10.1109/ICVD.2005.93.
[28] H. Ren, J. Hu (Eds.), Machine Learning Applications in Electronic Design
Automation, Springer International Publishing, Cham, 2022, http://dx.doi.org/
10.1007/978-3-031-13074-8.
[29] H. Dhotre, S. Eggersglüß, K. Chakrabarty, R. Drechsler, Machine learning-based
prediction of test power, in: 2019 IEEE European Test Symposium, ETS, 2019,
pp. 1–6, http://dx.doi.org/10.1109/ETS.2019.8791548.
[30] K. Roy, Neural network based macromodels for high level power estimation,
ICCIMA 2007, in: International Conference on Computational Intelligence and
Multimedia Applications, vol. 2, 2007, pp. 159–163, http://dx.doi.org/10.1109/
ICCIMA.2007.117.
[31] Y. Zhou, H. Ren, Y. Zhang, B. Keller, B. Khailany, Z. Zhang, PRIMAL: Power
Inference using machine learning, in: Proceedings of the 56th Annual Design
Automation Conference 2019, ACM, Las Vegas, NV, USA, 2019, pp. 1–6,
http://dx.doi.org/10.1145/3316781.3317884.
[32] Y. Zhang, H. Ren, B. Khailany, GRANNITE: Graph neural network inference
for transferable power estimation, in: 2020 57th ACM/IEEE Design Automation
Conference, DAC, IEEE, San Francisco, CA, USA, 2020, pp. 1–6,
http://dx.doi.org/10.1109/DAC18072.2020.9218643.
[33] Y. Li, M. Liu, A. Mishchenko, C. Yu, Invited paper: Verilog-to-PyG - A framework
for graph learning and augmentation on RTL designs, in: 2023 IEEE/ACM
International Conference on Computer Aided Design, ICCAD, 2023, pp. 1–4,
http://dx.doi.org/10.1109/ICCAD57390.2023.10323741.
[34] T.-W. Huang, M.D.F. Wong, OpenTimer: A high-performance timing analysis tool,
in: Proceedings of the IEEE/ACM International Conference on Computer-Aided
Design, ICCAD ’15, IEEE Press, 2015, pp. 895–902,
http://dx.doi.org/10.1109/ICCAD.2015.7372666.
[35] I. Parulkar, A. Wood, J.C. Hoe, B. Falsafi, S.V. Adve, J. Torrellas, S. Mitra,
OpenSPARC: An open platform for hardware reliability experimentation, in:
Fourth Workshop on Silicon Errors in Logic-System Effects, SELSE, Citeseer,
2008, pp. 1–6.
[36] M. Hansen, H. Yalcin, J. Hayes, Unveiling the ISCAS-85 benchmarks: a case
study in reverse engineering, IEEE Des. Test Comput. 16 (3) (1999) 72–80,
http://dx.doi.org/10.1109/54.785838.
[37] PassMark Software, Passmark PerformanceTest Linux - Linux system benchmark
software, 2023, URL https://www.passmark.com/products/pt_linux/index.php.
(Accessed: 03 Nov 2023).
Data availability
Data will be made available on request.
References
[1] S.P. Mohanty, N. Ranganathan, E. Kougianos, P. Patra, Low-Power High-Level
Synthesis for Nanoscale CMOS Circuits, Springer Science & Business Media, 2008,
http://dx.doi.org/10.1007/978-0-387-76474-0.
[2] J. Cherry, OpenSTA: Parallax static timing analyzer, 2020, URL
https://github.com/The-OpenROAD-Project/OpenSTA.
[3] D. Stamoulis, D. Marculescu, Can we guarantee performance requirements under
workload and process variations? in: Proceedings of the 2016 International
Symposium on Low Power Electronics and Design, ISLPED ’16, Association for
Computing Machinery, New York, NY, USA, 2016, pp. 308–313,
http://dx.doi.org/10.1145/2934583.2934641.
[4] L. Jin, W. Fu, M. Ling, L. Shi, A fast cross-layer dynamic power estimation
method by tracking cycle-accurate activity factors with spark streaming, IEEE
Trans. Very Large Scale Integr. (VLSI) Syst. 30 (4) (2022) 353–364,
http://dx.doi.org/10.1109/TVLSI.2021.3111000.
[5] S. Chandra, K. Lahiri, A. Raghunathan, S. Dey, Variation-tolerant dynamic power
management at the system-level, IEEE Trans. Very Large Scale Integr. (VLSI) Syst.
17 (9) (2009) 1220–1232, http://dx.doi.org/10.1109/TVLSI.2009.2019803.
[6] F. Najm, Transition density: A new measure of activity in digital circuits,
IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 12 (2) (1993) 310–323,
http://dx.doi.org/10.1109/43.205010.
[7] J. Monteiro, S. Devadas, A. Ghosh, K. Keutzer, J. White, Estimation of average
switching activity in combinational logic circuits using symbolic simulation,
IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 16 (1) (1997) 121–127,
http://dx.doi.org/10.1109/43.559336.
[8] R. Marculescu, D. Marculescu, M. Pedram, Probabilistic modeling of dependencies during switching activity analysis, IEEE Trans. Comput.-Aided Des. Integr.
Circuits Syst. 17 (2) (1998) 73–83, http://dx.doi.org/10.1109/43.681258.
[9] X. Wu, M. Pedram, Propagation algorithm of behavior probability in power estimation based on multiple-valued logic, in: Proceedings 30th IEEE International
Symposium on Multiple-Valued Logic, ISMVL 2000, IEEE Comput. Soc, Portland,
OR, USA, 2000, pp. 453–459, http://dx.doi.org/10.1109/ISMVL.2000.848657.
[10] T.S. Czajkowski, S.D. Brown, Fast toggle rate computation for FPGA circuits, in:
2008 International Conference on Field Programmable Logic and Applications,
IEEE, Heidelberg, Germany, 2008, pp. 65–70,
http://dx.doi.org/10.1109/FPL.2008.4629909.
[11] T.S. Czajkowski, S.D. Brown, Decomposition-based vectorless toggle rate computation for FPGA Circuits, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.
29 (11) (2010) 1723–1735, http://dx.doi.org/10.1109/TCAD.2010.2061250.
[12] K. Hu, Z. Chu, An efficient circuit-based SAT solver and its application in logic
equivalence checking, Microelectron. J. 142 (2023) 106005,
http://dx.doi.org/10.1016/j.mejo.2023.106005.
[13] A. Sagahyroon, F.A. Aloul, Using SAT-based techniques in power estimation,
Microelectron. J. 38 (6) (2007) 706–715,
http://dx.doi.org/10.1016/j.mejo.2007.05.001.
[14] M. Li, Z. Shi, Q. Lai, S. Khan, S. Cai, Q. Xu, On EDA-driven learning for SAT
solving, in: 2023 60th ACM/IEEE Design Automation Conference, DAC, 2023,
pp. 1–6, http://dx.doi.org/10.1109/DAC56929.2023.10248001.
[15] J.-S. Wang, L.-T. Huang, H. Dong, T. Mak, Estimating power for FPGAs based
on signal probability theory, J. Electron. Sci. Technol. 10 (4) (2012) 302–308,
http://dx.doi.org/10.3969/j.issn.1674-862X.2012.04.004.
[16] P. Anju, S. Ramesh, Toggle rate estimation technique for FPGA circuits considering spatial correlation, in: 2012 Third International Conference on Computing,
Communication and Networking Technologies, ICCCNT’12, 2012, pp. 1–7,
http://dx.doi.org/10.1109/ICCCNT.2012.6395937.
[17] Y. Nasser, J.-C. Prévotet, M. Hélard, J. Lorandel, Dynamic power estimation
based on switching activity propagation, in: 2017 27th International Conference
on Field Programmable Logic and Applications, FPL, IEEE, Ghent, 2017, pp. 1–2,
http://dx.doi.org/10.23919/FPL.2017.8056783.
[18] Y. Nasser, J. Lorandel, J.-C. Prévotet, M. Hélard, RTL to transistor level power
modeling and estimation techniques for FPGA and ASIC: A survey, IEEE Trans.
Comput.-Aided Des. Integr. Circuits Syst. 40 (3) (2021) 479–493,
http://dx.doi.org/10.1109/TCAD.2020.3003276.