Power-Aware Placement
Yongseok Cheon, Pei-Hsin Ho
Advanced Technology Group, Synopsys, Inc.
{cheon,pho}@synopsys.com
Andrew B. Kahng, Sherief Reda and Qinke Wang
UCSD CSE Department
{abk,sreda,qiwang}@cs.ucsd.edu
Outline
• Introduction
• Activity-based register clustering
• Activity-based net weighting
• Experiments
• Conclusions
IC Power Consumption
• Switching power
– largest source of power dissipation
– usually accounts for 40% to 80% of total power
– switching power of a net is proportional to the product of its capacitance and its signal switching rate
• Short-circuit power
– power dissipated by the short-circuit current that flows briefly while a CMOS gate switches, when its pull-up and pull-down networks conduct at the same time
• Leakage power
– power dissipated by leakage currents that flow through a transistor in its non-conducting (off) state
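To make the proportionality concrete, here is a minimal sketch of the standard CMOS relation P = 1/2 · C · Vdd² · (toggle rate), in which each output transition dissipates 1/2 · C · Vdd². It assumes the switching rate is counted in transitions per second. The capacitance, supply voltage and rates are made-up illustration values, and short-circuit and leakage power are ignored.

```python
def net_switching_power(cap_f: float, vdd_v: float, toggle_rate_hz: float) -> float:
    """Switching power (W) of one net: each output transition dissipates 0.5 * C * Vdd^2."""
    return 0.5 * cap_f * vdd_v ** 2 * toggle_rate_hz


def total_switching_power(nets: list[tuple[float, float]], vdd_v: float) -> float:
    """Sum the switching power of (capacitance, toggle rate) pairs at one supply voltage."""
    return sum(net_switching_power(c, vdd_v, rate) for c, rate in nets)


# Illustrative numbers only: a 100 MHz clock toggles twice per period (200e6 transitions/s),
# while a data net of equal capacitance switches ten times less often.
clock_net = (10e-15, 200e6)   # 10 fF
data_net = (10e-15, 20e6)     # 10 fF
p_clock = net_switching_power(clock_net[0], 1.0, clock_net[1])   # about 1 uW
p_data = net_switching_power(data_net[0], 1.0, data_net[1])      # about 0.1 uW
print(f"clock: {p_clock:.2e} W, data: {p_data:.2e} W, "
      f"total: {total_switching_power([clock_net, data_net], 1.0):.2e} W")
```

At equal capacitance, the net with ten times the activity burns ten times the power. This is why both capacitance and activity matter for the placement decisions below.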
Clock Power Consumption
• Clock net
– a major contributor to dynamic power
– much larger capacitance than most signal nets
– the highest switching activity in the design
– typically consumes up to 40% of total dynamic power across a variety of design types
• Traditional placement methodologies treat registers no differently than combinational cells
– this leads to sub-optimal placements in terms of power
Power-Aware Placement Method
• Activity-based register clustering
– reduces the capacitance of clock nets, and hence clock power
• Activity-based net weighting
– reduces the capacitance of high-activity signal nets, and hence total net switching power
Large Weight for Clock Net?
• Not a good idea
• It may only affect registers close to the boundaries of the clock net's bounding box
• It introduces hot spots and highly congested areas
Distribution of Clock Tree Capacitance
• Observation: most of the clock tree
capacitance (e.g., 80%) is at the leaf level
[Figure: Clock-Tree Capacitance Distribution on a Customer Design. Wire capacitance and pin capacitance (pF) per clock-tree level, levels 0 to 9]
Register Clustering
• Goal: reduce the capacitance of a clock net
• Method: pack the registers within the same leaf cluster of the clock tree into a smaller area
• Result: reduced leaf-level clock tree capacitance, and potentially reduced clock skew
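The sketch below shows roughly why packing helps. It rests on simplifying assumptions that are not taken from the slides:

- the leaf net is modeled as a Manhattan star from a buffer at the registers' centroid;
- wire capacitance is a constant per unit length;
- each register adds a fixed clock-pin capacitance.

Under these assumptions, moving every register toward the centroid by a factor s scales the star length, and hence the wire capacitance, by s. The pin capacitance does not change. The helpers `clump` and `leaf_capacitance` are hypothetical names.

```python
Point = tuple[float, float]


def centroid(points: list[Point]) -> Point:
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def star_wirelength(points: list[Point]) -> float:
    """Manhattan length of a star from the centroid (assumed leaf-buffer site) to every register."""
    cx, cy = centroid(points)
    return sum(abs(x - cx) + abs(y - cy) for x, y in points)


def leaf_capacitance(points: list[Point], wire_cap_per_um: float, clock_pin_cap: float):
    """Return (wire cap, pin cap) of one leaf cluster under the star model."""
    return wire_cap_per_um * star_wirelength(points), clock_pin_cap * len(points)


def clump(points: list[Point], s: float) -> list[Point]:
    """Move every register toward the cluster centroid by factor s (0 < s <= 1)."""
    cx, cy = centroid(points)
    return [(cx + s * (x - cx), cy + s * (y - cy)) for x, y in points]


# Four registers at the corners of a 100 um x 100 um square (illustrative values).
regs = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
before = leaf_capacitance(regs, 0.2, 1.0)              # 0.2 fF/um wire, 1 fF per clock pin
after = leaf_capacitance(clump(regs, 0.5), 0.2, 1.0)
print(before, after)   # wire cap halves, pin cap is unchanged
```

Only the wire part of the leaf capacitance shrinks. The pin part is fixed by the number of registers.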
Flow of Register Clustering
1. Quick CTS algorithm: group registers into clusters such that each cluster can become a leaf cluster of the actual clock tree
2. Group bounds: constrain the placement of each cluster of registers to a smaller bounding box
Quick Clock-Tree Synthesis Algorithm
• Heuristically decide a range of target cluster sizes based on
– size of the clock net
– design rule constraints: max fanout and max load
– user configuration
• Perform clustering starting from each side (left, right, top and bottom) for each target cluster size
• Select the clustering with the best CTS objective
– e.g., minimum clock skew, minimum clock delay, minimum number of clock buffers, etc.
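The sweep over directions and target sizes can be sketched as follows. This is a hypothetical illustration, and three pieces of it are assumptions:

- The size-range heuristic keeps the user sizes that fit within the max fanout and within max load divided by pin capacitance.
- The clustering is a stand-in: it sorts pins along the sweep direction and cuts them into equal chunks. It is not the centroid-based growth on the next slide.
- The cost is the total cluster half-perimeter plus a fixed penalty per leaf buffer.

```python
import math
from typing import Callable

Point = tuple[float, float]

# Sort keys that order pins from the starting side toward the opposite side.
SWEEP_KEYS: dict[str, Callable[[Point], tuple[float, float]]] = {
    "left": lambda p: (p[0], p[1]),
    "right": lambda p: (-p[0], p[1]),
    "bottom": lambda p: (p[1], p[0]),
    "top": lambda p: (-p[1], p[0]),
}


def candidate_sizes(num_pins, max_fanout, max_load, pin_cap, user_sizes):
    """Hypothetical heuristic: keep user-suggested sizes allowed by max fanout and max load."""
    limit = max(1, min(num_pins, max_fanout, math.floor(max_load / pin_cap)))
    sizes = sorted({s for s in user_sizes if 1 <= s <= limit})
    return sizes or [limit]


def sweep_cluster(pins, direction, size):
    """Stand-in clustering: order pins along the sweep and cut them into chunks of `size`."""
    ordered = sorted(pins, key=SWEEP_KEYS[direction])
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def cts_cost(clusters, buffer_cost=5.0):
    """Hypothetical objective: total cluster half-perimeter plus a penalty per leaf buffer."""
    hpwl = 0.0
    for c in clusters:
        xs = [x for x, _ in c]
        ys = [y for _, y in c]
        hpwl += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return hpwl + buffer_cost * len(clusters)


def quick_cts(pins, sizes):
    """Try every (direction, size) pair and keep the clustering with the lowest cost."""
    best = None
    for direction in SWEEP_KEYS:
        for size in sizes:
            clusters = sweep_cluster(pins, direction, size)
            cost = cts_cost(clusters)
            if best is None or cost < best[0]:
                best = (cost, direction, size, clusters)
    return best


# Two vertical columns of four registers each (illustrative coordinates).
pins = [(x, y) for x in (0.0, 10.0) for y in (0.0, 1.0, 2.0, 3.0)]
sizes = candidate_sizes(len(pins), max_fanout=4, max_load=40.0, pin_cap=2.0,
                        user_sizes=[2, 4, 6, 8])
cost, direction, size, clusters = quick_cts(pins, sizes)
print(direction, size, cost)
```

The selection loop is independent of how each candidate clustering is built. A real flow would plug in its own clustering routine and CTS objective.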
Quick CTS Algorithm (contd)
• Start a new cluster with the leftmost (rightmost, highest or lowest) unclustered clock pin
• Add the clock pin with the shortest Manhattan distance to the capacitance-weighted centroid of the current cluster
• Grow the cluster until it reaches the target cluster size
• Repeat until all clock pins are clustered
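Here is a minimal sketch of the greedy growth for the leftmost-start variant; the other variants only change the seed key. Each clock pin is assumed to be a hypothetical (x, y, pin capacitance) tuple with positive capacitance. This is a straightforward quadratic-time version, not an optimized implementation.

```python
Pin = tuple[float, float, float]  # hypothetical format: (x, y, clock-pin capacitance > 0)


def weighted_centroid(cluster: list[Pin]) -> tuple[float, float]:
    """Capacitance-weighted centroid of the pins already in the cluster."""
    total = sum(c for _, _, c in cluster)
    return (sum(x * c for x, _, c in cluster) / total,
            sum(y * c for _, y, c in cluster) / total)


def grow_clusters(pins: list[Pin], target_size: int) -> list[list[Pin]]:
    """Greedy leaf clustering, leftmost-start variant."""
    unclustered = list(pins)
    clusters = []
    while unclustered:
        # Seed a new cluster with the leftmost unclustered pin (lower y breaks ties).
        seed = min(unclustered, key=lambda p: (p[0], p[1]))
        unclustered.remove(seed)
        cluster = [seed]
        # Grow until the target size is reached or no pins are left.
        while len(cluster) < target_size and unclustered:
            cx, cy = weighted_centroid(cluster)
            # Pick the pin with the shortest Manhattan distance to the weighted centroid.
            nearest = min(unclustered, key=lambda p: abs(p[0] - cx) + abs(p[1] - cy))
            unclustered.remove(nearest)
            cluster.append(nearest)
        clusters.append(cluster)
    return clusters


pins = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (10.0, 0.0, 1.0), (11.0, 0.0, 1.0), (5.0, 5.0, 2.0)]
for cluster in grow_clusters(pins, target_size=2):
    print(cluster)
```

Weighting the centroid by pin capacitance pulls it toward heavier loads. Subsequent pins are therefore chosen near where most of the leaf capacitance already sits.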
Group Bounds
• Control the bounding box of a cluster: reduce it while still fitting the registers
• Compute the current bounding box of the registers
• Shrink the bounding box proportionally
• Shrink ratio p is determined by
– a user-specified shrinking factor p0
– the switching rate SR of the clock net relative to the maximum switching rate MSR
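The slides do not give the formula for p, so the sketch below assumes a hypothetical form p = 1 - (1 - p0) · SR/MSR. Under this form the fastest clock (SR = MSR) gets the full factor p0, and slower clocks shrink less. The sketch makes two further assumptions. First, p scales each side of the box about its center. Second, the shrunk box must keep enough area for the registers at a hypothetical maximum utilization `max_util`. The aspect ratio is left unchanged here.

```python
import math

Box = tuple[float, float, float, float]   # (xlo, ylo, xhi, yhi)
Cell = tuple[float, float, float, float]  # (x, y, width, height), lower-left corner


def shrink_ratio(p0: float, sr: float, msr: float) -> float:
    """HYPOTHETICAL form: 1 (no shrink) for an idle clock, p0 for the fastest clock (SR = MSR)."""
    return 1.0 - (1.0 - p0) * (sr / msr)


def bounding_box(cells: list[Cell]) -> Box:
    return (min(x for x, _, _, _ in cells), min(y for _, y, _, _ in cells),
            max(x + w for x, _, w, _ in cells), max(y + h for _, y, _, h in cells))


def group_bound(cells: list[Cell], p: float, max_util: float = 0.9) -> Box:
    """Shrink the registers' bounding box about its center by linear factor p, keeping room for them."""
    xlo, ylo, xhi, yhi = bounding_box(cells)
    w, h = xhi - xlo, yhi - ylo
    needed = sum(cw * ch for _, _, cw, ch in cells) / max_util
    # Smallest linear scale whose area still holds the registers at max_util.
    s_min = math.sqrt(needed / (w * h))
    s = min(1.0, max(p, s_min))
    cx, cy = (xlo + xhi) / 2, (ylo + yhi) / 2
    return (cx - s * w / 2, cy - s * h / 2, cx + s * w / 2, cy + s * h / 2)


regs = [(0.0, 0.0, 1.0, 1.0), (9.0, 0.0, 1.0, 1.0), (0.0, 9.0, 1.0, 1.0), (9.0, 9.0, 1.0, 1.0)]
p = shrink_ratio(p0=0.8, sr=100e6, msr=100e6)   # fastest clock gets the full factor
print(p, group_bound(regs, p))
```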
Aspect Ratio of Bounding Box
• Close to the original aspect ratio ARold when shrink ratio p is close to 1
– avoids seriously increasing signal net length
• Close to square when shrink ratio p is close to 0
– reduces clock skew
• The new aspect ratio is a linear function of the original aspect ratio ARold and the shrink ratio p
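One linear form consistent with both limits is ARnew = p · ARold + (1 - p) · 1. It equals ARold at p = 1 and 1 (a square) at p = 0. The coefficients are an assumption. So are defining the aspect ratio as width/height and letting p scale each side, which scales the area by p².

```python
import math


def new_aspect_ratio(ar_old: float, p: float) -> float:
    """HYPOTHETICAL linear blend: AR_old when p = 1, square (1.0) when p = 0."""
    return p * ar_old + (1.0 - p) * 1.0


def shrunk_dims(width: float, height: float, p: float) -> tuple[float, float]:
    """New (width, height), assuming AR = width / height and area scaled by p**2."""
    ar = new_aspect_ratio(width / height, p)
    area = (p * width) * (p * height)
    # Solve width * height == area and width / height == ar.
    return math.sqrt(area * ar), math.sqrt(area / ar)


w, h = shrunk_dims(40.0, 10.0, 0.5)   # AR_old = 4.0 becomes 2.5
print(round(w, 2), round(h, 2), round(w / h, 2))
```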
Pros and Cons of Register Clustering
• Effectively reduces the capacitance of the leaf-level clock tree
• Increases the length of some signal nets
• The resulting increase in signal net switching power can cancel out some or all of the clock power reduction
Activity-Based Net Weighting
• Goal: reduce the capacitance of high-activity signal nets
• Assign larger weights to signal nets with higher switching rates
• Combining register clustering and activity-based net weighting further reduces the total net switching power
Activity-Based Net Weighting (contd)
• Assign larger weights to nets with higher switching rates
– T: threshold for selecting high-activity nets
– MSSR: maximum signal net switching rate
– W: controls the range of power weights (from 1 up to 1+W)
[Figure: power weight vs. switching rate; weight axis marked 1 and 1+W, switching-rate axis marked T, MSSR and MSR]
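Below is a sketch of one reading of the plot. Only the axis marks survive, so the shape is an assumption:

- nets at or below the threshold T keep weight 1;
- the weight rises linearly to 1+W at MSSR;
- the weight is clamped at 1+W above MSSR.

Switching rates are taken in whatever units the power tool reports.

```python
def power_weight(sr: float, t: float, mssr: float, w: float) -> float:
    """Assumed piecewise-linear power weight of a signal net with switching rate sr."""
    if mssr <= t:
        raise ValueError("MSSR must exceed the threshold T")
    if sr <= t:
        return 1.0                              # low-activity nets keep the default weight
    if sr >= mssr:
        return 1.0 + w                          # most active signal nets get the maximum weight
    return 1.0 + w * (sr - t) / (mssr - t)      # linear ramp between T and MSSR


for rate in (0.05, 0.1, 0.3, 0.5):
    print(rate, power_weight(rate, t=0.1, mssr=0.5, w=4.0))
```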
Compatibility with Timing Weights
• Net weight is a linear combination of the power and timing net weights
• Power ratio α ∈ [0, 1]
– controls the share of the power weight
– knob for trading off timing against power
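Here is a minimal sketch of the blend. It assumes the combined weight is α · w_power + (1 - α) · w_timing, so α = 1 uses power weights only and α = 0 uses timing weights only. The function name and the example net weights are hypothetical.

```python
def combined_weight(w_power: float, w_timing: float, alpha: float) -> float:
    """Blend power and timing net weights with power ratio alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("power ratio alpha must lie in [0, 1]")
    return alpha * w_power + (1.0 - alpha) * w_timing


# Hypothetical nets: (power weight, timing weight)
nets = {"n_fast": (5.0, 1.0), "n_critical": (1.0, 3.0)}
for name, (wp, wt) in nets.items():
    print(name, combined_weight(wp, wt, alpha=0.8))
```

With α around 0.8, a high-activity net is pulled strongly. A timing-critical but quiet net keeps only part of its timing weight.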
Experimental Setup
• Implemented in Synopsys IC Compiler
• Eight industrial designs:
– #cells: 20K to 186K
– #registers: 2.3K to 44.2K
– clock power: 32% of total power
– net switching power: 39% of total power
• Power-aware placement
– shrink ratio p and power ratio α both set to around 0.8
Experimental Flow
• Commercial IC implementation flow: Place → CTS → Route → Extract RC → STA → Power Analysis
• Power analysis performed in IC Compiler
– switching rates specified at the primary inputs
– net switching rates estimated by probabilistic simulation
Clock Net Switching Power
[Chart: clock net switching power for designs D1 to D8; average reduction 11.2%]
Total Net Switching Power
[Chart: total net switching power for designs D1 to D8; average reduction 25.4%]
Results
imp %: improvement of low_power over Reference (negative values indicate degradation)

Design  #Cells  #Regs  Method     Clock Switching Power  Total Switching Power  Total Power
D1      186K    44244  Reference  153.29                 319.86                 908.51
                       low_power  100.43                 182.59                 737.71
                       imp %      34.48%                 42.92%                 18.80%
D2      49K     5621   Reference  313.12                 408.99                 425.66
                       low_power  288.96                 364.01                 380.22
                       imp %      7.72%                  11.00%                 10.68%
D3      134K    43528  Reference  168.61                 302.57                 1127.23
                       low_power  150.87                 224.95                 1054.08
                       imp %      10.52%                 25.65%                 6.49%
D4      172K    23372  Reference  102.32                 258.53                 789.27
                       low_power  100.65                 218.94                 717.80
                       imp %      1.63%                  15.31%                 9.06%
D5      116K    9071   Reference  20.74                  37.74                  127.27
                       low_power  18.49                  32.42                  117.92
                       imp %      10.85%                 14.10%                 7.35%
D6      20K     2315   Reference  1.64                   3.00                   10.64
                       low_power  1.54                   2.44                   9.58
                       imp %      6.10%                  18.67%                 10.03%
D7      126K    12864  Reference  21.54                  46.87                  133.28
                       low_power  19.31                  33.52                  113.58
                       imp %      10.35%                 28.48%                 14.78%
D8      138K    8727   Reference  3.18                   6.35                   21.97
                       low_power  2.94                   3.36                   18.84
                       imp %      7.55%                  47.09%                 14.24%
Total AVG imp %                   11.15%                 25.40%                 11.43%

Design  Method     Clock WL  Clock Skew  WNS     Cell Area  CPU
D1      Reference  879529    0.156       0.34    4619435
        low_power  843626    0.110       0.13    5041092
        imp %      4.08%     1.38%       6.31%   -9.13%     -25.22%
D2      Reference  84601     0.028       0.01    1087912
        low_power  77964     0.023       0.41    1161019
        imp %      7.85%     0.10%       -7.69%  -6.72%     -24.75%
D3      Reference  1492789   1.180       4.61    42612408
        low_power  1266024   0.427       3.78    43121730
        imp %      15.19%    5.02%       5.53%   -1.20%     -51.29%
D4      Reference  484661    0.095       0.46    4871915
        low_power  482264    0.088       0.54    4646738
        imp %      0.49%     0.18%       -2.00%  4.62%      21.32%
D5      Reference  173554    0.169       3.73    2444381
        low_power  143063    0.174       4.10    2433401
        imp %      17.57%    -0.03%      -2.47%  0.45%      6.96%
D6      Reference  46130     0.031       0.00    535949
        low_power  39254     0.030       0.00    447993
        imp %      14.91%    0.00%       0.00%   16.41%     -32.57%
D7      Reference  260509    0.222       0.15    3136603
        low_power  252242    0.249       0.48    3471139
        imp %      3.17%     -0.68%      -8.25%  -10.67%    -9.02%
D8      Reference  114542    0.178       3.26    1603950
        low_power  95760     0.285       3.38    1701496
        imp %      16.40%    -0.54%      -0.60%  -6.08%     16.23%
Total AVG imp %    9.96%     0.68%       -1.15%  -1.54%     -12.29%
Summary
• Average reduction
– clock net switching power: 11.3% (1.6% to 34.5%)
– total net switching power: 25.3% (10.5% to 47.1%)
– total power: 11.4% (6.5% to 18.8%)
– clock WL: 10.1%
– clock skew: no consistent trend (improved on some designs, degraded on others)
• Average degradation
– WNS (worst negative slack): 2.0%
– total cell area: 1.2%
– runtime: 11.5%
Power-Timing Trade-Off with Power Ratio
[Chart: total switching power (Power) and WNS (Timing) versus power net weighting ratio, swept from 0 to 1]
Power-Timing Trade-Off with Shrink Ratio
[Chart: clock switching power (Power) and WNS (Timing) versus shrink ratio, swept from 0.95 down to 0.2]
Conclusions
• We have presented a power-aware placement method that performs activity-based net weighting and register clustering to reduce the capacitance of high-activity signal and clock nets
• We have evaluated the method on eight industrial designs through a complete industrial physical design flow
• Our approach achieved average reductions of 25.3% in net switching power and 11.4% in total power, at the cost of 2.0% timing, 1.2% total cell area, and 11.5% runtime degradation
Thank You!