Input-Specific Dynamic Power Optimization for VLSI Circuits Fei Hu Vishwani D. Agrawal

Input-Specific Dynamic Power
Optimization for VLSI Circuits
Fei Hu
Intel Corp.
Folsom, CA 95630, USA
Vishwani D. Agrawal
Department of ECE
Auburn University, AL 36849, USA
October 5, 2006
Outline
Background
– Dynamic power dissipation
– Glitch reduction
– Previous LP model with fixed gate delay
– Process-variation-resistant LP model
Input-specific optimization
– Without process-variation
– With process-variation
Experimental results
Conclusion
Oct. 5, 2006
Fei Hu, ISLPED 2006, Tegernsee, Germany
Background
Dynamic power dissipation
– Pdyn= Pswitching + Pshort-circuit
Switching power dissipation
– $P_{switching} = \frac{1}{2} k C_L V_{dd}^2 f_{clk}$
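The switching power formula can be evaluated directly. The sketch below is only an illustration: the function name and the numeric values are hypothetical, and k is read here as the average number of output transitions per clock cycle, an interpretation the slide does not spell out.

```python
def switching_power(k: float, c_load: float, vdd: float, f_clk: float) -> float:
    """Return 1/2 * k * C_L * Vdd^2 * f_clk in watts.

    k      : assumed meaning, average transitions per clock cycle
    c_load : load capacitance C_L in farads
    vdd    : supply voltage in volts
    f_clk  : clock frequency in hertz
    """
    return 0.5 * k * c_load * vdd ** 2 * f_clk


# Hypothetical operating point, chosen only to exercise the formula.
p_switching = switching_power(k=0.2, c_load=10e-15, vdd=1.2, f_clk=1e9)
print(f"{p_switching * 1e6:.2f} uW")
```

Because k counts every transition, glitches raise k and therefore the switching power, which is why glitch reduction lowers dynamic power.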
[Figure: CMOS inverter with load capacitance CL between Vdd and Gnd, showing the on/off transistor states and the currents isupply and ic for the two output transitions]
Background
Glitch reduction
– An important dynamic power reduction technique
Static glitch
Dynamic glitch
– Glitch power accounts for 30–70% of Pdyn
– Related techniques
Balanced delay
Hazard filtering
Transistor/Gate sizing
Linear Programming approach
Glitch reduction
[Figure: original circuit with unit gate delays, and the same circuit modified by path balancing and by hazard filtering]
Balanced path / path balancing
– Equalize the delays of all paths incident on a gate
– Balancing requires insertion of delay buffers
Hazard/glitch filtering
– Utilize the glitch-filtering effect of a gate
– Buffer insertion is not necessarily required
Glitch reduction
Transistor/gate sizing
– Find transistor sizes in the circuit to realize the delay
– No need to insert delay buffers
– Suffers from nonlinearity of the delay model
– Large solution space; numerical convergence and global optimization not guaranteed
Linear programming approach
– Adopts both path balancing and hazard filtering
– Finds the optimal delay assignments for gates
– Uses technology mapping to map the gate delay assignments to
transistor/gate dimensions
– Guarantees optimal solution, a convenient way to solve a large
scale optimization problem
Previous LP approach
[Figure: example circuit with nodes numbered 1 to 29; gate 7, with delay d7, has input timing windows (t5, T5) and (t6, T6) and output timing window (t7, T7)]
Timing window (t, T)
Gate constraints:
T7 ≥ T5 + d7
T7 ≥ T6 + d7
t7 ≤ t5 + d7
t7 ≤ t6 + d7
d7 > T7 – t7
Circuit delay
constraints:
T11 ≤ maxdelay
T12 ≤ maxdelay
Objective:
Minimize sum
of buffer delays
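The gate constraints above can be checked on a small netlist without an LP solver. The sketch below is an illustration only: it propagates the tightest timing windows that satisfy the constraints (t = earliest fan-in arrival + d, T = latest fan-in arrival + d) and flags gates that violate d > T – t. The netlist, gate names and helper functions are hypothetical, and primary inputs are assumed to switch at time 0.

```python
from typing import Dict, List, Tuple

Window = Tuple[float, float]  # (t, T): earliest and latest arrival time


def timing_windows(fanin: Dict[str, List[str]],
                   delay: Dict[str, float],
                   order: List[str]) -> Dict[str, Window]:
    """Propagate (t, T) through the nodes listed in topological order."""
    window: Dict[str, Window] = {}
    for node in order:
        inputs = fanin.get(node, [])
        if not inputs:  # primary input, assumed to switch at time 0
            window[node] = (0.0, 0.0)
            continue
        t = min(window[i][0] for i in inputs) + delay[node]
        T = max(window[i][1] for i in inputs) + delay[node]
        window[node] = (t, T)
    return window


def unfiltered_gates(window: Dict[str, Window],
                     delay: Dict[str, float],
                     fanin: Dict[str, List[str]]) -> List[str]:
    """Return the gates whose delay does not exceed their window width."""
    return [g for g, (t, T) in window.items()
            if fanin.get(g) and not delay[g] > T - t]


# Hypothetical netlist: g1 = NOT(a), g2 = AND(g1, b).
fanin = {"g1": ["a"], "g2": ["g1", "b"]}
order = ["a", "b", "g1", "g2"]

unit = {"g1": 1.0, "g2": 1.0}
w_unit = timing_windows(fanin, unit, order)
print(w_unit["g2"], unfiltered_gates(w_unit, unit, fanin))

# Raising the delay of g2 above the window width filters the glitch.
sized = {"g1": 1.0, "g2": 1.5}
w_sized = timing_windows(fanin, sized, order)
print(w_sized["g2"], unfiltered_gates(w_sized, sized, fanin))
```

In the LP formulation the delays are the unknowns; this check only evaluates a given delay assignment.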
Process-variation-resistant optimization
Motivation
– Gate delay assumed fixed in previous models
– Variation of gate delay in real circuits
Environmental factors: temperature, Vdd
Physical factors: process variations
– Effect of delay variation
Glitch filtering conditions corrupted
Power dissipation increases from the optimized value
– Our proposal
Consider delay variations in dynamic power optimization
Only consider process variations (major source of delay
variation)
LP model based on statistical timing
Statistical timing model with random variables
[Figure: gate i with delay di driven by gates 1, ..., j, ..., k; the fan-in gates have output timing windows (ta1, Ta1), (taj, Taj), (tak, Tak), and gate i has input window (tbi, Tbi) and output window (tai, Tai)]
Input-specific optimization
Motivation
– Previous LP models guarantee glitch filtering for ANY
input vector sequence
Ti - ti < di for all gates
– Redundancy in optimization
Insertion of more buffers
Increased overhead in power/area
– In reality, circuits are embedded in a larger environment
Optimize for the input vector sequences that are possible for the circuit, e.g., functional vectors
Same reduction in power dissipation with lower overhead
Input-specific optimization
Glitch generation pattern
– Input vector pair that can potentially generate a glitch
– AND gate example:
[Figure: AND gate input and output waveforms illustrating glitch-generation patterns]
Glitch generation probability Pg[ i ] = Ng[ i ] / N
– Probability glitch-generation pattern occurs at inputs of gate i
– Steady state signal values match the pattern
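The glitch-generation probability can be estimated by counting pattern matches in the steady-state values seen at a gate's inputs. The sketch below makes two assumptions that the slide does not state: the patterns for a two-input AND gate are taken to be the vector pairs in which the two inputs switch in opposite directions, and N is taken to be the number of consecutive vector pairs. All names are hypothetical.

```python
from typing import List, Tuple

Vector = Tuple[int, int]

# Assumed glitch-generation patterns for a 2-input AND gate: the inputs
# switch in opposite directions, so the steady-state output stays 0 while
# a transient 1 is possible if the rising input arrives first.
AND2_GLITCH_PATTERNS = {((0, 1), (1, 0)), ((1, 0), (0, 1))}


def glitch_generation_probability(gate_inputs: List[Vector]) -> float:
    """Pg = Ng / N over consecutive steady-state input vector pairs."""
    pairs = list(zip(gate_inputs, gate_inputs[1:]))
    if not pairs:
        return 0.0
    n_g = sum(1 for pair in pairs if pair in AND2_GLITCH_PATTERNS)
    return n_g / len(pairs)


# Hypothetical steady-state values at the inputs of one AND gate.
sequence = [(0, 1), (1, 0), (1, 1), (0, 1), (1, 0)]
pg = glitch_generation_probability(sequence)
print(pg)
```

A gate whose inputs never match a pattern under the given vector sequence gets Pg = 0, which is the case the input-specific optimization exploits.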
Input-specific optimization
Application to basic LP model w/ fixed gate delay model
– Static optimization
Only static glitches/hazards considered
– Relaxation of constraints
Relax glitch filtering constraints where glitches unlikely
$T_i - t_i < d_i$
=> $(T_i - t_i)\,\alpha_i < d_i$
Selective relaxation
$\alpha_i = 0$ if $P_g[i] = 0$; $\alpha_i = 1$ if $P_g[i] > 0$
Generalized relaxation
$\alpha_i = 1 - e^{-\beta P_g[i]}$
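The two relaxation schemes can be sketched as functions that map the glitch-generation probability of a gate to the factor multiplying (Ti – ti). The function names are hypothetical, and the exponential form with its rate constant is an assumption made for illustration; the slide text does not preserve the exact expression.

```python
import math


def selective_relaxation(pg: float) -> float:
    """Factor 0 drops the constraint when no glitch pattern ever occurs."""
    return 0.0 if pg == 0 else 1.0


def generalized_relaxation(pg: float, rate: float = 10.0) -> float:
    """Smoothly growing factor; 'rate' is an assumed tuning constant."""
    return 1.0 - math.exp(-rate * pg)


def filtering_constraint_holds(T: float, t: float, d: float,
                               factor: float) -> bool:
    """Relaxed glitch-filtering constraint: (T - t) * factor < d."""
    return (T - t) * factor < d


# A gate with window width 4 and delay 2 violates the full constraint,
# but satisfies the relaxed one when its glitch probability is zero.
print(filtering_constraint_holds(5.0, 1.0, 2.0, selective_relaxation(0.3)))
print(filtering_constraint_holds(5.0, 1.0, 2.0, selective_relaxation(0.0)))
```

With the selective scheme the LP no longer needs a buffer to protect a gate that never sees a glitch-generation pattern.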
Input-specific optimization
Application to process-variation-resistant LP model
based on statistical timing
– Static optimization
– Relaxation of constraints
$\mu_{d_i} > [\mu_{W_i} + 3k(\sigma_{W_i} + r\,\mu_{d_i})\,\eta]\,\alpha_i$
Selective relaxation
Generalized relaxation
– Tuning factor
Original objective
Minimize $\sum_j d_j$, $j \in$ buffers
Current objective
Minimize $\sum_j d_j + TF \cdot \frac{1}{N} \sum_i d_i$, $j \in$ buffers, $i \in$ other gates
Input-specific optimization
Why do we need a tuning factor
– Dominating path affects critical delay distribution
[Figure: circuit from PIs to PO containing a dominating path (delay 41) and other logic, with the annotations "Can be [1,41]" and "Always 0"]
Experimental results
Experimental procedure
– Power estimation
Event-driven logic simulation
Fanout-weighted sum of switching activities
Monte-Carlo simulation with 1,000 samples of delays under process variation
– Results analysis
Un-Opt.: unit-delay circuit
Opt1: previous basic LP model with fixed gate delay
Opt2: process-variation-resistant LP model
IS-Opt1, IS-Opt2: input-specific optimizations
[Figure: flow chart with the blocks Circuit, Data extraction, Constraint set data (with parameters such as Dmax and r), LP models (AMPL), Gate delays, Circuit generation, Optimized circuit, Logic simulations, Results]
Experimental results – input-specific optimization
Application to “Opt1” (basic LP model), IS-Opt1
Circuit  maxdelay  Un-Opt  Opt (w/o proc. var.)      IS-Opt (input-specific, w/o proc. var.)
                   Pwr.    Pwr.   Delay  Buffers     Pwr.   Delay  Buffers
c432     34        1.0     0.74   34     66          0.74   35     66
c432     68        1.0     0.74   68     58          0.74   69     41
c499     22        1.0     0.94   22     48          0.94   22     33
c499     33        1.0     0.94   33     0           0.95   33     0
c880     48        1.0     0.54   51     35          0.54   49     32
c880     120       1.0     0.54   121    30          0.54   122    24
c1355    48        1.0     0.93   48     192         0.93   48     113
c1355    120       1.0     0.93   121    128         0.93   120    25
c1908    80        1.0     0.53   82     62          0.54   86     52
c1908    200       1.0     0.54   203    34          0.53   204    3
c2670    64        1.0     0.74   65     34          0.74   66     30
c2670    160       1.0     0.74   163    9           0.74   162    1
c3540    94        1.0     0.59   95     139         0.59   101    122
c3540    235       1.0     0.59   239    78          0.59   239    73
c5315    98        1.0     0.56   100    167         0.56   104    170
c5315    245       1.0     0.56   249    53          0.56   250    52
c6288    228       1.0     0.13   226    870         0.13   228    870
c6288    620       1.0     0.13   620    857         0.13   620    853
c7552    86        1.0     0.52   89     91          0.52   88     84
c7552    215       1.0     0.52   220    44          0.52   221    38
Experimental results – input-specific optimization
Application to “Opt2” under process-variation, IS-Opt2 under 15% intra-die
and 5% inter-die variation
Cir.    DMax  Un-opt.    Opt2 (statistical proc.)                        IS-Opt2 (input-specific statistical proc.)
              Nom. Pwr.  Nom. Pwr.  Mean Pwr.  Max Dev. (%)  No. Buf.    Nom. Pwr.  Mean Pwr.  Max Dev. (%)  No. Buf.
c432    50    1.0        0.74       0.76       11.1          88          0.74       0.76       9.3           81
c432    99    1.0        0.74       0.74       3.7           106         0.74       0.74       3.3           76
c499    32    1.0        0.94       0.95       2.0           88          0.94       0.95       1.9           88
c499    48    1.0        0.94       0.95       1.0           129         0.94       0.95       1.8           58
c880    70    1.0        0.54       0.59       18.2          57          0.54       0.59       20.4          38
c880    174   1.0        0.54       0.55       8.6           62          0.54       0.56       9.0           38
c1355   70    1.0        0.93       0.98       10.2          305         0.93       1.01       13.1          253
c1355   174   1.0        0.93       0.94       3.0           305         0.93       0.95       4.7           160
c1908   116   1.0        0.52       0.64       35.8          135         0.52       0.64       34.7          107
c1908   290   1.0        0.52       0.58       21.4          190         0.52       0.57       18.4          104
c2670   93    1.0        0.74       0.80       13.6          249         0.73       0.79       11.3          186
c2670   232   1.0        0.73       0.76       6.2           211         0.73       0.75       4.3           79
c3540   137   1.0        0.59       0.66       17.8          281         0.59       0.65       15.6          247
c3540   341   1.0        0.59       0.62       10.1          311         0.59       0.61       7.4           188
c5315   143   1.0        0.55       0.63       20.8          399         0.55       0.63       21.0          389
c5315   356   1.0        0.55       0.60       13.4          418         0.55       0.60       13.2          413
c6288   331   1.0        0.13       0.38       223.8         1121        0.13       0.38       225.2         1115
c6288   899   1.0        0.13       0.26       125.3         1473        0.13       0.26       125.5         1243
c7552   125   1.0        0.52       0.59       18.7          481         0.52       0.58       18.1          389
c7552   312   1.0        0.52       0.56       11.8          645         0.52       0.55       10.9          520
Experimental results – input-specific optimization
Critical delay
Nominal delay
Max. deviation
– Similar performance for “Opt2” and “IS-Opt2”
Conclusions
Explored a new aspect of low-power optimization
for VLSI circuits
– Input-specific optimization
– Optimizing the circuit for a given input sequence that
may be specified for the circuit
Defined the concept of glitch-generation probability
– Used to adaptively relax glitch-filtering constraints
Experimental results
– Better solution with fewer delay buffers
– Maintains similar power reduction and delay performance
– Up to 80% and 63% reductions in delay buffers
Q&A
Backups
Process and delay variations
Process variations
– Variations due to semiconductor process
VT, tox, Leff, Wwire, THwire, etc.
– Inter-die variation
Constant within a die, vary from one die to another die of a
wafer or wafer lot
– Intra-die variation
Variation within a die
Due to equipment limitations or statistical effects in the
fabrication process, e.g., variation in doping concentration
Spatial correlations and deterministic variation due to CMP and
optical proximity effect
Delay model and implications
Random gate delay model
– $D_{total,i} = D_{nom,i} + D_{inter,i} + D_{intra,i}$
– Truncated normal distribution
– Assume independence
– Variation in terms of σ/Dnom,i ratio
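The random gate delay model above can be sampled as in the sketch below, for example to drive a Monte-Carlo power estimate. Several details are assumptions made for illustration and are not given on the slide: the normal distributions are truncated at ±3σ by rejection, the inter-die component is one shared relative shift per die, and the intra-die component is drawn independently per gate. The function names are hypothetical.

```python
import random
from typing import List


def truncated_gauss(sigma: float, rng: random.Random,
                    limit: float = 3.0) -> float:
    """Zero-mean normal sample, rejected until it lies within limit*sigma."""
    while True:
        x = rng.gauss(0.0, sigma)
        if abs(x) <= limit * sigma:
            return x


def sample_gate_delays(nominal: List[float], r_inter: float,
                       r_intra: float, rng: random.Random) -> List[float]:
    """One die: D_total = D_nom + D_inter + D_intra for every gate.

    r_inter, r_intra are sigma/D_nom ratios, so each component is a
    relative deviation multiplied by the gate's nominal delay.
    """
    inter = truncated_gauss(r_inter, rng)  # shared by all gates on the die
    return [d * (1.0 + inter + truncated_gauss(r_intra, rng))
            for d in nominal]


rng = random.Random(2006)
nominal = [1.0, 2.0, 1.5]
die = sample_gate_delays(nominal, r_inter=0.05, r_intra=0.15, rng=rng)
print(die)
```

Repeating the call gives one delay assignment per simulated die.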
Effect of inter-die variations
– Depends on its effect on switching activities
– Definition of glitch-filtering probability Pglt = P {t2-t1< d}
Signal arrival time t1, t2
Gate inertial delay d
– Theorem 1 states the change of Pglt due to inter-die variation
[Equation: $P_{glt}$ expressed through erf() terms in k and r]
erf(), the error function
k, a path and gate dependent constant
r, σ/Dnom,i ratio for inter-die variations
Delay model and implications
Process-variation-resistant design
– Can be achieved by path balancing and glitch filtering
– Critical delay may increase
Theorem 2 states that a solution is guaranteed only if circuit delay
is allowed to increase
Proved by example, assuming 10% variation
[Figure: example circuit with gates A, B and C and unit gate delays, annotated with the values 2.1 and 3.9]
LP model based on statistical timing
Minimum-maximum statistics
– Needed for $t_{b_i}$, $T_{b_i}$:
$t_{b_i} = \mathrm{Min}(t_{a_1}, t_{a_j}, t_{a_k})$;
$T_{b_i} = \mathrm{Max}(T_{a_1}, T_{a_j}, T_{a_k})$;
– Previous works
Min and Max of two normal random variables are not necessarily normally distributed
Can be approximated with a normal distribution
Requires complex operations, e.g., integration, exponentiation, etc.
– Challenges for LP approach
Requires a simple approximation without nonlinear operations
Our approximation for C = Max(A, B), where A, B, and C are Gaussian RVs:
$\mu_C = \mathrm{Max}(\mu_A, \mu_B)$
$\mu_C + 3\sigma_C = \mathrm{Max}(\mu_A + 3\sigma_A, \mu_B + 3\sigma_B)$
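The approximation for C = Max(A, B) uses only comparisons and additions, which is what keeps it expressible in an LP. A minimal sketch follows, with each Gaussian represented as a (mean, std dev) pair; the function name and the sample values are hypothetical.

```python
from typing import Tuple

Gaussian = Tuple[float, float]  # (mean, standard deviation)


def approx_max(a: Gaussian, b: Gaussian) -> Gaussian:
    """Approximate Max(A, B) by matching the mean and the +3 sigma point."""
    mu_a, sigma_a = a
    mu_b, sigma_b = b
    mu_c = max(mu_a, mu_b)
    upper = max(mu_a + 3.0 * sigma_a, mu_b + 3.0 * sigma_b)
    sigma_c = (upper - mu_c) / 3.0
    return mu_c, sigma_c


# B has the smaller mean but the larger spread, so it sets the +3 sigma point.
mu_c, sigma_c = approx_max((10.0, 1.0), (9.0, 2.0))
print(mu_c, sigma_c)
```

The same construction with Min and the –3σ point gives the earliest-arrival counterpart.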
LP model based on statistical timing
Min-Max statistics approximation error
– Negligible when $|\mu_A - \mu_B| > 3(\sigma_A + \sigma_B)$
– Largest when $\mu_A = \mu_B$
[Figure: CDF plot (P versus x) for the case $\mu_A = \mu_B$, comparing CDF_A, CDF_B, the actual CDF for Max(A, B) and the approximated CDF for Max(A, B)]
$\mu_C = \mathrm{Max}(\mu_A, \mu_B)$
$\sigma_C = \frac{1}{3}\left[\mathrm{Max}(\mu_A + 3\sigma_A, \mu_B + 3\sigma_B) - \mu_C\right]$
LP model based on statistical timing
Variables
– Timing and delay variables with mean μ and std dev σ
– Auxiliary variables: $TT_{b_i}$, $tt_{b_i}$, $W_i = T_{b_i} - t_{b_i}$, $\mu_{W_i}$, $\sigma_{W_i}$
Constraints
– Gate constraints
Timing window at the inputs for a two-input gate i (fan-ins 1 and 2)
$\mu_{Tb_i} \ge \mu_{Ta_1}$; $TT_{b_i} \ge \mu_{Ta_1} + 3\sigma_{Ta_1}$;
$\mu_{tb_i} \le \mu_{ta_1}$; $tt_{b_i} \le \mu_{ta_1} - 3\sigma_{ta_1}$;
$\mu_{Tb_i} \ge \mu_{Ta_2}$; $TT_{b_i} \ge \mu_{Ta_2} + 3\sigma_{Ta_2}$;
$\mu_{tb_i} \le \mu_{ta_2}$; $tt_{b_i} \le \mu_{ta_2} - 3\sigma_{ta_2}$;
$\sigma_{Tb_i} = (TT_{b_i} - \mu_{Tb_i}) / 3$;
$\sigma_{tb_i} = (\mu_{tb_i} - tt_{b_i}) / 3$;
Timing window at outputs
$\mu_{Ta_i} = \mu_{Tb_i} + \mu_{d_i}$;
$\sigma_{Ta_i} = k(\sigma_{Tb_i} + r\,\mu_{d_i})$;
$\mu_{ta_i} = \mu_{tb_i} + \mu_{d_i}$;
$\sigma_{ta_i} = k(\sigma_{tb_i} + r\,\mu_{d_i})$;
LP model based on statistical
timing
Constraints
– Gate constraint
Linear approximation
$\sigma_{Ta_i} = \sqrt{\sigma_{Tb_i}^2 + (r\,\mu_{d_i})^2} \;\Rightarrow\; \sigma_{Ta_i} = k(\sigma_{Tb_i} + r\,\mu_{d_i})$
$k \in [0.707, 1]$; choose k = 0.85, since $\frac{A+B}{\sqrt{2}} \le \sqrt{A^2 + B^2} \le A + B$
– Glitch filtering constraints
$\mu_{W_i} = \mu_{Tb_i} - \mu_{tb_i}$;
$\sigma_{W_i} = k(\sigma_{Tb_i} + \sigma_{tb_i})$;
$\mu_{d_i} > \mu_{W_i} + 3k(\sigma_{W_i} + r\,\mu_{d_i})$;
[Figure: probability density of $d_i - W_i$ with the 3σ margin marked]
– Circuit delay constraint
$\mu_{Ta_i}(1 + 3r) \le D_{max}$
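The range quoted for k follows from two elementary inequalities. The short derivation below is a worked check for non-negative A and B; the error figures for k = 0.85 are computed here and do not come from the slides.

```latex
\begin{align*}
(A+B)^2 &= A^2 + B^2 + 2AB \;\ge\; A^2 + B^2
  &&\Rightarrow\quad \sqrt{A^2+B^2} \le A+B, \\
2(A^2+B^2) - (A+B)^2 &= (A-B)^2 \;\ge\; 0
  &&\Rightarrow\quad \sqrt{A^2+B^2} \ge \tfrac{1}{\sqrt{2}}\,(A+B).
\end{align*}
% Hence sqrt(A^2+B^2) = rho (A+B) with rho in [1/sqrt(2), 1] ~ [0.707, 1].
% Replacing rho by the constant k = 0.85 gives
\[
  \frac{k}{\rho} \in \left[\,0.85,\; 0.85\sqrt{2}\,\right] \approx [0.85,\; 1.20],
\]
% i.e. an overestimate of about 20% when A = B and an underestimate
% of 15% when one of the two terms is zero.
```

The value 0.85 sits close to the midpoint of [0.707, 1], which roughly balances the two worst-case errors.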
LP model based on statistical timing
Parameter
– r, σ/Dnom,i ratio
– Dmax, circuit delay parameter
– η, optimism factor
$\mu_{d_i} > \mu_{W_i} + 3k(\sigma_{W_i} + r\,\mu_{d_i})\,\eta$;
η = 1, no relaxation
η < 1, optimistic about the actual glitch width
η = 0, reduces to the previous model
Objective
– Minimize #buffer inserted – sum of buffer delays