Slides - The University of Texas at Austin

VLSI Physical Design Automation
Placement (3)
Prof. David Pan
dpan@ece.utexas.edu
Office: ACES 5.434
3/16/2016
Outline
• Wire length driven placement
• Main methods
– Simulated Annealing
– Partition-based methods
– Analytical methods
• Timing and congestion consideration during
placement
• Newer trends
Timing Cost
Critical Path
The delay of the circuit is defined as the longest delay among all possible paths from primary inputs to primary outputs.
Interconnect delay becomes increasingly important in the deep sub-micron regime.
Timing Analysis
[Figure: netlist with a delay for each gate, from primary inputs PI1, PI2, PI3 to primary outputs PO1, PO2, PO3, annotated with the arrival time at each node.]
Timing Analysis
[Figure: the same netlist with each node annotated as arrival time/required time (18/22, 22/22 and 18/22 at PO1, PO2 and PO3), followed by the slack at each node (4, 0 and 4 at PO1, PO2 and PO3).]
slack = required time - arrival time
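The arrival time, required time and slack computation above can be sketched in a few lines. This is a minimal illustration on a hypothetical four-gate netlist (not the one in the figure), assuming each gate has a single delay, primary inputs arrive at time 0, and interconnect delay is ignored; the function and variable names are illustrative only.

```python
from collections import defaultdict, deque

def timing_analysis(gate_delay, edges, required_at_outputs):
    """Return (arrival, required, slack) per gate for a combinational DAG."""
    succ, pred = defaultdict(list), defaultdict(list)
    indegree = {g: 0 for g in gate_delay}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        indegree[v] += 1

    # Topological order (Kahn's algorithm).
    queue = deque(g for g in gate_delay if indegree[g] == 0)
    order = []
    while queue:
        g = queue.popleft()
        order.append(g)
        for s in succ[g]:
            indegree[s] -= 1
            if indegree[s] == 0:
                queue.append(s)

    # Forward pass: arrival time at a gate output is the latest input plus the gate delay.
    arrival = {}
    for g in order:
        arrival[g] = gate_delay[g] + max((arrival[p] for p in pred[g]), default=0)

    # Backward pass: required time is the tightest requirement among fanouts.
    required = {}
    for g in reversed(order):
        if succ[g]:
            required[g] = min(required[s] - gate_delay[s] for s in succ[g])
        else:
            required[g] = required_at_outputs[g]

    slack = {g: required[g] - arrival[g] for g in order}
    return arrival, required, slack

# Hypothetical netlist: a and b feed c, b also feeds d; c and d drive outputs.
gate_delay = {"a": 1, "b": 3, "c": 4, "d": 2}
edges = [("a", "c"), ("b", "c"), ("b", "d")]
arrival, required, slack = timing_analysis(gate_delay, edges, {"c": 8, "d": 8})
print(arrival, required, slack)
```

The circuit delay is the largest arrival time at an output, and gates with the smallest slack lie on the critical path.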
Another example with interconnect delay: same timing analysis
[Figure: latch-to-latch circuit with gate delays and interconnect delays annotated; the same arrival time analysis applies.]
Timing Driven Placement Approaches
• Path-based
– Most accurate information
– Very slow
• Budgeting
– Inaccurate information
– Hard to budget
– Fast
• Net-based approach
– Net-weighting
Net-Weighting
• Basic approach
– For more timing critical nets (i.e., smaller slack),
assign higher net weights
– Minimize $\sum_i w_i \cdot \text{net\_length}(i)$, where $w_i \propto \frac{1}{S_i}$ and $S_i$ is the slack of net $i$
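A minimal sketch of the weighted objective above. Two choices here are assumptions rather than part of the slide: net length is measured as half-perimeter wirelength (HPWL), and slack is clamped to a small positive `eps` before inversion so that zero or negative slack does not produce an invalid weight.

```python
def hpwl(pins):
    """Half-perimeter wirelength of a net given its pin coordinates."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def net_weight(slack, eps=1e-3):
    """Weight inversely related to slack; eps is a hypothetical clamp."""
    return 1.0 / max(slack, eps)

def weighted_wirelength(nets, slacks):
    """Sum over nets of w_i * net_length(i)."""
    return sum(net_weight(slacks[n]) * hpwl(pins) for n, pins in nets.items())

# Hypothetical two-net example.
nets = {"n1": [(0, 0), (3, 4)], "n2": [(1, 1), (2, 1), (2, 5)]}
slacks = {"n1": 0.5, "n2": 2.0}
cost = weighted_wirelength(nets, slacks)
print(cost)
```

Net n1 has the smaller slack, so its length counts four times as much as that of n2 in the objective.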
Sensitivity Guided Net Weighting for Placement
Driven Synthesis
H. Ren, D. Z. Pan and D.S. Kung
ISPD-04
Figure of Merit (FOM)
• FOM is the total slack difference compared to a
certain slack threshold for all timing end points.
$$\mathrm{FOM} = \sum_{t \in P_o,\; Slk(t) < Slk_t} \left( Slk(t) - Slk_t \right)$$
• Interpreted as the amount of work left for the physical
synthesis engine or to the designers for manual fix.
• FOM and WNS (worst negative slack) are the two
most important metrics for timing closure in modern
physical synthesis
• However, FOM was not used to guide placement
explicitly
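As a small illustration of the two metrics, the sketch below computes FOM and WNS from a list of end-point slacks. The slack values and the default threshold of 0 are hypothetical.

```python
def figure_of_merit(endpoint_slacks, slack_threshold=0.0):
    """Sum of (slack - threshold) over end points whose slack is below the threshold."""
    return sum(s - slack_threshold for s in endpoint_slacks if s < slack_threshold)

def worst_negative_slack(endpoint_slacks):
    """Smallest slack over all timing end points."""
    return min(endpoint_slacks)

# Hypothetical end-point slacks.
slacks = [-3.0, -1.2, 0.5, 2.0]
fom = figure_of_merit(slacks)
wns = worst_negative_slack(slacks)
print(fom, wns)
```

Only the two failing end points contribute to FOM, so FOM reflects how much total slack must be recovered, while WNS reflects only the single worst end point.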
Sensitivity Definitions
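• Net length sensitivity to net weight: $S_W^L = \frac{\Delta L}{\Delta W}$
• Net delay sensitivity to net length: $S_L^T = \frac{\Delta T}{\Delta L}$
• Net slack sensitivity to net weight: $S_W^{Slk} = -\frac{\Delta T}{\Delta W} = -\frac{\Delta T}{\Delta L} \cdot \frac{\Delta L}{\Delta W} = -S_L^T \cdot S_W^L = -S_W^T$
• FOM sensitivity to net delay: $S_T^{FOM} = \frac{\Delta FOM}{\Delta T}$
• FOM sensitivity to net weight: $S_W^{FOM} = \frac{\Delta FOM}{\Delta W} = \frac{\Delta FOM}{\Delta T} \cdot \frac{\Delta T}{\Delta W} = S_T^{FOM} \cdot S_W^T$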
Closed-Form Sensitivity
• For net length to weight sensitivity, we have $S_W^L = -L \cdot \frac{W_{src} + W_{sink} - 2W}{W_{src} W_{sink}}$
• For delay to wire length sensitivity, we have $S_L^T = \frac{\Delta T}{\Delta L} = rcL + cR_d + rC_l$
• Use switch-level RC and Elmore delay to illustrate the concept
• Good enough during placement
• Can be extended to more accurate models
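The delay sensitivity above follows from differentiating the Elmore delay of a driver, a uniform wire and a load. The sketch below checks the closed form numerically, assuming the usual meaning of the symbols (r and c are wire resistance and capacitance per unit length, R_d the driver resistance, C_l the load capacitance); the numeric values are arbitrary.

```python
def elmore_delay(L, r, c, Rd, Cl):
    """Elmore delay of a driver Rd, a wire of length L, and a load Cl."""
    return Rd * (c * L + Cl) + r * L * (c * L / 2 + Cl)

def delay_sensitivity(L, r, c, Rd, Cl):
    """Closed-form dT/dL = r*c*L + c*Rd + r*Cl."""
    return r * c * L + c * Rd + r * Cl

# Arbitrary illustrative values.
L, r, c, Rd, Cl = 100.0, 0.1, 0.2, 50.0, 10.0
closed_form = delay_sensitivity(L, r, c, Rd, Cl)

# Central finite difference as an independent check.
h = 1e-3
numeric = (elmore_delay(L + h, r, c, Rd, Cl) - elmore_delay(L - h, r, c, Rd, Cl)) / (2 * h)
print(closed_form, numeric)
```

Because the Elmore delay is quadratic in L, the central difference agrees with the closed form up to rounding error.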
FOM to Net Delay Sensitivity
• Question: suppose the delay of net i is reduced by a small amount ΔT(i); what is the impact on FOM?
• Define K(i) as the number of timing end points whose slack will change due to ΔT(i)
• Then we have the following theorem:
$$S_T^{FOM}(i) = \frac{\Delta FOM}{\Delta T(i)} = -K(i)$$
K(i) Computation
• Topologically sorted order from PO to PI
• Only propagate K(i) to the most timing critical
input pin
[Figure: example circuit with gates A, B, C, D and end points Po1 and Po2; each pin is annotated with a (slack, K(i)) pair such as (-3, 2), (-3, 1), (-1.2, 1) and (-0.8, 0).]
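The propagation rule above can be sketched as follows. The netlist, the data layout and the function name are hypothetical, and two simplifications are assumed: net slack stands in for pin slack when picking the most critical input, and only end points with slack below the threshold are counted.

```python
def compute_k(gates, topo_order, net_slack, endpoint_nets, slack_threshold=0.0):
    """Return K(net) for every net.

    gates:         gate name -> (list of input nets, output net)
    topo_order:    gate names sorted from primary inputs to primary outputs
    net_slack:     net name -> slack
    endpoint_nets: nets that drive a timing end point
    """
    k = {net: 0 for net in net_slack}
    # Each end point below the threshold contributes to FOM, so it counts once.
    for net in endpoint_nets:
        if net_slack[net] < slack_threshold:
            k[net] += 1
    # Visit gates from POs back to PIs, so K of a gate output is final when used.
    for gate in reversed(topo_order):
        inputs, output = gates[gate]
        if inputs and k[output] > 0:
            # Only the most timing-critical input inherits K.
            critical = min(inputs, key=lambda n: net_slack[n])
            k[critical] += k[output]
    return k

# Hypothetical netlist: A drives net a, C drives net c; B and D drive end points.
gates = {
    "A": ([], "a"),
    "C": ([], "c"),
    "B": (["a"], "b"),
    "D": (["a", "c"], "d"),
}
net_slack = {"a": -3.0, "b": -3.0, "c": -0.8, "d": -1.2}
k = compute_k(gates, ["A", "C", "B", "D"], net_slack, endpoint_nets=["b", "d"])
print(k)
```

In this example net a is the critical input of both B and D, so it affects two end points, while net c affects none.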
Net Weight Generation
• Put these sensitivities together and generate the new net weight:
$$\Delta W(i) = \alpha \, (Slk_t - Slk(i)) \, S_W^{Slk}(i) + \beta \, S_W^{FOM}(i)$$
$$W(i) = \begin{cases} W_{org}(i) & Slk(i) > Slk_t \\ W_{org}(i) + \Delta W(i) & Slk(i) \le Slk_t \end{cases}$$
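A minimal sketch of this update for a single net. The constants `alpha` and `beta` and all numeric values below are placeholders chosen for illustration; the slides do not give their values.

```python
def updated_net_weight(w_org, slack, slack_sens, fom_sens,
                       slack_threshold=0.0, alpha=1.0, beta=1.0):
    """Return the new weight of one net.

    slack_sens: slack sensitivity to net weight for this net
    fom_sens:   FOM sensitivity to net weight for this net
    alpha/beta: hypothetical tuning constants
    """
    if slack > slack_threshold:
        return w_org  # non-critical nets keep their original weight
    delta_w = alpha * (slack_threshold - slack) * slack_sens + beta * fom_sens
    return w_org + delta_w

critical = updated_net_weight(1.0, slack=-2.0, slack_sens=0.5, fom_sens=0.25, beta=2.0)
relaxed = updated_net_weight(1.0, slack=1.0, slack_sens=0.5, fom_sens=0.25, beta=2.0)
print(critical, relaxed)
```

The first term grows with how far a net falls below the slack threshold, and the second term rewards nets whose delay affects many end points.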
Experiments
• We compare the placement and physical synthesis
results of three different algorithms on 7 industry
chips (up to 444k movable objects) from IBM
– WL: wire length driven placement with uniform weight
– TS: timing driven placement using slack sensitivity
– TSF: timing driven placement using both slack and FOM
sensitivity
Timing after Placement
FOM
Design    ZW       WL        TS        TSF       Improvement TS   Improvement TSF
ckt1      -9134    -41650    -26093    -25602    48%              49%
ckt2      0        -6966     -4102     -3454     41%              50%
ckt3      -535     -13711    -6468     -5595     55%              62%
ckt4      -322     -8057     -4024     -3440     52%              60%
ckt5      -114     -28527    -15334    -12229    46%              57%
ckt6      -142     -20257    -9417     -9536     54%              53%
ckt7      -4       -452      -248      -131      46%              72%
Average                                          49%              58%

WNS
Design    ZW       WL        TS        TSF       Improvement TS   Improvement TSF
ckt1      -1.702   -6.274    -3.392    -4.254    63%              44%
ckt2      0.248    -2.977    -1.784    -1.754    37%              38%
ckt3      -0.55    -4.997    -3.684    -3.788    30%              27%
ckt4      -0.941   -7.218    -3.736    -3.605    55%              58%
ckt5      -0.102   -3.575    -2.379    -2.002    34%              45%
ckt6      -0.508   -5.47     -5.484    -4.856    0%               12%
ckt7      0.16     -1.135    -0.66     -0.432    37%              54%
Average                                          37%              40%
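The slides do not define how the improvement columns are computed. The reported percentages appear consistent with measuring how much of the gap between the WL result and the ZW value is closed, clamped to the range 0% to 100%. The sketch below encodes that assumption and checks it against a few entries from the tables; the function name is illustrative.

```python
def improvement(base, new, reference=0.0):
    """Fraction of the gap from `base` up to `reference` that `new` closes.

    Assumed definition (not stated on the slides); the result is clamped to [0, 1].
    """
    gap = reference - base
    if gap <= 0:
        return 0.0
    return max(0.0, min(1.0, (new - base) / gap))

# ckt1 FOM after placement: WL = -41650, TS = -26093, TSF = -25602, ZW = -9134.
ts = improvement(-41650, -26093, reference=-9134)
tsf = improvement(-41650, -25602, reference=-9134)
# ckt6 WNS after placement: TS is slightly worse than WL, reported as 0%.
ckt6 = improvement(-5.47, -5.484, reference=-0.508)
print(round(100 * ts), round(100 * tsf), round(100 * ckt6))
```

With a reference of 0, the same function gives values that match the spot checks made against the table for timing after physical synthesis.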
Timing after Physical Synthesis
FOM
Design    WL       TS       TSF      Improvement TS   Improvement TSF
ckt1      -7829    -6086    -5170    22%              34%
ckt2      -2059    -384     -631     81%              69%
ckt3      -1854    -405     -422     78%              77%
ckt4      -2537    -1844    -1770    27%              30%
ckt5      -4732    -2726    -1819    42%              62%
ckt6      -1481    -541     -266     63%              82%
ckt7      -94      -8       0        91%              100%
Average                              58%              65%

WNS
Design    WL       TS       TSF      Improvement TS   Improvement TSF
ckt1      -0.834   -0.743   -0.739   11%              11%
ckt2      -0.705   -0.011   -0.073   98%              90%
ckt3      -0.701   -0.139   -0.19    80%              73%
ckt4      -2.156   -1.908   -1.9     12%              12%
ckt5      -0.472   -0.443   -0.341   6%               28%
ckt6      -0.36    -0.293   -0.351   19%              3%
ckt7      -0.097   0.182    0.283    100%             100%
Average                              47%              45%
Outline
• Wire length driven placement
• Main methods
– Simulated Annealing
– Partition-based methods
– Analytical methods
• Timing and congestion consideration
• Newer trends
Congestion Minimization
• Traditional placement problem is to minimize
interconnection length (wirelength)
• A valid placement has to be routable
• Congestion is important because it represents
routability (lower congestion implies better
routability)
• There is not yet enough research work on the
congestion minimization problem
Definition of Congestion
Routing demand = 3
Assume routing supply is 1,
overflow = 3 - 1 = 2 on this edge.
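Overflow on each edge:
$$\text{overflow}(e) = \begin{cases} \text{Routing Demand}(e) - \text{Routing Supply}(e) & \text{if Routing Demand}(e) > \text{Routing Supply}(e) \\ 0 & \text{otherwise} \end{cases}$$
Total overflow:
$$\text{Overflow} = \sum_{\text{all edges } e} \text{overflow}(e)$$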
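The overflow definition translates directly into code. In this sketch the edge names and the per-edge dictionaries are hypothetical; the first edge reproduces the example above (demand 3, supply 1).

```python
def edge_overflow(demand, supply):
    """Overflow on one routing edge: excess demand over supply, never negative."""
    return max(0, demand - supply)

def total_overflow(demand, supply):
    """Sum of per-edge overflow; `demand` and `supply` map edge -> value."""
    return sum(edge_overflow(d, supply.get(e, 0)) for e, d in demand.items())

# Hypothetical routing edges, each with a supply of 1 track.
demand = {"e1": 3, "e2": 1, "e3": 0}
supply = {"e1": 1, "e2": 1, "e3": 1}
overflow = total_overflow(demand, supply)
print(overflow)
```

Edges whose demand stays within supply contribute nothing, so total overflow measures only the routing resources that are missing.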
Correlation between Wirelength and
Congestion
Total Wirelength = Total Routing Demand
Wirelength ≠ Congestion
A congestion minimized placement
A wirelength minimized placement
Congestion Map of a Wirelength
Minimized Placement
Congested Spots
Congestion Reduction Postprocessing
Reduce congestion globally by minimizing the traditional wirelength
Post-process the wirelength-optimized placement using the congestion objective
An Effective Congestion Driven
Placement Framework
André Rohe
University of Bonn, Germany
joint work with Ulrich Brenner
ISPD 2002 (Best Paper)
A dense Placement
• good wirelength
• impossible to route
Possible Solution
• easy to route
• bad wirelength/timing
Congestion Driven Placement
• easy to route + good wirelength
• almost no extra computation effort!
Overall Algorithm: Bonn Place
• Partitioning based approach
• Solves QP in each level, followed by partitioning
• Partitioning is done by quadrisection:
circuits are partitioned with minimum movement
(Vygen)
Methods used for congestion driven
placement
• Very fast congestion calculation
• Inflate circuits in congested regions
• Spreading inflated cells
Congestion calculation
• Calculate Steiner Tree for each net
• Probability estimation for each 2-point connection
(similar to Hung & Flynn, Lou et al.)
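One simple way to illustrate probabilistic estimation for a 2-point connection is to assume that every shortest (monotone) route on the routing grid is equally likely and to count how many of them pass through each grid point. This is a simplified sketch under that assumption, not the estimator used in the placer described here; the function name and grid model are illustrative.

```python
from math import comb

def route_probability(width, height):
    """Probability that a uniformly random shortest route from (0, 0) to
    (width, height) passes through each grid point (x, y).

    Routes through (x, y) = routes from the source to (x, y)
                          * routes from (x, y) to the sink.
    """
    total = comb(width + height, width)
    return [
        [comb(x + y, x) * comb((width - x) + (height - y), width - x) / total
         for y in range(height + 1)]
        for x in range(width + 1)
    ]

# A connection spanning 2 grid steps horizontally and 1 vertically.
p = route_probability(2, 1)
for row in p:
    print(row)
```

Summing these probabilities over all 2-point connections of all nets gives an expected routing demand per grid location, which can then be compared against the routing supply.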
Quality of congestion calculation
congestion estimation
Quality of congestion calculation
[Figure panels: Bonn Global, HDP Global]
Inflation of circuits
(used previously by Hou et al.)
• Initial inflation (based on pin density)
• Given a circuit c in Region R, c is inflated by up to
100%
• The inflation is based on the congestion in R and the
surrounding regions & the pin density in R
• Deflation is possible if the circuit is no longer critical.
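A minimal sketch of the initial, pin-density-based inflation. Only two facts come from the slides: the inflation grows with the pin density of a circuit, and it is capped at 100%. The proportionality constant `scale` and the example numbers are hypothetical, and the congestion-based update and deflation are not modeled.

```python
def initial_inflation(num_pins, size, scale=1.0, cap=1.0):
    """Inflation factor b(c), proportional to pins per unit area and capped.

    scale is a hypothetical tuning constant; cap=1.0 means at most 100%.
    """
    return min(cap, scale * num_pins / size)

def inflated_size(size, b):
    """Area the placer reserves for a circuit with inflation factor b."""
    return size * (1.0 + b)

# Hypothetical circuits: a sparse one and a very pin-dense one.
b_sparse = initial_inflation(num_pins=4, size=8.0, scale=0.5)
b_dense = initial_inflation(num_pins=30, size=5.0, scale=0.5)
print(inflated_size(8.0, b_sparse), inflated_size(5.0, b_dense))
```

The pin-dense circuit hits the cap, so its reserved area doubles, which leaves room for the wires that must reach its pins.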
Placement Steps 0-7
[Figures: snapshots of the placement after each of steps 0 through 7.]
Spreading inflated cells
• Repartitioning considers 2x2 windows in placement
grid to optimize netlength
• Use extra repartitioning step to move cells away from
overloaded regions
Summary: Algorithm overview
1. Init:
   Set window_set := {chip area}, set circuit_list(chip area) := {all circuits}
2. Main Loop:
   While (window size big enough)
      Solve a QP to minimize quadratic netlength
      For (each window w in window_set)
         Quadrisection(w)
      Repartitioning
3. Legalization
Algorithm overview
1. Init:
   Set window_set := {chip area}, set circuit_list(chip area) := {all circuits}
   For (each c in {all circuits})
      Increase b(c) proportionally to |pins(c)|/size(c)   # initial inflation b(c)
2. Main Loop:
   While (window size big enough)
      Solve a QP to minimize quadratic netlength
      For (each window w in window_set)
         Quadrisection(w)
         Compute congestion and update b(c)   # update inflation b(c)
         Quadrisection(w)
      Reduce overloaded windows   # extra repartitioning steps
      Repartitioning
3. Legalization
Computational Results
          Standard             Congestion Driven
Chip      CPU       len        CPU       len       Blow
IBM 1     0:23 h    7.2 m      0:26 h    7.4 m     10.2 %
IBM 2     0:26 h    7.9 m      0:27 h    9.0 m     6.6 %
IBM 3     3:50 h    134 m      4:39 h    142 m     20.1 %
IBM 4     7:08 h    241 m      7:24 h    270 m     20.2 %
IBM 5     16:10 h   375 m      16:37 h   406 m     57.8 %
Mean                           +8.7 %    +8.5 %
Computational Results II
          Standard                             Congestion Driven
Chip      HDP     ov       CPU       len       HDP     ov    CPU       len
IBM 1     81.7    8374     0:15 h    9 m       75.5    0     0:05 h    7.5 m
IBM 2     82.7    7000     0:19 h    11.5 m    75.4    0     0:05 h    10.1 m
IBM 3     88.8    78111    47:36 h   162 m     77.3    0     4:51 h    164 m
IBM 4     82.8    972      7:18 h    324 m     75.2    0     2:48 h    326 m
IBM 5     89.9    14382    70:57 h   512 m     84.2    0     29:48 h   527 m
Mean                                           -9 %          -73 %     -5.2 %
Summary
• In this module, we covered two important objectives to consider
during placement besides wire length
– Timing-driven placement using net weighting
• A sensitivity-based net weighting method from the ISPD’04 paper
– Congestion minimization (using the ISPD’02 paper as an
example)
• Congestion estimation
• Inflating cells in congested regions
• Spreading inflated cells