Optimality, Scalability and Stability Study of Partitioning and Placement Algorithms
Jason Cong, Michail Romesis, Min Xie
UCLA Computer Science Department
This work is partially supported by Semiconductor
Research Corporation and National Science Foundation

Overview
- Motivation and related work
- Our contribution
  - Construction of Partitioning Examples with Known Upper Bound
  - Construction of Placement Examples with Known Upper Bound
  - Optimality, Scalability and Stability study
- Conclusions and future work

Motivation
- Partitioning
  [Figure: bar chart of partitioning results (vertical axis 0-120) on the MCNC and ISPD benchmarks for FM (1982), PANZA (1995), CLIP (1996), LSR (1997), and hMetis (1997)]
  - Significant progress in partitioning during the mid-to-late 90's
  - No significant improvement in the last 5 years
  - Have we reached a plateau?

Motivation
- Placement
  - Lack of significant progress in wirelength reduction
    - Rate of reduction is about 5-10% every 2-3 years
  - Latest developments in placement differ mainly in runtime
    - Capo [A. Caldwell et al, 2000]
    - Dragon [M. Wang et al, 2000]
    - Mongrel [S. Hur et al, 2000]
    - mPL [T. Chan et al, 2000]
    - mPG [C. Chang et al, 2002]
  - How much room is there for further improvement?

Motivation
- Most work compares only with known heuristics
  - Use benchmarks based on real designs
    - ISPD98 [C. Alpert 1998]
    - WSI [D. Ghosh et al, 1997]
  - Use synthetic benchmarks
    - circ and gen [M. D. Hutton et al, 1998]
    - gnl [D. Stroobandt et al, 2000]
- Little understanding of the divergence from the optimal

Related Work
- Quantified Suboptimality of VLSI Layout Heuristics [L. Hagen et al, 1995]
  - Construct scaled instance with known upper bound from an initial problem
  - Over 10% area suboptimality in TimberWolf
  - Notable wirelength suboptimality in GORDIAN-L
  - Significant improvement was possible for placement and partitioning
  - But test cases are small; the largest netlist is less than 40K

Related Work
- Optimality and Scalability of Existing Placement Algorithms [C. Chang et al, 2003]
  - Construct instances with known optimal solutions using the characteristics of the original problem
  - Existing placement algorithms can be 70% to 150% away from the optimal
  - Average solution quality deteriorates by an additional 4% to 25% when the problem size increases by a factor of 10
  - All the connections are local; there are no global connections

BEKU Construction Example
Input: t = 16, D = {12, 8}, B = 5
[Figure: two partitions P1 and P2 with labeled nodes A, B, C, D]
- Create two partitions of size 8
- Generate 9 2-pin nets that do not cross the partition line
- Generate 3 2-pin nets that cross the partition line
- Generate 6 3-pin nets that do not cross the partition line
- Generate 2 3-pin nets that cross the partition line
- Cutsize = 5
- Cutsize improved to 4 after FM

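The construction steps of the BEKU example can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' generator: the names `build_beku` and `cutsize`, the per-degree split of crossing nets passed as `cut_counts`, and the uniform random choice of pins are all hypothetical, and the FM improvement pass is not included.

```python
import random

def build_beku(t, degree_counts, cut_counts, seed=0):
    """Build a bipartitioning instance with a known cut (illustrative only).

    degree_counts: {pins_per_net: number_of_nets}
    cut_counts:    {pins_per_net: how many of those nets cross the cut}
    Returns (nets, half); nodes 0..half-1 form P1, the rest form P2.
    """
    rng = random.Random(seed)
    half = t // 2
    left = list(range(half))
    right = list(range(half, t))
    nets = []
    for pins, total in degree_counts.items():
        n_cut = cut_counts.get(pins, 0)
        for k in range(total):
            if k < n_cut:
                # Crossing net: at least one pin on each side.
                n_left = rng.randint(1, pins - 1)
                net = rng.sample(left, n_left) + rng.sample(right, pins - n_left)
            else:
                # Non-crossing net: all pins inside one partition.
                net = rng.sample(rng.choice((left, right)), pins)
            nets.append(tuple(net))
    return nets, half

def cutsize(nets, half):
    # A net is cut when it has pins on both sides of the partition line.
    return sum(1 for net in nets if min(net) < half <= max(net))

# Slide example: t = 16, twelve 2-pin nets (3 cut), eight 3-pin nets (2 cut).
nets, half = build_beku(16, {2: 12, 3: 8}, {2: 3, 3: 2})
print(len(nets), cutsize(nets, half))
```

By construction the planted cut is an upper bound on the optimal cutsize; as the slide notes, a refinement pass such as FM may still find a smaller cut.
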
Construction of Multiway Partitioning Examples with Known Upper Bounds (MEKU)
- Divide the nodes into m partitions of equal size
- Create B nets that cross at least two partitions. The remaining nets stay in one partition
- Improve by multiway FM

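The multiway construction follows the same idea as the two-way case. The sketch below is only an illustration under assumptions: `build_meku`, `multiway_cut`, the `degree_counts` parameter, and the way crossing nets pick their partitions are hypothetical choices, t is assumed to be divisible by m, and the multiway FM improvement step is omitted.

```python
import random

def build_meku(t, m, degree_counts, b, seed=0):
    """Build an m-way instance in which exactly b nets span several parts.

    degree_counts: {pins_per_net: number_of_nets}
    Returns (nets, size); node v belongs to partition v // size.
    """
    rng = random.Random(seed)
    size = t // m
    parts = [list(range(p * size, (p + 1) * size)) for p in range(m)]
    degrees = [d for d, count in degree_counts.items() for _ in range(count)]
    rng.shuffle(degrees)
    nets = []
    for i, d in enumerate(degrees):
        if i < b:
            # Crossing net: force one pin into each of two distinct partitions,
            # then fill the remaining pins from anywhere.
            p, q = rng.sample(range(m), 2)
            pins = {rng.choice(parts[p]), rng.choice(parts[q])}
            while len(pins) < d:
                pins.add(rng.randrange(m * size))
        else:
            # Local net: all pins inside a single partition.
            pins = set(rng.sample(rng.choice(parts), d))
        nets.append(tuple(sorted(pins)))
    return nets, size

def multiway_cut(nets, size):
    # Count nets whose pins fall into more than one partition.
    return sum(1 for net in nets if len({v // size for v in net}) > 1)

nets, size = build_meku(64, 8, {2: 30, 3: 10}, b=12)
print(len(nets), multiway_cut(nets, size))
```
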
BEKU and MEKU Suite

# of nodes   # of nets    # of parts   Upper bound
500,000      530,705      2            92,343
500,000      530,705      2            111,873
1,000,000    1,061,410    2            184,714
1,000,000    1,061,410    2            223,520
1,500,000    1,592,114    2            276,670
1,500,000    1,592,114    2            335,242
2,000,000    2,122,819    2            369,526
2,000,000    2,122,819    2            447,781
500,000      530,705      8            139,943
500,000      530,705      8            160,163
1,000,000    1,061,410    8            279,975
1,000,000    1,061,410    8            320,457
1,500,000    1,592,114    8            420,279
1,500,000    1,592,114    8            479,971
2,000,000    2,122,819    8            560,275
2,000,000    2,122,819    8            640,459

- 2-way partitions occupy 45-55% of the total area
- 8-way partitions occupy 11.8-13.3% of the total area

URL: http://cadlab.cs.ucla.edu/~pubbench/partitioning/

Tested three State-of-the-Art Partitioning Tools
- hMetis [G. Karypis et al, 1997]
  - Based on multilevel framework
  - MHEC and FC clustering algorithms
  - Variations of FM for refinement at each level
- MLPart [A. Caldwell et al, 2000]
  - Based on multilevel framework
  - Different algorithms for coarsening (PinEC) and refinement (VRW)
- Flare [J. Cong et al, 2000]
  - Two-level hierarchy created by the ESC clustering algorithm
  - Based on the LR bipartitioning engine and the PM multiway partitioning framework

Experimental Results on BEKU
[Figure: quality ratio (axis 0.9-1.4) versus bound as a percentage of nets (15% to 25%) for MLPart, hMetis, and Flare]
- MLPart produces the best results (very close to our estimated upper bound), and Flare the worst
- The value of the bound (as a percentage of nets) influences the quality of hMetis and Flare

Experimental Results on BEKU
[Figure: runtime in minutes (axis 0-40) versus circuit size (500,000 to 2,000,000) for hMetis, MLPart, and Flare]
- The runtimes scale well (almost linearly)
- Flare runs out of memory when the problem size exceeds 1M nodes

Experimental Results on MEKU
[Figure: quality ratio (axis 0-2) versus bound as a percentage of nets (30% and 35%) for hMetis and Flare]
- hMetis is worse by only 2% when the initial bound is 30%, but the gap increases to 18% for a bound of 35%
- MLPart does not support multiway partitioning

Placement Examples with Global Connections

circuit   height   width    WL of longest net   WL contribution of longest 10%
ibm01     8158     4530     7148                51%
ibm02     8158     6430     14224               46%
ibm03     8158     6740     10624               58%
ibm04     8158     9140     15171               53%
ibm05     8158     11055    19064               47%
ibm06     8158     8715     13966               61%
ibm07     8158     14605    14051               51%
ibm08     8158     15895    16142               60%
ibm09     8158     16395    13780               55%
ibm10     8158     27890    30755               53%
ibm11     16350    10925    19234               59%
ibm12     16350    15545    26748               52%
ibm13     16350    12230    19539               59%
ibm14     16350    25475    26370               61%
ibm15     16350    23785    27284               63%
ibm16     16350    34015    42860               59%
ibm17     16283    38895    45686               56%
ibm18     16350    37065    52846               64%

- Produced by Dragon on ISPD98
- The wirelength contribution from global connections can be significant!
- Need to consider the impact of global connections

Placement Examples with Global Connections only
- Each net connects either a row or a column
- Obvious upper bound
  - Sum the length of each row and column
- Similar to datapath examples

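The row-and-column upper bound can be written out as a short calculation. The sketch below assumes a full grid of equal-size modules with pins at module centers, so a net spanning a row of `cols` modules has length `(cols - 1) * cell_width`; the function name, the parameters, and the center-to-center convention are assumptions for illustration and are not taken from the slides.

```python
def global_only_upper_bound(rows, cols, cell_width=1.0, row_height=1.0):
    """Wirelength of the grid placement when every row and every column is one net.

    Assumes a full rows x cols grid and center-to-center net lengths.
    """
    row_net_length = (cols - 1) * cell_width   # one net per row
    col_net_length = (rows - 1) * row_height   # one net per column
    return rows * row_net_length + cols * col_net_length

# A 3 x 4 grid of unit modules: 3 row nets of length 3 and 4 column nets of length 2.
print(global_only_upper_bound(3, 4))
```

Any placement that keeps the grid arrangement achieves this wirelength, which is why the sum serves as an upper bound on the optimal.
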
Placement Examples with Non-local Connections
- Extend PEKO [C. Chang 2003] by introducing non-local nets to mimic global connections
  - All the modules are of equal size, and there is no space between rows and adjacent modules
  - For nets of degree i, α·d_i of them are generated by randomly connecting i modules; the rest are generated optimally as in PEKO

Placement Examples with Non-local Connections
Input: t = 64, D = {d2=34, d3=20, d4=7, d5=4, d6=2, d7=1}, α = 0.2
- Generate 28 2-pin nets optimally
- Generate 6 2-pin nets randomly
- Generate 16 3-pin nets optimally
- Generate 4 3-pin nets randomly
- Generate 6 4-pin nets optimally
- Generate 1 4-pin net randomly
- Generate 4 5-pin nets optimally
- Generate 2 6-pin nets optimally
- Generate 1 7-pin net optimally
- Total WL = 160

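The per-degree split between optimally and randomly generated nets in this example can be reproduced with a small calculation. The counts on the slide are consistent with rounding α·d_i down to an integer; that rounding rule is an inference from the example rather than something the slides state, and `split_nets` is a hypothetical helper name.

```python
import math

def split_nets(degree_counts, alpha):
    """Return {degree: (n_optimal, n_random)} for each net degree.

    Assumes the number of random nets is floor(alpha * d_i); the small
    epsilon guards against floating-point products such as 3.9999999.
    """
    split = {}
    for degree, count in degree_counts.items():
        n_random = math.floor(alpha * count + 1e-9)
        split[degree] = (count - n_random, n_random)
    return split

# Slide input: D = {d2=34, d3=20, d4=7, d5=4, d6=2, d7=1}, alpha = 0.2
for degree, (n_opt, n_rand) in split_nets({2: 34, 3: 20, 4: 7, 5: 4, 6: 2, 7: 1}, 0.2).items():
    print(f"{degree}-pin: {n_opt} optimal, {n_rand} random")
```
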
G-PEKU Suite
- Module number extracted from ISPD98

circuit   #cell    #net   #row   UB
GPeku01   12506    224    113    7.93E+05
GPeku05   28146    336    169    1.79E+06
GPeku10   68685    525    263    4.38E+06
GPeku15   161187   803    402    1.03E+07
GPeku18   210341   918    460    1.34E+07

URL: http://cadlab.cs.ucla.edu/~pubbench/peku.htm

PEKU Suite
- Module number t and NDVs extracted from ISPD98
- Remove connections with pads
- Vary α from 0 to 10%
- 15% white space by expanding one dimension of the chip

PEKU Suite

% nonlocal nets   circuit   #cell    #net     #row   Row utilization   LB         UB
0                 Peku01    12506    14111    113    85%               8.14E+05   8.14E+05
0                 Peku05    28146    28446    169    85%               1.91E+06   1.91E+06
0                 Peku10    68685    75196    263    85%               4.73E+06   4.73E+06
0                 Peku15    161187   186608   402    85%               1.15E+07   1.15E+07
0                 Peku18    210341   201920   460    85%               1.32E+07   1.32E+07
0.25%             Peku01    12506    14111    113    85%               8.14E+05   9.23E+05
0.25%             Peku05    28146    28446    169    85%               1.91E+06   2.24E+06
0.25%             Peku10    68685    75196    263    85%               4.73E+06   6.17E+06
0.25%             Peku15    161187   186608   402    85%               1.15E+07   1.71E+07
0.25%             Peku18    210341   201920   460    85%               1.32E+07   2.01E+07
0.50%             Peku01    12506    14111    113    85%               8.14E+05   1.02E+06
0.50%             Peku05    28146    28446    169    85%               1.91E+06   2.63E+06
0.50%             Peku10    68685    75196    263    85%               4.73E+06   7.52E+06
0.50%             Peku15    161187   186608   402    85%               1.15E+07   2.30E+07
0.50%             Peku18    210341   201920   460    85%               1.32E+07   2.75E+07
Up to 10%         …

URL: http://cadlab.cs.ucla.edu/~pubbench/peku.htm

Tested four State-of-the-Art Placers
- Capo [A. Caldwell et al, 2000]
  - Based on multilevel partitioner
  - Aims to enhance the routability
- Dragon [M. Wang et al, 2000]
  - Uses hMetis for initial partition
  - SA with bin-based swapping
- mPL [T. Chan et al, 2000]
  - Nonlinear programming on the coarsest level
  - Goto-based relaxation
- mPG [C. Chang et al, 2002]
  - Uses FC clustering and hierarchical density control
  - Incremental A-tree for routability

Experimental Results on G-PEKU

circuit   Dragon v.2.20 QR   Capo v.8.5 QR   mPG v.1.0 QR   mPL v.2.0 QR
GPeku01   1.98               1.56            1.91           1.69
GPeku05   2.01               1.69            1.97           1.83
GPeku10   2.02               1.72            1.98           1.94
GPeku15   1.99               1.79            1.97           1.97
GPeku18   2.02               1.78            1.98           1.98

- The gap between their solutions and the upper bound varies between 79% and 102% in the worst case
- Another validation that there is significant room for improvement for the placement problem

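The 79% to 102% range follows directly from the quality ratios in the G-PEKU results. The sketch below assumes QR is the placer's wirelength divided by the known upper bound, so the gap is (QR - 1) expressed as a percentage; that definition is inferred from the slide's wording, and the helper name is hypothetical. The numbers are copied from the results table.

```python
# Quality ratios per placer for GPeku01, 05, 10, 15, 18 (from the slide).
quality_ratio = {
    "Dragon v.2.20": [1.98, 2.01, 2.02, 1.99, 2.02],
    "Capo v.8.5":    [1.56, 1.69, 1.72, 1.79, 1.78],
    "mPG v.1.0":     [1.91, 1.97, 1.98, 1.97, 1.98],
    "mPL v.2.0":     [1.69, 1.83, 1.94, 1.97, 1.98],
}

def worst_case_gap_percent(qrs):
    # Gap of the worst (largest) quality ratio over the upper bound, in percent.
    return round((max(qrs) - 1.0) * 100)

worst = {placer: worst_case_gap_percent(qrs) for placer, qrs in quality_ratio.items()}
for placer, gap in worst.items():
    print(f"{placer}: worst-case gap {gap}%")
```

Under this reading, the smallest worst-case gap belongs to Capo and the largest to Dragon, which matches the range quoted on the slide.
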
Experimental Results on PEKU
[Figure: quality ratio (axis 1.0-2.2) versus % of non-local nets (0.00%, 0.25%, 0.50%, 0.75%, 1.00%, 2.00%, 5.00%, 10.00%) for Capo v.8.5, Dragon v.2.20, mPG v.1.0, and mPL v.2.0]
- mPL's QR increases when α is increased from 0 to 0.75%, while for the other three placers, QRs are steadily decreasing
- Absolute value of the QRs may not be meaningful, but it helps to identify the technique that works best under each scenario

Conclusions
- Bipartitioning techniques seem fairly mature
  - The best available algorithms perform and scale very well on examples by our construction
- The best available multiway partitioning algorithms do not perform equally well
  - The worst divergence from the upper bound is 18%, by hMetis
- There is still significant room for improvement in circuit placement
  - Existing placement algorithms may produce solutions far away from the optimal (or upper bound)
  - Their effectiveness depends heavily on the characteristics of the circuits

Future Work
- Construction of more synthetic examples
  - Measure routability optimality
  - Measure timing optimality
- Understand the deficiencies of existing algorithms using these examples
  - Guide the development of new VLSI CAD algorithms

Acknowledgement
- Prof. I. Markov for providing Capo's latest version
- Prof. S. Lim for providing Flare's latest version
- X. Yuan for providing the data of mPG
- J. Shinnerl and K. Sze for providing the experimental data of mPL

THE END
THANK YOU