Local Unidirectional Bias for
Smooth Cutsize-delay Tradeoff in
Performance-driven Partitioning
Andrew B. Kahng and Xu Xu
UCSD CSE and ECE Depts.
Work supported in part by MARCO GSRC
Outline
➢ Motivation
• Performance driven bipartition problem
• New bipartitioning algorithm
• Experimental results
• Conclusion and future work
Partitioning and Performance
The goal of traditional hypergraph partitioning
is to minimize cutsize.
To meet the performance requirements of
current designs, we need a performance-driven partitioner, which considers both
cutsize and delay.
Previous Work (I)
• [Cong et al. ISPD-2002]
– Global clustering based algorithm with retiming
[Figure: flow of min-delay clustering w/ retiming, min-cutsize clustering, de-clustering and refinement]
– Reduces delay by 16% while increasing cutsize by 17%
– Requires substantial gate replication
Previous Work (II)
• [Ababei et al. ICCAD-2002]
– Reweighting based method (path based or net based)
[Figure: flow with input, global timing analysis, finding critical paths, reweighting, and a cutsize oriented partitioner such as hMetis or MLPart]
– 14% reduction of delay with 10% increase in cutsize
– 139% increase in runtime compared with hMetis
Motivating Questions
• Can we avoid global timing analysis?
– Global timing analysis is extremely time-consuming
• Can we improve path delay without significantly
degrading cutsize?
– Need smooth tradeoff between delay and cutsize
• Can we reduce implementation overheads?
– Previous methods store thousands of critical paths and
continuously update them
Outline
• Motivation
➢ Performance driven bipartition problem
• New bipartitioning algorithm
• Experimental results
• Conclusion and future work
Delay Model
Delay = hop_delay + node_delay
[Figure: path through FF nodes and combinational nodes in Part 0 and Part 1; each crossing of the cut is a hop]
• [Cong et al. ISPD-2002]: hop_delay=5, node_delay=1
– Example: Delay = 3x5 + 5x1 = 20
• [Ababei et al. ICCAD-2002]: hop_delay=Elmore delay, node_delay=constant
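The constant-delay example (3 hops, 5 nodes, delay 20) can be checked with a minimal sketch. It assumes every hop and every node on the path contributes a fixed constant; `path_delay` is a hypothetical helper name, not part of any cited tool.

```python
def path_delay(num_hops, num_nodes, hop_delay=5, node_delay=1):
    """Delay of one path: cut crossings (hops) plus nodes on the path.

    Assumption: constant per-hop and per-node delays, as in the
    hop_delay=5, node_delay=1 example.
    """
    return num_hops * hop_delay + num_nodes * node_delay


# 3 hops and 5 nodes, as in the slide example: 3x5 + 5x1
print(path_delay(num_hops=3, num_nodes=5))
```

With an Elmore-style hop delay the per-hop term would no longer be a constant, so this sketch covers only the constant model.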
Performance Driven Bipartition Problem
Given:
• Hypergraph H=(V,E)
• Area balance tolerance s (0<s<1), a parameter
to control allowable slack in the area constraint
• α, a given parameter which captures the tradeoff
between cutsize and path delay (hopcount)
Find:
A bipartition (V0|V1) which satisfies the area balance constraint
and minimizes α(cutsize)+(1−α)(Max_hopcount)
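A small sketch of how the weighted objective trades cutsize against hopcount. The tradeoff parameter is written `alpha` here and is assumed to lie in [0, 1]; the two candidate solutions are made-up numbers for illustration only.

```python
def objective(alpha, cutsize, max_hopcount):
    """Weighted cost: alpha*cutsize + (1 - alpha)*max_hopcount.

    Assumption: alpha is restricted to [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1]")
    return alpha * cutsize + (1.0 - alpha) * max_hopcount


# Hypothetical candidates: (cutsize, max_hopcount)
low_cut = (10, 5)
low_hop = (12, 3)

for alpha in (0.9, 0.2):
    costs = {name: objective(alpha, *sol)
             for name, sol in (("low_cut", low_cut), ("low_hop", low_hop))}
    print(alpha, min(costs, key=costs.get))
```

A large alpha favors the low-cutsize candidate and a small alpha favors the low-hopcount one.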
Outline
• Motivation
• Performance driven bipartition problem
➢ New bipartitioning algorithm
• Experimental results
• Conclusion and future work
Unidirectional Partition
Path delay is minimized with
hopcount = 1 if the partition is
unidirectional (“acyclic”), that is,
all cuts are in the same direction
Problem:
• High cutsize
• No unidirectional solution
Can we achieve “locally unidirectional” partition?
[Figure: two partitions between Part 0 and Part 1, one with max hopcount=5 and one with max hopcount=3]
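A minimal sketch of the unidirectional test. It assumes the netlist has been flattened to directed driver-to-sink edges and that `part` maps each node to 0 or 1; both are illustrative choices, not the authors' data structures.

```python
def is_unidirectional(edges, part):
    """True if every cut edge (u, v) crosses the cut in the same direction."""
    directions = {(part[u], part[v]) for u, v in edges if part[u] != part[v]}
    return len(directions) <= 1


# Chain a -> b -> c -> d
edges = [("a", "b"), ("b", "c"), ("c", "d")]

one_way = {"a": 0, "b": 0, "c": 1, "d": 1}   # single 0 -> 1 crossing
zigzag = {"a": 0, "b": 1, "c": 0, "d": 1}    # crossings in both directions

print(is_unidirectional(edges, one_way))
print(is_unidirectional(edges, zigzag))
```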
V-Shaped Nodes
A combinational node v is a V-shaped node if:
• there exist vj, vt in the other part
• there is a path from vj to vt whose only intermediate node is v
[Figure: V-shaped node v in one part, with vj and vt in the other part]
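The definition can be sketched as a local check: a combinational node is V-shaped when it has at least one fan-in and at least one fan-out neighbor in the other part. The directed edge list, the `part` map, and the function name are assumptions for illustration.

```python
def v_shaped_nodes(edges, part, combinational):
    """Return the combinational nodes with a predecessor and a successor
    in the other part (path vj -> v -> vt with vj, vt across the cut)."""
    preds, succs = {}, {}
    for u, v in edges:
        succs.setdefault(u, set()).add(v)
        preds.setdefault(v, set()).add(u)

    result = set()
    for v in combinational:
        has_in = any(part[u] != part[v] for u in preds.get(v, ()))
        has_out = any(part[w] != part[v] for w in succs.get(v, ()))
        if has_in and has_out:
            result.add(v)
    return result


edges = [("a", "b"), ("d", "b"), ("b", "c")]
part = {"a": 0, "b": 1, "c": 0, "d": 1}
print(v_shaped_nodes(edges, part, combinational={"a", "b", "c", "d"}))
```

No global timing analysis is needed for this check, only each node's immediate neighbors.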
V-Shaped Nodes in Critical Paths
Empirical observations from study of partitioning solutions:
• there are V-shaped nodes in the partitioning solutions
• every V-shaped node is included in many critical paths
• every critical path contains several V-shaped nodes
For testcase 1:
• Number of nets: 16377
• Number of critical paths: 26772
• On average, one critical path contains 27.6 nodes
• On average, one critical path contains 3.4 V-nodes
• On average, one V-node belongs to 233.7 critical paths
Key Idea: V-Shaped Nodes Elimination
[Figure: nodes a, b, c, d, e, f in Part 0 and Part 1, before and after moving b]
Move V-shaped node “b” to reduce path hopcount
PATH   hopcount before   hopcount after
abc    2                 0
dbc    1                 1
ebc    1                 1
Distance-k V-Shaped Nodes Elimination
Distance-k V-shaped Nodes (Vk Nodes):
Paths of k combinational nodes with neighbors in the other part.
[Figure: nodes a, b, c, d in Part 0 and Part 1, before and after moving b and c]
k = 2: Moving the V2 nodes “b, c” reduces path hopcount from 2 to 0
Problems with large k:
• Cutsize may be greatly increased
New Gain Function
[Figure: node v before and after the move, with Gain(v)=δ(0)+δ(1)]
• g(v): traditional FM gain
• rj(v): reduction of Vj nodes after moving v
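One plausible reading of the gain, sketched under an explicit assumption: the weights δ(0), δ(1), … combine g(v) and the rj(v) terms linearly, i.e. Gain(v) = δ(0)·g(v) + Σj δ(j)·rj(v). The slide text only shows the case Gain(v)=δ(0)+δ(1), which this form yields when g(v)=1 and r1(v)=1. The function name is hypothetical.

```python
def biased_gain(g, r, delta):
    """Assumed linear form: delta[0]*g + sum over j of delta[j]*r[j-1].

    g     -- traditional FM gain of the node
    r     -- [r1, r2, ...], reduction of V1, V2, ... nodes after the move
    delta -- [delta(0), delta(1), ...] weights
    """
    return delta[0] * g + sum(d * rj for d, rj in zip(delta[1:], r))


# g(v)=1, r1(v)=1 with delta(0)=1, delta(1)=10
print(biased_gain(g=1, r=[1], delta=[1, 10]))
```

Under this reading, a large δ(1) lets a move that removes a V-shaped node win even when it costs a unit of cutsize.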
Distance-k Unidirectional Algorithm
Calculate initial gains for all nodes and store the gains
Select the node v with maximum gain
/* CLIP-like method: move the cluster that v belongs to */
Reset the gains of all nodes to zero
Move v and update the gains of v and its neighbors
While (some node has not been moved)
    Select one node v with the maximum updated gain
    Move v and update the related gains
Find the point in the move sequence at which the sum of
gains is maximum; undo all moves after this point
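The final step of the pass (keep the best prefix of the move sequence, undo the rest) can be sketched as follows. This covers only the rollback, not gain updates, balance checks, or the CLIP-like cluster move; the helper names and the rule that an empty prefix is kept when no prefix has positive total gain are assumptions.

```python
from itertools import accumulate


def best_prefix(gains):
    """Return (k, total): the number of moves to keep and their summed gain.

    Assumption: keeping zero moves (total 0) is allowed if no prefix is positive.
    """
    best_k, best_sum = 0, 0
    for k, total in enumerate(accumulate(gains), start=1):
        if total > best_sum:
            best_k, best_sum = k, total
    return best_k, best_sum


def rollback(part, moves, keep):
    """Undo all moves after the first `keep`; each node is moved once,
    so undoing a move just flips that node back."""
    result = dict(part)
    for v in moves[keep:]:
        result[v] = 1 - result[v]
    return result


gains = [2, -1, 3, -4, 1]          # gain of each move, in move order
keep, total = best_prefix(gains)   # prefix sums: 2, 1, 4, 0, 1
print(keep, total)
```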
Outline
• Motivation
• New bipartitioning algorithm
➢ Experimental results
• Conclusion and future work
Experimental Setup
• Four industry testcases obtained as LEF/DEF
• Model of Ababei et al. (ICCAD-2002) used to
calculate delay
• Partitioning solutions compared to results of
MLPart
– strongest multilevel netlist partitioning code
– website:
http://nexus6.cs.ucla.edu/GSRC/bookshelf/Slots/Partitioning/MLPart
• All tests on 600MHz Intel Pentium-III Xeon
Biasing against V1 Nodes vs. MLPart
δ(0)=1, δ(1)=10
          | MLPart                          | MLPart+V-shaped nodes Removal
Testcase  | cutsize  h    delay  time(s)    | cutsize  h    delay  time(s)
1         | 820.7    5.3  352.8  11.79      | 856.1    3.3  266.8  12.58
2         | 169.9    3.5  220.7  13.45      | 189.8    2.5  211.2  15.32
3         | 141.3    3    291.6  16.67      | 152.3    2.3  283.6  18.27
4         | 408.7    5.3  302.6  12.43      | 421.2    3.6  252.7  14.03
• Reduction of delay: 4.5%-24.4%, average: 15.1%
• Increase of cutsize: 3.0%-10.0%, average: 4.9%
• Increase of runtime: 6.3%-11.4%, average: 9.7%
Using the delay model in Cong et al. ISPD-2002
• Reduction of delay: 4.3%-21.2%, average: 14.7%
Biasing against V2 Nodes vs. MLPart
δ(0)=1, δ(1)=30, δ(2)=3
          | MLPart                          | MLPart+Vk=2 nodes Removal
Testcase  | cutsize  h    delay  time(s)    | cutsize  h    delay  time(s)
1         | 820.7    5.3  352.8  11.79      | 847.5    3    262.1  13.16
2         | 169.9    3.5  220.7  13.45      | 183.2    2    202.5  15.67
3         | 141.3    3    291.6  16.67      | 149.2    2    275.6  18.92
4         | 408.7    5.3  302.6  12.43      | 416.7    3.4  243.5  14.79
• Reduction of delay: 8.9%-30.0%, average: 18.7%
• Increase of cutsize: 3.1%-7.2%, average: 3.5%
• Increase of runtime: 11.9%-15.9%, average: 13.1%
Using the delay model in Cong et al. ISPD-2002
• Reduction of delay: 8.3%-28.7%, average: 17.3%
Outline
• Motivation
• Performance driven bipartition problem
• New bipartitioning algorithm
• Experimental results
➢ Conclusions and future work
Conclusions
• Simple yet efficient timing-driven partitioning
that does not require global timing analysis
• Negligible implementation and runtime overhead
• Significantly reduces path delay with cutsize
and runtime almost the same as leading-edge
MLPart
• Similar improvements observed with different
path delay metrics
• Future work
– Impact of new partitioner on placement
– Efficient methods for biasing δ(k) k>2
Thank you!
Why Performance Driven Partitioning?
• Achieving timing closure becomes increasingly
difficult in deep-submicron technologies due to
non-ideal scaling of interconnect delay
• Routing alone can no longer solve the timing problem,
even with aggressive optimizations (buffer
insertion, buffer/wire sizing, …)
➢ Timing needs to be addressed at all design stages
• Partitioning is a critical step in defining
interconnect timing properties, but is traditionally
driven by cutsize objective
Previous Work (I)
• With Logic Replication
– Retiming
– Replication graph
• Without Logic Replication
– Net based reweighting
– Path based reweighting
FM Partitioning and Gain Function
• Start with random partition
• Move the node with the max gain and lock it
• Keep moving until all nodes are locked
• Find the best point in the move sequence
Gain(v) = Reduction of cutsize after moving v
[Figure: partition state at each step, and node v before and after a move with Gain(v)=-1]
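The traditional FM gain can be sketched directly from its definition as cutsize before the move minus cutsize after. Nets are assumed to be tuples of node names and `part` a node-to-{0,1} map; this recomputes cut status per move for clarity rather than using FM's incremental bucket updates.

```python
def cutsize(nets, part):
    """Number of nets with pins in both parts."""
    return sum(1 for net in nets if len({part[v] for v in net}) > 1)


def fm_gain(v, nets, part):
    """Reduction of cutsize if v is moved to the other part."""
    moved = dict(part)
    moved[v] = 1 - part[v]
    touched = [net for net in nets if v in net]  # only nets on v can change
    return cutsize(touched, part) - cutsize(touched, moved)


# One uncut 3-pin net: moving v cuts it, so the gain is negative
nets = [("v", "a", "b")]
part = {"v": 0, "a": 0, "b": 0}
print(fm_gain("v", nets, part))
```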
Procedure to Calculate rj(v)
Delete all FF nodes and their related edges
In the remaining graph, BFS from v
For each level j from 1 to k
    If v is a Vj node before moving, rj’=1 (otherwise 0)
    If v is a Vj node after moving, rj’’=1 (otherwise 0)
    rj=rj’-rj’’
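For k = 1 the procedure reduces to a neighbor check, sketched below. Two assumptions are made: the netlist is a list of directed edges, and the reduction is taken as (V1 status before the move) minus (V1 status after), so that eliminating a V1 node counts as +1. Larger k would need the BFS over levels, which is not shown.

```python
def r1(v, edges, part, ff_nodes):
    """Reduction in V1 status of v if v is moved to the other part."""
    # Delete all FF nodes and their related edges
    comb = [(a, b) for a, b in edges
            if a not in ff_nodes and b not in ff_nodes]

    def is_v1(side):
        # v on `side` is V1 if it has a fan-in and a fan-out neighbor elsewhere
        has_in = any(part[a] != side for a, b in comb if b == v and a != v)
        has_out = any(part[b] != side for a, b in comb if a == v and b != v)
        return has_in and has_out

    before = is_v1(part[v])
    after = is_v1(1 - part[v])
    return int(before) - int(after)


edges = [("a", "b"), ("b", "c")]
part = {"a": 0, "b": 1, "c": 0}
print(r1("b", edges, part, ff_nodes=set()))
```

If `a` were a flip-flop, the edge from a to b would be deleted first and b would not count as a V1 node in either position.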
CLIP Algorithm
[Figure: CLIP, node v before and after the move]
Reminiscent of CLIP (Deng et al. DAC 1996) in how it
induces movement of clusters across the cutline.
Distance-k V-Shaped Nodes
Distance-k V-shaped nodes (Vk-node):
If k combinational nodes vi,1 … vi,k satisfy:
• vi,1 … vi,k are in the same part
• there exist vj, vt in the other part
• there exists a path from vj to vt that passes only through vi,1 … vi,k
then vi,1 … vi,k are distance-k V-shaped nodes
[Figure: nodes vi,1 … vi,k in one part on a path between vj and vt in the other part]
Notation
• H(V,E)= circuit hypergraph
• V = set of nodes representing components of the
circuit
• E = set of signal nets
• A bipartition (V0|V1) of H(V,E) divides V into two
disjoint subsets s.t. V = V0 ∪ V1, which are called
Part 0 and Part 1
• A = the total area of all the nodes in V
• A0 = the area of all the nodes in V0