DAOmap: A Depth-optimal Area
Optimization Mapping Algorithm for
FPGA Designs
Deming Chen, Jason Cong, Computer Science Department, UCLA
Presented by Shikang Xu
Outline
• Introduction
• Related Works
• Definitions and Problem Formulation
• Algorithm Description
• Discussion of Techniques
Introduction
• The lookup-table (LUT) based FPGA architecture dominates the
existing programmable chip industry
• FPGA technology mapping converts a given
Boolean circuit into a functionally equivalent
network composed only of LUTs
Related Works
• Area Minimization
– Chortle-crf, [Francis, et al, DAC’91]
– MIS-pga, [Murgai, et al, ICCAD’91]
– Praetor, [Cong, et al, FPGA’99]
– Anti-fuse FPGA Mapper, [Kang, et al, ASPDAC’04]
• Delay Minimization
– DAG-Map, [Chen, et al, DTC’92]
– FlowMap, [Cong, et al, ICCAD’92]
– Edge-map, [Yang, et al, ICCAD’94]
• Power Minimization
– PowerMinMap, [Li, et al, ASPDAC’03]
– Emap, [Lamoureux, et al, ICCAD’03]
– DVmap, [Chen, et al, FPGA’04]
• Simultaneous Delay and Area Minimization (Area Minimization under Timing Constraints)
– FlowMap-r, [Cong, et al, TVLSI’94]
– CutMap, [Cong, et al, FPGA’95]
– BoolMap-D, [Legl, et al, DAC’96]
Adapted from Deming Chen, Jason Cong, Computer Science Department, UCLA
Definitions
• Cone (O_v): A subnetwork of the original network, consisting of v and some of its predecessors, such that for any node w in O_v there is a path from w to v in O_v.
• Fanin cone (F_v): The maximum cone of v, consisting of v and all of its predecessors back to the primary inputs (PIs).
• input(O_v): The set of distinct nodes outside O_v that supply inputs to the gates in O_v.
• Cut: A partitioning (X, X’) of a cone O_v such that X’ is a cone of v.
• Cut-set: Written V(X, X’); it consists of input(X’).
Definitions
• Cutsize: The cardinality of the cut-set. A cut is said to be K-feasible if its cutsize is <= K.
• Level: The level of a node v is the length of the longest path from any PI to v.
• Depth: The depth of a network is the largest node level in the network.
• Mapping depth: The depth of the mapped circuit, which under the unit delay model is its largest path delay.
[Figure: fanin cone F_v of node v over the PIs, with internal nodes a, b, c, d, e; a 3-feasible cone C_v is highlighted, and the mapping shown has a delay of 2.]
Picture adapted from Deming Chen, Jason Cong, Computer Science Department, UCLA
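The level and depth definitions can be illustrated with a small sketch. The network below is hypothetical (it is not the one in the figure), and PIs are assumed to sit at level 0.

```python
from functools import lru_cache

# Hypothetical network: node -> list of fanin nodes. PIs have no fanins.
fanins = {
    "a": [], "b": [], "c": [],
    "d": ["a", "b"],
    "e": ["b", "c"],
    "v": ["d", "e"],
}

def node_levels(fanins):
    """Level of a node = length of the longest path from any PI (PIs are level 0)."""
    @lru_cache(maxsize=None)
    def level(n):
        preds = fanins[n]
        return 0 if not preds else 1 + max(level(p) for p in preds)
    return {n: level(n) for n in fanins}

levels = node_levels(fanins)
depth = max(levels.values())  # depth of the network = largest node level
```

Here d and e are at level 1 and v is at level 2, so the depth of this small network is 2.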
Problem Formulation
Area minimization under a timing constraint:
Given: a Boolean network and the unit delay model (each LUT contributes one unit of delay)
Goal: cover the network with K-feasible cones (K-LUTs), such that
• Optimal mapping depth is guaranteed
• Area (number of LUTs) is minimized
Algorithm Description
A Cut-enumeration-based method consisting of
cut generation and cut selection
• Cut generation traverses the network from the PIs to the primary outputs (POs) and combines the subcuts on the fanin nodes of a target node to generate all the cuts on the target node.
• After the cuts are generated, the network is traversed from the POs to the PIs, and cuts are selected to produce the LUT mapping result.
Cut Enumeration
• Cut enumeration efficiently generates all K-feasible cuts rooted at a given node:
$$f(K, v) = \bigotimes^{K}_{u \in input(v)} \left[ u + f(K, u) \right]$$
where f(K, v) represents all the K-feasible cuts rooted at node v, the operator + is Boolean OR, and $\otimes^{K}$ is Boolean AND on its operands, filtering out all resulting p-terms with more than K variables.
Cut Enumeration: Example
All the cuts rooted on node s can be generated by combining the cuts rooted on its fanin nodes q and r. The cuts on the fanin nodes are called subcuts. Combining C1 with C2 forms a new cut Cs = {m, n, o, p} rooted on s. If the number of inputs of the new cut exceeds K, the cut is discarded.
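The combination step can be sketched in a few lines of Python. This is a minimal illustration, not the DAOmap implementation: the function name, the dictionary layout, and the assumption that C1 = {m, n} is rooted on q and C2 = {o, p} on r are all hypothetical.

```python
from itertools import product

def enumerate_cuts(fanins, order, K):
    """Return {node: set of cuts}; each cut is a frozenset of input nodes.

    fanins: node -> list of fanin nodes (PIs map to an empty list).
    order:  nodes in topological order, PIs first.
    """
    cuts = {}
    for v in order:
        if not fanins[v]:
            cuts[v] = set()  # a PI has no cut of its own
            continue
        # For each fanin u, either use u itself as a cut input
        # or expand u through one of its own (sub)cuts.
        choices = [[frozenset([u])] + list(cuts[u]) for u in fanins[v]]
        merged = set()
        for combo in product(*choices):
            cut = frozenset().union(*combo)
            if len(cut) <= K:  # discard cuts with more than K inputs
                merged.add(cut)
        cuts[v] = merged
    return cuts

# Assumed example network: q = g(m, n), r = g(o, p), s = g(q, r).
fanins = {"m": [], "n": [], "o": [], "p": [],
          "q": ["m", "n"], "r": ["o", "p"], "s": ["q", "r"]}
order = ["m", "n", "o", "p", "q", "r", "s"]
cuts4 = enumerate_cuts(fanins, order, K=4)
cuts3 = enumerate_cuts(fanins, order, K=3)
```

With K = 4, the cut {m, n, o, p} on s is kept alongside {q, r}, {q, o, p} and {m, n, r}. With K = 3 it is discarded because it has four inputs.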
Cut Enumeration: Time propagation
• The arrival time propagates through each cut. Each cut represents a LUT and hence a unit delay. The minimum arrival time at a node v is
$$Arr_v = \min_{\forall C \text{ on } v} \left[ \max_{i \in input(C)} Arr_i + 1 \right]$$
where C ranges over every cut generated for v through cut enumeration and Arr_i is the minimum arrival time on input signal i of C. Several cuts may achieve Arr_v; they form the set X_v.
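A minimal sketch of this propagation follows. The function name and data layout are hypothetical, the cuts are written out by hand for a small assumed network, and PIs are assumed to arrive at time 0.

```python
def min_arrival_times(cuts, order):
    """cuts: node -> list of cuts (frozensets of input nodes); PIs have none.

    Returns (arr, x) where arr[v] is the minimum arrival time at v and
    x[v] is the list of cuts that achieve it (the set X_v).
    """
    arr, x = {}, {}
    for v in order:  # topological order, PIs first
        if not cuts[v]:
            arr[v], x[v] = 0, []  # assumption: PIs arrive at time 0
            continue
        # Each cut is one LUT: its delay is 1 plus the latest input arrival.
        delay = {c: 1 + max(arr[i] for i in c) for c in cuts[v]}
        arr[v] = min(delay.values())
        x[v] = [c for c in cuts[v] if delay[c] == arr[v]]
    return arr, x

# Assumed network with K = 4: q = g(m, n), r = g(o, p), s = g(q, r).
cuts = {
    "m": [], "n": [], "o": [], "p": [],
    "q": [frozenset("mn")],
    "r": [frozenset("op")],
    "s": [frozenset("qr"), frozenset("qop"), frozenset("mnr"), frozenset("mnop")],
}
order = ["m", "n", "o", "p", "q", "r", "s"]
arr, x = min_arrival_times(cuts, order)
```

In this example only the cut {m, n, o, p} reaches s in one LUT level, so it is the single member of X_s.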
Cut enumeration: Area Propagation
• Like the arrival time, the area can be propagated. The area of a cut C is calculated as
$$A_C = \sum_{i \in input(C)} \left[ A_i / f(i) \right] + U_C$$
where U_C is the area contributed by the cut C, A_i is the estimated area of the cone rooted on signal i, and f(i) is the fanout number of signal i. That is, the area of i is shared among the fanout nodes of i.
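The area formula can be evaluated directly, as in the sketch below. The numbers are made up for illustration, and U_C is assumed to be a constant 1 per LUT, which is a simplification of the cost function DAOmap actually uses.

```python
def cut_area(cut, node_area, fanout, unit_cost=1.0):
    """A_C = sum over inputs i of A_i / f(i), plus U_C.

    unit_cost stands in for U_C (assumed to be 1 LUT per cut here).
    """
    return sum(node_area[i] / fanout[i] for i in cut) + unit_cost

# Hypothetical values: PIs cost nothing; q and r each cost one LUT.
node_area = {"m": 0.0, "n": 0.0, "o": 0.0, "p": 0.0, "q": 1.0, "r": 1.0}
# q is assumed to drive two nodes, so only half of its area is charged here.
fanout = {"m": 1, "n": 1, "o": 1, "p": 1, "q": 2, "r": 1}

area_qr = cut_area(frozenset("qr"), node_area, fanout)      # 1/2 + 1/1 + 1
area_mnop = cut_area(frozenset("mnop"), node_area, fanout)  # 0 + 1
```

Dividing by the fanout is what lets a shared node look cheaper to each of its consumers.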
Delay and Area Propagation
[Figure: example network with nodes a, b, c, d, e, f, g and w, x, y, z. Each cut is annotated with its delay and area (for example, Delay 1, Area 1 or Delay 2, Area 3), and each node with its optimal values (for example, Optimal Delay = 2, Area = 2).]
• The propagation process visits cuts and nodes iteratively.
• The largest of the best delays at the POs is the optimal mapping delay.
Adapted from Deming Chen, Jason Cong, Computer Science Department, UCLA
Area propagation under Timing constraints
• To guarantee optimal mapping depth, the estimated area is propagated together with the minimum arrival time:
$$A_v = \min_{\forall C \in X_v} A_C$$
where A_v represents the best achievable area under the constraint that it also yields the optimal mapping delay up to v.
• With these formulae, the areas of cuts and nodes are iteratively calculated until the enumeration process reaches the POs.
• During cut selection, when v is known not to be on a critical path, a cut C not belonging to X_v can be chosen as long as it does not violate the timing constraint.
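The restriction to minimum-delay cuts can be sketched as follows. The cut labels and the (delay, area) pairs are invented for illustration, and the function name is hypothetical.

```python
def best_area_under_timing(cut_stats):
    """cut_stats: {cut_label: (delay, area)} for all cuts rooted on v.

    Returns (Arr_v, chosen cut, A_v), where the cut is picked only
    among the cuts that achieve the minimum delay (the set X_v).
    """
    arr_v = min(delay for delay, _ in cut_stats.values())
    x_v = {c: area for c, (delay, area) in cut_stats.items() if delay == arr_v}
    best_cut = min(x_v, key=x_v.get)
    return arr_v, best_cut, x_v[best_cut]

# Hypothetical cuts on v: the cheapest cut overall is too slow.
cut_stats = {
    "{y, e}": (3, 1.5),
    "{x, d, e}": (2, 2.0),
    "{x, d, g}": (2, 2.5),
}
result = best_area_under_timing(cut_stats)
```

Here the cut {y, e} has the smallest area but a delay of 3, so it is excluded and A_v is 2.0 from the cut {x, d, e}.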
Cost function of a cut
Some key parameters:
• I_C: cutsize of C
• N_C: number of nodes covered by C
• f(v): fanout number of the root node v
• R_C: number of reconvergent paths
Example of Cost function
In the example, C1 and C2 have the same cutsize, but C2 is better:
• C2 covers two sets of reconvergent paths
• Having a cut rooted at node 5 reduces potential duplications
Global Duplication Cost Adjustment
• Consider potential node duplications
• Check the sub-cuts for multiple fanouts
• Propagate adjusted cost globally
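$$A_C = \sum_{i \in input(C)} \left[ A_i / f(i) \right] + U_C + P_{f1} + P_{f2}$$
$$P_f = \begin{cases} \text{a penalty term in } N_{C_f} \text{ and } I_C & \text{if } f(i) > 1 \\ 0 & \text{otherwise} \end{cases}$$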
Cut Selection
• From POs to PIs
• Critical paths: optimal delay + best area available
• Non-critical paths: relaxed delay + better area
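The PO-to-PI traversal can be sketched as below. This is a simplified illustration and not DAOmap's actual procedure: the function name, the data layout, the unit cost of 1 per LUT, and the example network (including the assumption that y drives four nodes) are all hypothetical.

```python
def select_cuts(cuts, arr, area, fanout, order, pos, depth):
    """Pick one cut per needed node, walking from the POs back to the PIs.

    depth is the timing constraint at the POs. A node with slack may
    take a slower cut if that cut has a smaller estimated area.
    """
    required = {po: depth for po in pos}
    chosen = {}
    for v in reversed(order):  # reverse topological order
        if v not in required or not cuts[v]:
            continue  # not used by the mapping, or a PI
        feasible = [c for c in cuts[v]
                    if 1 + max(arr[i] for i in c) <= required[v]]
        best = min(feasible,
                   key=lambda c: sum(area[i] / fanout[i] for i in c) + 1)
        chosen[v] = best
        for i in best:  # tighten the required times of the cut inputs
            required[i] = min(required.get(i, depth), required[v] - 1)
    return chosen

# Assumed network with K = 3: x = g(a, b, c), y = g(x, d), v = g(y, e).
order = ["a", "b", "c", "d", "e", "x", "y", "v"]
cuts = {"a": [], "b": [], "c": [], "d": [], "e": [],
        "x": [frozenset("abc")], "y": [frozenset("xd")],
        "v": [frozenset("ye"), frozenset("xde")]}
arr = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 0, "x": 1, "y": 2, "v": 2}
area = {"a": 0.0, "b": 0.0, "c": 0.0, "d": 0.0, "e": 0.0,
        "x": 1.0, "y": 2.0, "v": 2.0}
fanout = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 1, "x": 1, "y": 4, "v": 1}

tight = select_cuts(cuts, arr, area, fanout, order, ["v"], depth=2)
relaxed = select_cuts(cuts, arr, area, fanout, order, ["v"], depth=3)
```

With depth = 2, v is critical and must take the cut {x, d, e}. With depth = 3, v has one unit of slack and takes {y, e} instead, because sharing y across its assumed four fanouts gives that cut the smaller estimated area.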
Cut Selection
• Greedily picking the cuts with the smallest costs forfeits some opportunities to reduce duplication locally.
• Use heuristics to guide the selection procedure
– Iterative Cut Selection Procedure
– Local Cost Adjustment
  • Input Sharing
  • Slack Distribution
  • Cut Probing
Efficiency
• The authors report that DAOmap achieves smaller area with lower runtime than CutMap.
• The impact of the individual techniques on the final area is summarized on the next slide.
Efficiency: Impact of Techniques
• Input sharing proves to be the most important
technique to reduce area because it reduces the
number of edges and node duplications
• The min-cost propagation experiment evaluates how accurate the cost estimation model is.
• Global duplication cost adjustment offers the
next largest gain, which shows that duplication of
nodes adds to the area cost
Summary
• A cut-enumeration-based cut generation and selection process for LUT mapping
• Novel techniques give DAOmap significant area and runtime reductions over CutMap, a state-of-the-art algorithm