Going Places with ABC

ABC: A System for
Sequential Synthesis and
Verification
Berkeley
Logic Synthesis and Verification
Group
Robert Brayton
Alan Mishchenko
Overview
• Introduction
– What and why ABC?
• ABC fundamentals
– Areas addressed by ABC
• Synthesis
• Technology mapping
• Verification
– Contrast with classical methods
• How is ABC different from SIS?
• Recent work
– Speedup
– Factoring
– Don’t-care based optimization
– Scalable sequential synthesis
– WireMap
– White boxes
A Plethora of ABCs
http://en.wikipedia.org/wiki/Abc
• ABC (American Broadcasting Company)
– A television network…
• ABC (Active Body Control)
– ABC is designed to minimize body roll during
cornering, acceleration, and braking. The system
uses 13 sensors that monitor body
movement and supply the computer with
information every 10 ms…
• ABC (Abstract Base Class)
– In C++, these are generic classes at the base
of the inheritance tree; objects of such abstract
classes cannot be created…
• ABC (supposed to mean “as simple as ABC”)
– A system for sequential synthesis and
verification at Berkeley
Why We Decided to Build ABC
• SIS
– Outdated, but many research papers on how a new algorithm beats SIS
results
– Not supported
• MVSIS
– Gave us a reason to work on logic synthesis
– Learned a lot about new methods and better data structures
– Could see how specializing to binary could provide substantial
improvements.
• ABC
– Initial intention was to re-implement all algorithms using new data
structures (daunting task)
– Discovered rewriting AIGs
• P. Bjesse and A. Boralv, "DAG-aware circuit compression for formal
verification", Proc. ICCAD ’04, pp. 42-49.
– Decided to try to keep all transformations fast and scalable
• No BDDs
• No SOPs
• No Espresso
What Is Berkeley ABC?
• A system for logic synthesis and verification
– Fast
– Scalable
– High quality results (industrial strength)
– Exploits synergy between synthesis and verification
• A programming environment
– Open-source
– Evolving and improving over time
Design Flow
System Specification
RTL
Logic synthesis
Technology mapping
Physical synthesis
Manufacturing
Verification
ABC
Screenshot
Areas Addressed by ABC
• Combinational synthesis
– AIG rewriting
– technology mapping
– resynthesis after mapping
• Sequential synthesis
– retiming
– structural register sweep
– merging seq. equiv. nodes
• Formal verification
– combinational equivalence checking
– bounded sequential verification
– unbounded sequential verification
– equivalence checking using synthesis history
Combinational Synthesis
• AIG rewriting minimizes the number of AIG nodes without
increasing the number of AIG levels
Rewriting AIG subgraphs
• Pre-computing AIG subgraphs
– Consider function f = abc
[Figure: three pre-computed AIG subgraphs (Subgraph 1, Subgraph 2, Subgraph 3) implementing f = abc over inputs a, b, c]
• Rewriting node A
[Figure: rewriting at node A, showing Subgraph 1 and Subgraph 2]
• Rewriting node B
[Figure: rewriting at node B, showing Subgraph 2 and Subgraph 1]
In both cases 1 node is saved
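The node saving can be reproduced on a toy AIG. The sketch below is only illustrative: the `Aig` class, its literal encoding, and the two chosen structures for f = abc are assumptions made for this example, not ABC's data structures or its pre-computed subgraph library.

```python
from itertools import product

class Aig:
    """Tiny And-Inverter Graph with structural hashing (illustrative only)."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.nodes = []   # AND nodes stored as (literal0, literal1)
        self.table = {}   # structural hash: fanin pair -> output literal

    def input_lit(self, i):
        return 2 * (i + 1)   # literal = 2 * variable + complement bit

    def and_(self, x, y):
        key = (min(x, y), max(x, y))
        if key not in self.table:   # create a node only if it is new
            self.nodes.append(key)
            self.table[key] = 2 * (self.num_inputs + len(self.nodes))
        return self.table[key]

    def evaluate(self, lit, values):
        var, neg = lit >> 1, lit & 1
        if var == 0:
            val = False
        elif var <= self.num_inputs:
            val = values[var - 1]
        else:
            l0, l1 = self.nodes[var - self.num_inputs - 1]
            val = self.evaluate(l0, values) and self.evaluate(l1, values)
        return val != bool(neg)

# Structure 1: (a & b) & (a & c) uses three AND nodes.
before = Aig(3)
a, b, c = (before.input_lit(i) for i in range(3))
f1 = before.and_(before.and_(a, b), before.and_(a, c))

# Structure 2: a & (b & c) uses two AND nodes for the same function.
after = Aig(3)
a, b, c = (after.input_lit(i) for i in range(3))
f2 = after.and_(a, after.and_(b, c))

same = all(before.evaluate(f1, v) == after.evaluate(f2, v)
           for v in product([False, True], repeat=3))
```

Replacing the first structure with the second keeps the function and saves one node, which is the kind of local gain that rewriting looks for at every node.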
Technology Mapping
Input: A Boolean network (And-Inverter Graph)
Output: A netlist of K-LUTs implementing the AIG and optimizing some cost function
[Figure: technology mapping turns the subject graph (an AIG with output f and inputs a, b, c, d, e) into the mapped netlist]
Sequential Synthesis
• Structural register sweep (scleanup)
– Merge registers with identical drivers
– Replace stuck-at registers by constants
• Retiming (dretime)
– Minimize the number of registers under delay
constraints
– Preserves equivalent initial state
• Sequential SAT sweeping (scorr)
– Detecting and merging sequentially equivalent nodes
Formal Verification
• Equivalence checking
– Takes two designs and makes a miter (AIG)
• Model checking safety properties
– Takes design and property and makes a miter (AIG)
The goals are the same: to transform AIG until the output is proved constant 0
Breaking News: ABC won a model checking competition at CAV in August 2008
[Figure: an equivalence-checking miter over designs D1 and D2 and a property-checking miter over design D1 and property p; in both, the output is to be proved constant 0]
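As a minimal sketch of the miter idea, the example below compares two small functions and checks whether the miter output is constant 0. Exhaustive enumeration stands in for the AIG transformations and SAT solving used in practice, and the functions `d1`, `d2`, `d3` are made up for illustration.

```python
from itertools import product

def miter(f, g):
    """Return a function that is 1 exactly when f and g disagree."""
    return lambda *xs: f(*xs) != g(*xs)

def is_constant_zero(h, num_inputs):
    """Check all input assignments; feasible only for tiny examples."""
    return not any(h(*xs) for xs in product([False, True], repeat=num_inputs))

d1 = lambda a, b, c: (a and b) or (a and c)
d2 = lambda a, b, c: a and (b or c)      # same function as d1
d3 = lambda a, b, c: a or (b and c)      # a different function

equivalent = is_constant_zero(miter(d1, d2), 3)
different = is_constant_zero(miter(d1, d3), 3)
```

Here `equivalent` is true because the miter of `d1` and `d2` never outputs 1, while the miter of `d1` and `d3` does for some input.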
Model Checking Competition
[Figure: competition results plotting time (sec) against # problems solved, with the ABC curve labeled]
Command “dprove” in ABC
• transforming initial state (“undc”, “zero”)
• converting into an AIG (“strash”)
• creating sequential miter (“miter -c”)
• combinational equivalence checking (“iprove”)
• bounded model checking (“bmc”)
• sequential sweep (“scl”)
• phase-abstraction (“phase”)
• most forward retiming (“dret -f”)
• partitioned register correspondence (“lcorr”)
• min-register retiming (“dretime”)
• combinational SAT sweeping (“fraig”)
• for ( K = 1; K ≤ 16; K = K * 2 )
– signal correspondence (“scorr”)
– stronger AIG rewriting (“dc2”)
– min-register retiming (“dretime”)
– sequential AIG simulation
• interpolation (“int”)
• BDD-based reachability (“reach”)
• saving reduced hard miter (“write_aiger”)
Stage labels shown beside the list, top to bottom: Preprocessors, Combinational solver, Fast engines, Medium engines, Slower, Main induction loop, Last-gasp engines
ABC vs. Other Tools
• Industrial
+ well documented, fewer bugs
- black-box, push-button, no source code, often expensive
• SIS
+ traditionally very popular
- data structures / algorithms outdated, weak sequential synthesis
• VIS
+ very good implementation of BDD-based verification algorithms
- not meant for logic synthesis, does not feature the latest SAT-based implementations
• MVSIS
+ allows for multi-valued and finite-automata manipulation
- not meant for binary synthesis, lacking recent implementations
How Is ABC Different From SIS?
[Figure: a Boolean network in SIS with output f, internal nodes x, y, z, and inputs a, b, c, d, e, shown beside the equivalent AIG in ABC]
AIG is a Boolean network of 2-input
AND nodes and invertors (dotted lines)
One AIG Node – Many Cuts
• Manipulating AIGs in ABC
– Each node in an AIG has many cuts
– Each cut is a different SIS node
– No a priori fixed boundaries
• Implies that AIG manipulation with cuts is equivalent to working on many Boolean networks at the same time
[Figure: combinational AIG with output f and inputs a, b, c, d, e, showing different cuts for the same node]
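To make "many cuts per node" concrete, here is a hedged sketch of bottom-up K-feasible cut enumeration on a small AIG. The network, the node names, and the function `enumerate_cuts` are assumptions for illustration; ABC's implementation also filters dominated cuts, which this sketch omits.

```python
def enumerate_cuts(fanins, inputs, k):
    """Enumerate cuts with at most k leaves for every node.

    fanins maps each AND node to its two fanins and must be listed
    in topological order; inputs is the list of primary inputs.
    """
    cuts = {i: {frozenset([i])} for i in inputs}
    for node, (left, right) in fanins.items():
        node_cuts = {frozenset([node])}          # the trivial cut
        for cut_l in cuts[left]:
            for cut_r in cuts[right]:
                merged = cut_l | cut_r           # union of fanin cuts
                if len(merged) <= k:
                    node_cuts.add(merged)
        cuts[node] = node_cuts
    return cuts

# Hypothetical AIG over inputs a..e (inverters do not affect cuts).
fanins = {
    "n1": ("a", "b"),
    "n2": ("c", "d"),
    "n3": ("n2", "e"),
    "f": ("n1", "n3"),
}
cuts = enumerate_cuts(fanins, ["a", "b", "c", "d", "e"], k=4)
```

With k = 4, node `f` gets six cuts, for example {n1, n3} and {a, b, n2, e}; the five-input cut {a, b, c, d, e} is rejected. Each cut corresponds to one possible node boundary, which is why no boundary has to be fixed in advance.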
Comparison of Two Syntheses
ABC “contemporary” synthesis
• AIG network
• DAG-aware AIG rewriting (Boolean)
– Several related algorithms
• Rewriting
• Refactoring
• Balancing
• Speedup
• Node minimization
– Boolean decomposition
– Don’t cares computed using simulation and SAT
– Resubstitution with don’t cares
• Technology mapping
– Cut based with choice nodes
“Classical” synthesis
• Boolean network
• Network manipulation (algebraic)
– Elimination
– Factoring/Decomposition
– Speedup
• Node minimization
– Espresso
– Don’t cares computed using BDDs
– Resubstitution
• Technology mapping
– Tree based
Existing Capabilities (2005-2008)
[Diagram: four capabilities arranged around ABC]
• Technology mapping with structural choices: cut-based, heuristic, good area/delay, flexible
• Combinational logic synthesis: fast, scalable, good quality
• Sequential verification: integrated, interacts with synthesis
• Sequential synthesis: innovative, scalable, verifiable
Recent work: Speedup
Command “speedup”
Timing Criticality
• Critical nodes
– Used by many traditional algorithms
• Critical edges
– Used by our algorithm
• We pre-compute critical edges of critical nodes
– Reduces computation
• An edge between critical nodes may not be critical
– See illustration: edge 1→3
[Figure: network from primary inputs to primary outputs with nodes annotated 1 to 4]
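The difference between critical nodes and critical edges can be checked on a three-node example. This is a sketch under a unit-delay model; the network, the function `critical_edges`, and the edge-slack definition (required time of the sink minus its delay minus the arrival time of the source) are assumptions for illustration, not ABC's code.

```python
def critical_edges(fanins, inputs, outputs, w=0):
    """Return (slack per node, set of edges whose slack is at most w)."""
    arrival = {i: 0 for i in inputs}
    for n, fi in fanins.items():                 # topological order assumed
        arrival[n] = 1 + max(arrival[f] for f in fi)
    depth = max(arrival[o] for o in outputs)
    required = {n: depth for n in arrival}
    for n in reversed(list(fanins)):             # reverse topological order
        for f in fanins[n]:
            required[f] = min(required[f], required[n] - 1)
    slack = {n: required[n] - arrival[n] for n in arrival}
    edges = {(f, n) for n, fi in fanins.items() for f in fi
             if required[n] - 1 - arrival[f] <= w}
    return slack, edges

# n1 feeds both n2 and n3; n2 also feeds n3, so n1 -> n3 is a shortcut.
fanins = {"n1": ("a", "b"), "n2": ("n1", "c"), "n3": ("n1", "n2")}
slack, edges = critical_edges(fanins, ["a", "b", "c"], ["n3"])
```

All of `n1`, `n2`, `n3` have zero slack, yet the edge from `n1` to `n3` is not critical because the longest path reaches `n3` through `n2`. Restructuring therefore only needs to consider the edges `n1`→`n2` and `n2`→`n3`.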
Delay-Oriented Restructuring
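• Using traditional MUX-restructuring
– AKA generalized select transform
– x and y are the critical edge inputs
[Figure: function F with critical inputs x and y is rebuilt as a multiplexer controlled by x and y that selects among the cofactors F00, F01, F10, F11]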
Overall Algorithm
mapped netlist performSpeedup (
    subject graph S,    // S is an And-Inverter Graph
    mapped netlist M,   // M was previously derived by tech-mapping of S
    timing window w,    // w is used to detect the critical paths
    logic depth l,      // l is used to detect a logic cone rooted at a node
    edge count p )      // p limits the number of critical edges of the cone
{
    // timing analysis and pre-computation are done only once
    perform timing analysis of M with unit-delay or LUT-library model;
    pre-compute critical section of M as nodes n such that 0 ≤ slack(n) ≤ w;
    pre-compute timing-critical edges connecting these nodes;
    for each timing critical node n {
        find cone C of M that extends l levels down from n;
        pick the set of timing-critical edges V feeding into C;
        if the number of edges in V exceeds p, continue;
        find logic cone C’ in S corresponding to C in M;
        find variables V’ in S corresponding to V in M;
        derive cofactors of the function of C’ w.r.t. variables in V’;
        build multiplexer tree C’’ of the cofactors using variables in V’;
        add structural choice C’ = C’’ to the subject graph S;
    }
    return mapped netlist M’ derived by mapping subject graph S with added choices;
}
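The cofactoring and multiplexer-tree steps can be sketched on a small function. The function `F`, the choice of `x` and `y` as the late-arriving variables, and the helper names are hypothetical; the sketch only shows that the multiplexer tree of cofactors computes the same function.

```python
from itertools import product

def cofactor(f, fixed):
    """Fix the variables named in `fixed`; the result takes the others."""
    return lambda **rest: f(**fixed, **rest)

def mux(sel, hi, lo):
    return hi if sel else lo

def F(x, y, a, b):
    # x and y are assumed to be the timing-critical inputs.
    return (x and a) or (y and not b) or (a and b)

def F_restructured(x, y, a, b):
    # Cofactors depend only on the early inputs a and b.
    F00, F01, F10, F11 = (
        cofactor(F, {"x": vx, "y": vy})(a=a, b=b)
        for vx, vy in product([False, True], repeat=2)
    )
    # The critical inputs now pass through the multiplexer tree only.
    return mux(x, mux(y, F11, F10), mux(y, F01, F00))

same = all(F(*v) == F_restructured(*v)
           for v in product([False, True], repeat=4))
```

Because the cofactors can be evaluated before `x` and `y` arrive, the late signals see only the multiplexer levels. In the algorithm above the restructured cone is added as a structural choice, so the mapper decides whether to use it.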
Experimental Results for “speedup”
        |       Profile        |         Baseline          |              Speedup
Design  |     PI     PO    Reg |    LUT   Lev Delay  Total |    LUT   Lev Delay Time1, s Time2, s
11      |  2,061  1,897 13,950 | 16,531     7  3.15  77.70 | 16,652     7  2.95     9.33    87.95
12      |     50     68  1,358 |  3,284    19  8.40  23.88 |  3,371    16  7.00     3.46    28.68
13      |  1,044  1,098  2,074 |  7,147    23  9.35  74.39 |  7,789    16  6.65     7.37    86.71
14      |    391    129  1,049 |  7,526    14  6.05 251.11 |  7,573    14  6.05    27.29   280.41
15      |    749    777  7,348 | 16,086    10  4.35 169.25 | 16,097     9  4.00    18.48   188.00
16      |  1,041    736  1,063 |  3,611    11  4.70  19.63 |  3,621    11  4.65     2.77    22.71
17      |  3,512  2,992  3,425 | 12,533    20  8.45 178.58 | 12,830    17  7.40    13.19   199.36
18      | 11,456 10,791 10,114 | 27,622    15  6.25 160.22 | 28,857    10  4.35    22.29   184.63
19      | 11,292 11,454 20,184 | 49,871    12  5.00 317.79 | 50,283     9  3.75    37.83   355.19
20      |    131    129 26,258 | 13,811     8  3.65  72.17 | 14,186     5  2.45     8.23    81.60
Geomean |                      | 10,804 11.49  4.99  72.13 | 11,023  9.80  4.29     8.77    82.29
Ratio 1 |                      |      1     1     1        |  1.020 0.854 0.860
Ratio 2 |                      |                           |                       0.107        1
LUT – number of LUTs
Lev – number of LUT levels
Delay – delay using LUT library
Total – total runtime of Baseline
Time1 – the runtime of AIG restructuring only
Time2 – the total runtime of Speedup
Geomean – geometric averages of columns
Ratios – ratios of geometric averages
Recent work: Factoring
Basic Inner Core Algorithm (DSD)
We use a fast disjoint support decomposition (DSD) algorithm as our underlying subroutine
– follows Bertacco and Damiani, "The disjunctive decomposition of logic functions", ICCAD '97
– but
• uses heuristics to speed it up
• no BDDs
• uses truth tables
– limits inputs to at most 16
Disjoint Support Decomposition (DSD)
(Simple Disjunctive Decomposition)
Theorem 1 [Ashenhurst 1959]. For a completely specified Boolean function, there is a unique maximal DSD (up to the complementation of inputs and outputs and factoring of ANDs/ORs and XORs).
$F(a, c) = H(D(a), c)$
[Figure: decomposition of F into H and D with bound variables a and free variables c, and an example DSD tree over inputs x1, …, x5]
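Whether a given bound set admits a decomposition $F(a, c) = H(D(a), c)$ with a single-output D can be tested on truth tables with Ashenhurst's classical column-multiplicity criterion: such a decomposition exists exactly when the decomposition chart has at most two distinct columns. The sketch below applies that test; the example functions and helper names are assumptions, and it is not the heuristic DSD procedure used in ABC.

```python
from itertools import product

def column_multiplicity(f, bound, free):
    """Count distinct columns of the decomposition chart of f."""
    columns = set()
    for a in product([0, 1], repeat=len(bound)):
        column = tuple(
            f(**dict(zip(bound, a)), **dict(zip(free, c)))
            for c in product([0, 1], repeat=len(free))
        )
        columns.add(column)
    return len(columns)

def has_disjoint_decomposition(f, bound, free):
    return column_multiplicity(f, bound, free) <= 2

def f(x1, x2, x3, x4):
    return ((x1 ^ x2) & x3) | x4      # D = x1 XOR x2 can be factored out

def g(x1, x2, x3, x4):
    return (x1 & x3) | (x2 & x4)      # x1, x2 cannot be isolated

f_ok = has_disjoint_decomposition(f, ["x1", "x2"], ["x3", "x4"])
g_columns = column_multiplicity(g, ["x1", "x2"], ["x3", "x4"])
```

For `f`, the chart has two columns (one for each value of x1 XOR x2), so the bound set {x1, x2} works. For `g`, the four assignments to x1 and x2 give four different columns, so no single-output D over {x1, x2} exists.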
Non-Disjoint Decomposition
Definition: A function F has an (a, b) decomposition if it can be written as
$F(x) = H(D(a, b), b, c)$
where (a, b, c) is a partition of the variables x and D is a single-output function.
The variables in the set b are called the shared variables. The variables a are called the bound set and c the free set.
[Figure: block H with inputs D(a, b), b, and c]
Non-Disjoint Decomposition
Theorem 2: A function F(a, b, c) has an (a, b)-decomposition if and only if each of the cofactors of F with respect to b has a DSD structure in which the variables a are in a separate sub-tree.
[Figure: DSD trees of two b cofactors over inputs x1, …, x5, with examples a = {x4, x5} and a = {x3}]
Application of Factoring
(uses Theorem 2)
Rewriting a k-LUT mapped circuit.
• For each LUT, and each cut of no more than 16
inputs,
– express the output of the LUT as truth table in terms
of the cut variables – F(x)
– Find variables b such that its cofactors are support
reducing
• we exhaustively look for up to two variables in the b set
– Take the best (a,b) set and decompose
F=H(D(a,b),b,c)
– Recursively decompose H and D if they do not fit into
a k-LUT.
– If improvement, replace LUTs in cut with its new
decomposition.
Experimental results later
Recent work: Don’t-care based optimization
Windowing a Node in the Network
for Don’t-Care Computation
• Definition
– A window for a node in the network is the context in which the don’t-cares are computed
• A window includes
– n levels of the TFI
– m levels of the TFO
– all re-convergent paths captured in this scope
• Window with its PIs and POs can be considered as a separate network
[Figure: Boolean network (k-LUT mapped circuit) with a window of n = 3 and m = 3 around a node, bounded by window PIs and window POs]
Care Set Representation
“Miter” constructed for the window POs
If output is 1 then we care
[Figure: miter comparing the window containing node f with the same window with an inverter, both driven by the same inputs x]
Resubstitution
Resubstitution considers a node in a Boolean network
and expresses it using a different set of fanins
Computation can be enhanced by use of don’t cares
Resubstitution with Don’t-Cares
Consider all or some nodes in Boolean network.
For each node
• Create window
• Select possible fanin nodes (divisors)
• For each candidate subset of divisors
– Rule out some subsets using simulation
– Check resubstitution feasibility using SAT
– Compute resubstitution function using interpolation
• A low-cost by-product of completed SAT proofs
• Update the network if there is an improvement
Resubstitution with Don’t Cares
• Given:
– node function F(x) to be replaced
– care set C(x) for the node
– candidate set of divisors {gi(x)} for re-expressing F(x)
• Find:
– A resubstitution function h(y) such that F(x) = h(g(x)) on the care set
• SPFD Theorem: Function h exists if and only if every pair of care minterms, x1 and x2, distinguished by F(x), is also distinguished by gi(x) for some i
[Figure: node F(x) with care set C(x) is replaced by F’(x) = h(g), built over divisors g1, g2, g3]
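The SPFD condition can be checked directly on small truth tables. In the sketch below, exhaustive enumeration replaces the SAT check and the explicit table `h` replaces interpolation; the function `resubstitution` and the example node, care set, and divisor are assumptions for illustration.

```python
from itertools import product

def resubstitution(f, care, divisors, num_vars):
    """Return h as a dict from divisor values to f's value, or None."""
    h = {}
    for x in product([0, 1], repeat=num_vars):
        if not care(*x):
            continue                      # don't-care minterm: ignore
        key = tuple(g(*x) for g in divisors)
        if h.setdefault(key, f(*x)) != f(*x):
            return None   # f distinguishes two care minterms, no divisor does
    return h

f = lambda a, b: a ^ b
g1 = lambda a, b: a | b
always = lambda a, b: True
care = lambda a, b: not (a and b)         # minterm a = b = 1 never occurs

without_dc = resubstitution(f, always, [g1], 2)
with_dc = resubstitution(f, care, [g1], 2)
```

Without don't-cares the check fails, since f separates (0, 1) from (1, 1) while g1 does not. Once (1, 1) is outside the care set, h exists and f can be re-expressed as g1 itself.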
Checking Resubstitution using SAT
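Miter for resubstitution check (the SPFD theorem in practice)
[Figure: miter built from two copies of the care set C, the node function F, and the divisors g1, g2, g3, one copy over inputs x1 and the other over inputs x2; h(g) is derived from the two parts A and B]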
1. Note use of care set, C.
2. Resubstitution function exists if and only if SAT problem is unsatisfiable.
3. An h(g) is obtained by interpolation
Experimental Results
         |               | Baseline   | Choices    | Imfs       | Imfs + Lutpack
Designs  |  PI  PO   Reg |  LUT Level |  LUT Level |  LUT Level |  LUT Level
alu4     |  14   8     0 |  821     6 |  785     5 |  558     5 |  453     5
apex2    |  39   3     0 |  992     6 |  866     6 |  806     6 |  787     6
apex4    |   9  19     0 |  838     5 |  853     5 |  800     5 |  732     5
bigkey   | 263 197   224 |  575     3 |  575     3 |  575     3 |  575     3
clma     | 383  82    33 | 3323    10 | 2715     9 | 1277     8 | 1222     8
des      | 256 245     0 |  794     5 |  512     5 |  483     4 |  480     4
diffeq   |  64  39   377 |  659     7 |  632     7 |  636     7 |  634     7
dsip     | 229 197   224 |  687     3 |  685     2 |  685     2 |  685     2
ex1010   |  10  10     0 | 2847     6 | 2967     6 | 1282     5 | 1059     5
ex5p     |   8  63     0 |  599     5 |  669     4 |  118     3 |  108     3
elliptic | 131 114  1122 | 1773    10 | 1824     9 | 1820     9 | 1819     9
frisc    |  20 116   886 | 1748    13 | 1671    12 | 1692    12 | 1683    12
i10      | 257 224     0 |  589     9 |  560     8 |  548     7 |  547     7
pdc      |  16  40     0 | 2327     7 | 2500     6 |  194     5 |  171     5
misex3   |  14  14     0 |  785     5 |  664     5 |  517     5 |  446     5
s38417   |  28 106  1636 | 2684     6 | 2674     6 | 2621     6 | 2592     6
s38584   |  12 278  1452 | 2697     7 | 2647     6 | 2620     6 | 2601     6
seq      |  41  35     0 |  931     5 |  756     5 |  682     5 |  645     5
spla     |  16  46     0 | 1913     6 | 1828     6 |  289     4 |  263     4
tseng    |  52 122   385 |  647     7 |  649     6 |  645     6 |  645     6
geomean  |               | 1168  6.16 | 1103  5.66 |  716  5.24 |  677  5.24
Ratio    |               | 1.000 1.000 | 0.945 0.919 | 0.613 0.852 | 0.580 0.852
Ratio    |               |            |            | 1.000 1.000 | 0.946 1.000
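The geomean and ratio rows can be recomputed from the per-design numbers. The short sketch below does this for the Level columns of Baseline and Imfs, with the values copied from the results above; the helper `geomean` is written here for illustration.

```python
import math

def geomean(values):
    """Geometric mean computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Level columns for the 20 designs, in the order listed in the results.
baseline_levels = [6, 6, 5, 3, 10, 5, 7, 3, 6, 5,
                   10, 13, 9, 7, 5, 6, 7, 5, 6, 7]
imfs_levels = [5, 6, 5, 3, 8, 4, 7, 2, 5, 3,
               9, 12, 7, 5, 5, 6, 6, 5, 4, 6]

baseline_geomean = geomean(baseline_levels)
imfs_geomean = geomean(imfs_levels)
ratio = imfs_geomean / baseline_geomean
```

The results agree with the reported 6.16, 5.24, and 0.852 after rounding, which confirms that each ratio is a quotient of geometric means rather than a mean of per-design ratios.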
Recent work: Scalable sequential synthesis
The Main Idea
• Consider registers and nodes of a design
– Detect candidate equivalences in this set using
random/guided simulation
– Prove candidates by K-step induction
– Merge the resulting equivalences
• This is a subset of sequential synthesis with
– Practical advantages (does not move registers, etc)
– Scales to large designs
– Offers substantial improvements
– Comes with a verification guarantee
Candidate equivalences: {A,B}, {C,D}
• Base Case
– Proving internal equivalences in initialized frames 0 through K-1
• Inductive Case
– Assuming internal equivalences in uninitialized frames 0 through K-1
– Proving internal equivalences in a topological order in frame K
[Figure: time-frame expansions with inputs PI0, PI1, …, PIk; the base case starts from the initial state and the inductive case from a symbolic state; SAT-1 to SAT-4 mark the equivalence checks]
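The flow of detecting candidates by simulation and then proving them by induction can be sketched on a three-register toy design. Everything here is an assumption for illustration: the design, the fixed input sequence, K = 1, and the explicit state enumeration that stands in for the SAT calls.

```python
from itertools import product

INIT = (0, 0, 0)                          # initial values of (r1, r2, r3)

def step(state, x):
    """Next-state function of a hypothetical design with one input x."""
    r1, r2, r3 = state
    return (x ^ r1, x ^ r2, x)

def signatures(inputs):
    """Simulate from INIT and return one value trace per register."""
    state, trace = INIT, [INIT]
    for x in inputs:
        state = step(state, x)
        trace.append(state)
    return list(zip(*trace))

def proved_by_induction(i, j):
    """1-step induction: base case at INIT, step from any state with ri == rj."""
    base = INIT[i] == INIT[j]
    inductive = all(step(s, x)[i] == step(s, x)[j]
                    for s in product([0, 1], repeat=3) if s[i] == s[j]
                    for x in (0, 1))
    return base and inductive

sigs = signatures([1, 0, 1, 1, 0])
candidates = [(i, j) for i in range(3) for j in range(i + 1, 3)
              if sigs[i] == sigs[j]]
proved = [pair for pair in candidates if proved_by_induction(*pair)]
```

Simulation leaves only the pair (r1, r2) as a candidate, and induction proves it, so the two registers can be merged. The pair (r1, r3) is already separated by simulation and would also fail the inductive step.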
Dynamic Partitioning
(register correspondence)
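Illustration for two candidate equiv. classes: {A,B}, {C,D}
[Figure: one time-frame of the design with current-state signals A, B, C, D and next-state signals A’, B’, C’, D’; Partition 1 checks A’ = B’ and Partition 2 checks C’ = D’ under the assumed equivalences A = B and C = D]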
Academic Benchmarks
Registers / Area / Delay
            Baseline   Reg Corr   Ratio   Sig Corr   Ratio
Registers   809.9      610.9      0.75    544.3      0.67
6-LUTs      2141       1725       0.80    1405       0.65
Delay       6.8        6.33       0.93    5.83       0.86
Runtime
             Reg Corr   Sig Corr   SEC      Synt & Map   Total
Geomean      7.186      29.846     81.583   16.760       135.376
Percentage   0.05       0.22       0.60     0.12         1.00
Columns “Baseline”, “Reg Corr” and “Sig Corr” show geometric means.
Industrial Benchmarks
Registers / Area / Delay
            Baseline   St Seq Sw   Ratio   Reg Corr   Ratio   Sig Corr   Ratio
Registers   5500       5248        0.954   4826       0.877   4788       0.871
6-LUTs      11497      11100       0.965   10421      0.906   9989       0.869
Depth       7.47       7.39        0.989   0.999      0.999   7.35       0.985
Runtime
          St Seq Sw   Reg Corr   Sig Corr   SEC      Synt & Map
Geomean   0.84        11.81      143.51     223.10   62.72
Ratio     0.01        0.19       2.29       3.58     1.00
In case of multiple clock domains, optimization was applied only to the
domain with the largest number of registers.
Reasons for Large Improvements
• Redundancy introduced by HDL compilers
• Early logic duplication by the designer
• Accidental sequential redundancies
• Sequential redundancies present due to reuse of design components that had more functionality than needed
Recent work: WireMap
Motivation
• Fewer pin-to-pin connections should make the
design easier to place and route
• Newer FPGAs allow two outputs per LUT
– Thus fewer pin-to-pin connections should produce
a mapping that “packs” better into dual-output LUTs
Area Recovery Overview
1. Perform delay-optimal mapping
2. Recover area off critical paths
– Area-flow (global view)
• Chooses cuts with better logic sharing
– Exact local area (local view)
– Both are important
3. New idea: Cut-based area recovery algorithms can be extended to minimize edges (pin-to-pin connections)
WireMap Algorithm
1. Perform delay-optimal mapping
2. Recover area off critical paths
– Area-flow (global view)
• Break ties with minimum edge flow
– Exact local area (local view)
• Break ties with exact local edge count
Experimental Setup
• WireMap implemented in ABC
• Compared WireMap against two algorithms in ABC
– Baseline – basic mapping with area recovery
– Mapping with Structural Choices – mapping with area recovery for several netlists produced by synthesis
• WireMap was implemented on top of mapping with choices
• Used VPR to place/route design for wirelength and critical path delays
• Used maximum cardinality matching to pack single-output LUTs into dual-output LUTs using
Results Summary
• Comparing WireMap against the best
mapping with structural choices in ABC
• WireMap results:
– Reduction in edges by 9.3%
– Reduction in dual-output LUT count by
9.4%, compared to mapping with choices
• Single-output LUT count only reduced by 1.3%
– Reduction in wire length by 8.5%
– Reduction in power by 20%
Recent work: White boxes
Comb and Seq Boxes
[Figure: example design with flip-flops (FF, FF1, FF3, FF4, FF5), internal nodes n1 to n8, inputs a, b, c, outputs o1 to o4, one combinational box (Comb box), and two sequential boxes (Seq box)]
Treating Boxes as Black
[Figure: the example design with the Comb box and the Seq boxes treated as black boxes]
For simplicity, boxes can be treated as “black”. Thus box
outputs become inputs to the rest of the logic and box inputs
become outputs. Delay and logic information is lost.
Treating Boxes as White
[Figure: the example design with the Comb box and the Seq boxes treated as white boxes]
Example: Nodes o1 and o3 may be equivalent in the design, but this
equivalence cannot be detected if the boxes are treated as black.
Solution: Consider logic inside white boxes for synthesis, but keep it
unchanged during synthesis and mapping.
Future Work
[Diagram: six directions arranged around ABC]
• Integrating synthesis/mapping/retiming
• Improving AIG-based synthesis and mapping
• Co-developing synthesis and verification
• Creating special configurable design flows
• Integrating synthesis with place and route
• Supporting emerging technologies
To Learn More
• Visit ABC webpage
http://www.eecs.berkeley.edu/~alanmi/abc
• Read recent papers
http://www.eecs.berkeley.edu/~alanmi/publications
• Send email
– alanmi@eecs.berkeley.edu
– brayton@eecs.berkeley.edu