ABC: A System for Sequential Synthesis and Verification
Berkeley Logic Synthesis and Verification Group
Robert Brayton
Alan Mishchenko

Overview
• Introduction
  – What and why ABC?
• ABC fundamentals
  – Areas addressed by ABC
    • Synthesis
    • Technology mapping
    • Verification
  – Contrast with classical methods
    • How is ABC different from SIS?
• Recent work
  – Speedup
  – Factoring
  – Don’t-care based optimization
  – Scalable sequential synthesis
  – WireMap
  – White boxes

A Plethora of ABCs
http://en.wikipedia.org/wiki/Abc
• ABC (American Broadcasting Company)
  – A television network…
• ABC (Active Body Control)
  – ABC is designed to minimize body roll in cornering, accelerating, and braking. The system uses 13 sensors which monitor body movement to supply the computer with information every 10 ms…
• ABC (Abstract Base Class)
  – In C++, these are generic classes at the base of the inheritance tree; objects of such abstract classes cannot be created…
• ABC (supposed to mean “as simple as ABC”)
  – A system for sequential synthesis and verification at Berkeley

Why We Decided to Build ABC
• SIS
  – Outdated, but many research papers still report how a new algorithm beats SIS results
  – Not supported
• MVSIS
  – Gave us a reason to work on logic synthesis
  – Learned a lot about new methods and better data structures
  – Could see how specializing to binary could provide substantial improvements
• ABC
  – Initial intention was to re-implement all algorithms using new data structures (a daunting task)
  – Discovered rewriting of AIGs
    • P. Bjesse and A. Boralv, "DAG-aware circuit compression for formal verification", Proc. ICCAD ’04, pp. 42-49.
  – Decided to try to keep all transformations fast and scalable
    • No BDDs
    • No SOPs
    • No Espresso

What Is Berkeley ABC?
• A system for logic synthesis and verification
  – Fast
  – Scalable
  – High quality results (industrial strength)
  – Exploits synergy between synthesis and verification
• A programming environment
  – Open-source
  – Evolving and improving over time

Design Flow
[Figure: design flow with stages System Specification, RTL, Logic synthesis, Technology mapping, Physical synthesis, Manufacturing, and Verification]

ABC Screenshot
[Figure: ABC screenshot]

Areas Addressed by ABC
• Combinational synthesis
  – AIG rewriting
  – technology mapping
  – resynthesis after mapping
• Sequential synthesis
  – retiming
  – structural register sweep
  – merging sequentially equivalent nodes
• Formal verification
  – combinational equivalence checking
  – bounded sequential verification
  – unbounded sequential verification
  – equivalence checking using synthesis history

Combinational Synthesis
• AIG rewriting minimizes the number of AIG nodes without increasing the number of AIG levels
• Pre-computing AIG subgraphs
  – Consider function f = abc
[Figure: pre-computed AIG subgraphs (Subgraph 1, Subgraph 2, Subgraph 3) for f = abc]
• Rewriting AIG subgraphs
[Figure: rewriting node A and rewriting node B using Subgraph 1 and Subgraph 2]
In both cases 1 node is saved

Technology Mapping
Input: A Boolean network (And-Inverter Graph)
Output: A netlist of K-LUTs implementing the AIG and optimizing some cost function
[Figure: the subject graph and the mapped netlist for a function f of inputs a, b, c, d, e]

Sequential Synthesis
• Structural register sweep (scleanup)
  – Merge registers with identical drivers
  – Replace stuck-at registers by constants
• Retiming (dretime)
  – Minimize the number of registers under delay constraints
  – Preserves equivalent initial state
• Sequential SAT sweeping (scorr)
  – Detecting and merging sequentially equivalent nodes

Formal Verification
• Equivalence checking
  – Takes two designs and makes a miter (AIG)
• Model checking safety properties
  – Takes design and property and makes a miter (AIG)
The goals are the same: to transform the AIG until the output is proved constant 0
[Figure: miters for equivalence checking (designs D1 and D2) and property checking (design D1 and property p), each with output 0]
Breaking News: ABC won a model checking competition at CAV in August 2008

Model Checking Competition
[Figure: competition results, time (sec) versus number of problems solved, with ABC highlighted]

Command “dprove” in ABC
• transforming initial state (“undc”, “zero”)
• converting into an AIG (“strash”)
• creating sequential miter (“miter -c”)
• combinational equivalence checking (“iprove”)
• bounded model checking (“bmc”)
• sequential sweep (“scl”)
• phase-abstraction (“phase”)
• most forward retiming (“dret -f”)
• partitioned register correspondence (“lcorr”)
• min-register retiming (“dretime”)
• combinational SAT sweeping (“fraig”)
• for ( K = 1; K ≤ 16; K = K * 2 )
  – signal correspondence (“scorr”)
  – stronger AIG rewriting (“dc2”)
  – min-register retiming (“dretime”)
  – sequential AIG simulation
• interpolation (“int”)
• BDD-based reachability (“reach”)
• saving reduced hard miter (“write_aiger”)
Side labels grouping the steps above, top to bottom: Preprocessors; Combinational solver; Fast engines; Medium engines; Slower engines; Main induction loop; Last-gasp engines

ABC vs. Other Tools
• Industrial
  + well documented, fewer bugs
  - black-box, push-button, no source code, often expensive
• SIS
  + traditionally very popular
  - data structures / algorithms outdated, weak sequential synthesis
• VIS
  + very good implementation of BDD-based verification algorithms
  - not meant for logic synthesis, does not feature the latest SAT-based implementations
• MVSIS
  + allows for multi-valued and finite-automata manipulation
  - not meant for binary synthesis, lacking recent implementations

How Is ABC Different From SIS?
[Figure: a Boolean network in SIS and the equivalent AIG in ABC, both computing f]
AIG is a Boolean network of 2-input AND nodes and inverters (dotted lines)

One AIG Node – Many Cuts
• Manipulating AIGs in ABC
  – Each node in an AIG has many cuts
  – Each cut is a different SIS node
  – No a priori fixed boundaries
• Implies that AIG manipulation with cuts is equivalent to working on many Boolean networks at the same time
[Figure: combinational AIG for f over inputs a, b, c, d, e, showing different cuts for the same node]

Comparison of Two Syntheses
“Classical” synthesis
• Boolean network
• Network manipulation (algebraic)
  – Elimination
  – Factoring/Decomposition
  – Speedup
• Node minimization
  – Espresso
  – Don’t cares computed using BDDs
  – Resubstitution
• Technology mapping
  – Tree based
ABC “contemporary” synthesis
• AIG network
• DAG-aware AIG rewriting (Boolean)
  – Several related algorithms
    • Rewriting
    • Refactoring
    • Balancing
    • Speedup
• Node minimization
  – Boolean decomposition
  – Don’t cares computed using simulation and SAT
  – Resubstitution with don’t cares
• Technology mapping
  – Cut based with choice nodes

Existing Capabilities (2005-2008)
• Technology mapping with structural choices
  – Cut-based, heuristic, good area/delay, flexible
• Combinational logic synthesis
  – Fast, scalable, good quality
• Sequential verification
  – Integrated, interacts with synthesis
• Sequential synthesis
  – Innovative, scalable, verifiable
Command “speedup”

Timing Criticality
• Critical nodes
  – Used by many traditional algorithms
• Critical edges
  – Used by our algorithm
• We pre-compute critical edges of critical nodes
  – Reduces computation
• An edge between critical nodes may not be critical
  – See illustration: edge 1→3
[Figure: timing-criticality illustration with numbered nodes between primary inputs and primary outputs]

Delay-Oriented Restructuring
• Using traditional MUX-restructuring
  – AKA generalized select transform
[Figure: function F rebuilt as a multiplexer tree over cofactors F00, F01, F10, F11; x and y are the critical edge inputs]

Overall Algorithm
mapped netlist performSpeedup (
    subject graph S,     // S is an And-Inverter Graph
    mapped netlist M,    // M was previously derived by tech-mapping of S
    timing window w,     // w is used to detect the critical paths
    logic depth l,       // l is used to detect a logic cone rooted at a node
    edge count p )       // p limits the number of critical edges of the cone
{
    perform timing analysis of M with unit-delay or LUT-library model;   // done only once
    pre-compute critical section of M as nodes n such that 0 ≤ slack(n) ≤ w;
    pre-compute timing-critical edges connecting these nodes;
    for each timing-critical node n
    {
        find cone C of M that extends l levels down from n;
        pick the set of timing-critical edges V feeding into C;
        if the number of edges in V exceeds p, continue;
        find logic cone C’ in S corresponding to C in M;
        find variables V’ in S corresponding to V in M;
        derive cofactors of the function of C’ w.r.t. variables in V’;
        build multiplexer tree C’’ of the cofactors using variables in V’;
        add structural choice C’ = C’’ to the subject graph S;
    }
    return mapped netlist M’ derived by mapping subject graph S with added choices;
}

Experimental Results for “speedup”
                  Profile           |            Baseline            |                  Speedup
Design      PI      PO      Reg     |    LUT    Lev   Delay   Total  |    LUT    Lev   Delay  Time1, s  Time2, s
11       2,061   1,897   13,950     | 16,531      7    3.15   77.70  | 16,652      7    2.95      9.33     87.95
12          50      68    1,358     |  3,284     19    8.40   23.88  |  3,371     16    7.00      3.46     28.68
13       1,044   1,098    2,074     |  7,147     23    9.35   74.39  |  7,789     16    6.65      7.37     86.71
14         391     129    1,049     |  7,526     14    6.05  251.11  |  7,573     14    6.05     27.29    280.41
15         749     777    7,348     | 16,086     10    4.35  169.25  | 16,097      9    4.00     18.48    188.00
16       1,041     736    1,063     |  3,611     11    4.70   19.63  |  3,621     11    4.65      2.77     22.71
17       3,512   2,992    3,425     | 12,533     20    8.45  178.58  | 12,830     17    7.40     13.19    199.36
18      11,456  10,791   10,114     | 27,622     15    6.25  160.22  | 28,857     10    4.35     22.29    184.63
19      11,292  11,454   20,184     | 49,871     12    5.00  317.79  | 50,283      9    3.75     37.83    355.19
20         131     129   26,258     | 13,811      8    3.65   72.17  | 14,186      5    2.45      8.23     81.60
Geomean                             | 10,804  11.49    4.99   72.13  | 11,023   9.80    4.29      8.77     82.29
Ratio 1                             |      1      1       1          |  1.020  0.854   0.860
Ratio 2                             |                                |                           0.107         1

LUT – number of LUTs
Lev – number of LUT levels
Delay – delay using LUT library
Total – total runtime of Baseline
Time1 – the runtime of AIG restructuring only
Time2 – the total runtime of Speedup
Geomean – geometric averages of columns
Ratios – ratios of geometric averages
Basic Inner Core Algorithm (DSD)
We use a fast disjoint support decomposition (DSD) algorithm as our underlying subroutine
  – follows Bertacco and Damiani, "The disjunctive decomposition of logic functions", ICCAD '97
  – but
    • uses heuristics to speed it up
    • no BDDs
    • uses truth tables
  – inputs are limited to at most 16

Disjoint Support Decomposition (DSD)
(Simple Disjunctive Decomposition)
Theorem 1 [Ashenhurst 1959]. For a completely specified Boolean function, there is a unique maximal DSD (up to the complementation of inputs and outputs and factoring of ANDs/ORs and XORs).
F(a, c) = H(D(a), c)
[Figure: a DSD tree over inputs x1 to x5 and the block diagram of F(a, c) = H(D(a), c)]

Non-Disjoint Decomposition
Definition: A function F has an (a, b)-decomposition if it can be written as
F(x) = H(D(a, b), b, c)
where (a, b, c) is a partition of the variables x and D is a single-output function.
The variables in the set b are called the shared variables. The variables a are called the bound set and c the free set.
[Figure: block D with inputs a and b feeding block H together with b and c]

Non-Disjoint Decomposition
Theorem 2: A function F(a, b, c) has an (a, b)-decomposition if and only if each of the cofactors of F with respect to b has a DSD structure in which the variables a are in a separate sub-tree.
[Figure: DSD structures of the cofactors with respect to b over inputs x1 to x5, with example bound sets a = {x4, x5} and a = {x3}]

Application of Factoring (uses Theorem 2)
Rewriting a k-LUT mapped circuit.
• For each LUT, and each cut of no more than 16 inputs,
  – Express the output of the LUT as a truth table in terms of the cut variables: F(x)
  – Find variables b such that the cofactors of F with respect to b are support-reducing
    • we exhaustively look for up to two variables in the b set
  – Take the best (a, b) set and decompose F = H(D(a, b), b, c)
  – Recursively decompose H and D if they do not fit into a k-LUT
  – If there is an improvement, replace the LUTs in the cut with the new decomposition
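The existence test behind the factoring step can be illustrated on small truth tables. The sketch below is not ABC's DSD-based procedure from Theorem 2; it swaps in the classical decomposition-chart (column multiplicity) test, which answers the same question: for every assignment of the shared variables b, the cofactor of F may show at most two distinct columns (functions of the free set c) across all assignments of the bound set a, because a single-output D can distinguish only two groups. The names `has_ab_decomposition` and `g`, and the callable-based interface, are assumptions made for illustration only.

```python
from itertools import product


def has_ab_decomposition(f, n, bound, shared):
    """Return True if f(x) = H(D(a, b), b, c) for some single-output D.

    f      -- function mapping a tuple of n bits to 0 or 1 (hypothetical interface)
    bound  -- indices of the bound-set variables a
    shared -- indices of the shared variables b
    All remaining variables form the free set c.
    """
    free = [i for i in range(n) if i not in bound and i not in shared]
    for b_vals in product((0, 1), repeat=len(shared)):
        columns = set()  # distinct functions of c seen over all assignments of a
        for a_vals in product((0, 1), repeat=len(bound)):
            x = [0] * n
            for i, v in zip(shared, b_vals):
                x[i] = v
            for i, v in zip(bound, a_vals):
                x[i] = v
            column = []
            for c_vals in product((0, 1), repeat=len(free)):
                for i, v in zip(free, c_vals):
                    x[i] = v
                column.append(f(tuple(x)))
            columns.add(tuple(column))
        # A single-output D can tell apart at most two groups of a-assignments.
        if len(columns) > 2:
            return False
    return True


def g(x):
    # x0 selects between AND and OR of x1, x2; the result is XORed with x3.
    inner = (x[1] & x[2]) if x[0] else (x[1] | x[2])
    return inner ^ x[3]


print(has_ab_decomposition(g, 4, bound=[1, 2], shared=[]))   # False
print(has_ab_decomposition(g, 4, bound=[1, 2], shared=[0]))  # True
```

For `g`, the bound set {x1, x2} admits no disjoint decomposition, but sharing x0 makes one possible, which is the situation the exhaustive search over up to two shared variables is meant to find. The enumeration is exponential in the number of inputs, so it only suits the small cut sizes discussed above.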
Experimental results later

Windowing a Node in the Network for Don’t-Care Computation
• Definition
  – A window for a node in the network is the context in which the don’t-cares are computed
• A window includes
  – n levels of the TFI (transitive fanin)
  – m levels of the TFO (transitive fanout)
  – all re-convergent paths captured in this scope
• Window with its PIs and POs can be considered as a separate network
[Figure: a window in a Boolean network (k-LUT mapped circuit) with n = 3 and m = 3, showing window PIs and window POs]

Care Set Representation
“Miter” constructed for the window POs
If output is 1 then we care
[Figure: miter comparing the window with the same window with inverter]

Resubstitution
Resubstitution considers a node in a Boolean network and expresses it using a different set of fanins
Computation can be enhanced by use of don’t cares
[Figure: node X before and after resubstitution]

Resubstitution with Don’t-Cares
Consider all or some nodes in the Boolean network.
For each node
• Create window
• Select possible fanin nodes (divisors)
• For each candidate subset of divisors
  – Rule out some subsets using simulation
  – Check resubstitution feasibility using SAT
  – Compute resubstitution function using interpolation
    • A low-cost by-product of completed SAT proofs
• Update the network if there is an improvement

Resubstitution with Don’t Cares
• Given:
  – node function F(x) to be replaced
  – care set C(x) for the node
  – candidate set of divisors {gi(x)} for re-expressing F(x)
• Find:
  – A resubstitution function h(y) such that F(x) = h(g(x)) on the care set
• SPFD Theorem: Function h exists if and only if every pair of care minterms, x1 and x2, distinguished by F(x), is also distinguished by gi(x) for some i
[Figure: F(x) with care set C(x) re-expressed as F’(x) = h(g) over divisors g1, g2, g3]

Checking Resubstitution using SAT
[Figure: miter for the resubstitution check (SPFD theorem in practice), built from two copies of C, F, and g1, g2, g3 over inputs x1 and x2]
1. Note use of care set, C.
2. Resubstitution function exists if and only if SAT problem is unsatisfiable.
3. An h(g) is obtained by interpolation

Experimental Results
                             |  Baseline    |  Choices     |  Imfs        | Imfs + Lutpack
Design      PI    PO    Reg  |  LUT  Level  |  LUT  Level  |  LUT  Level  |  LUT  Level
alu4        14     8      0  |  821      6  |  785      5  |  558      5  |  453      5
apex2       39     3      0  |  992      6  |  866      6  |  806      6  |  787      6
apex4        9    19      0  |  838      5  |  853      5  |  800      5  |  732      5
bigkey     263   197    224  |  575      3  |  575      3  |  575      3  |  575      3
clma       383    82     33  | 3323     10  | 2715      9  | 1277      8  | 1222      8
des        256   245      0  |  794      5  |  512      5  |  483      4  |  480      4
diffeq      64    39    377  |  659      7  |  632      7  |  636      7  |  634      7
dsip       229   197    224  |  687      3  |  685      2  |  685      2  |  685      2
ex1010      10    10      0  | 2847      6  | 2967      6  | 1282      5  | 1059      5
ex5p         8    63      0  |  599      5  |  669      4  |  118      3  |  108      3
elliptic   131   114   1122  | 1773     10  | 1824      9  | 1820      9  | 1819      9
frisc       20   116    886  | 1748     13  | 1671     12  | 1692     12  | 1683     12
i10        257   224      0  |  589      9  |  560      8  |  548      7  |  547      7
pdc         16    40      0  | 2327      7  | 2500      6  |  194      5  |  171      5
misex3      14    14      0  |  785      5  |  664      5  |  517      5  |  446      5
s38417      28   106   1636  | 2684      6  | 2674      6  | 2621      6  | 2592      6
s38584      12   278   1452  | 2697      7  | 2647      6  | 2620      6  | 2601      6
seq         41    35      0  |  931      5  |  756      5  |  682      5  |  645      5
spla        16    46      0  | 1913      6  | 1828      6  |  289      4  |  263      4
tseng       52   122    385  |  647      7  |  649      6  |  645      6  |  645      6
geomean                      | 1168   6.16  | 1103   5.66  |  716   5.24  |  677   5.24
Ratio                        | 1.000 1.000  | 0.945 0.919  | 0.613 0.852  | 0.580 0.852
Ratio                        |              |              | 1.000 1.000  | 0.946 1.000

The Main Idea
• Consider registers and nodes of a design
  – Detect candidate equivalences in this set using random/guided simulation
  – Prove candidates by K-step induction
  – Merge the resulting equivalences
• This is a subset of sequential synthesis that
  – Has practical advantages (does not move registers, etc.)
  – Scales to large designs
  – Offers substantial improvements
  – Comes with a verification guarantee

Base Case and Inductive Case
Candidate equivalences: {A,B}, {C,D}
• Base Case
  – Proving internal equivalences in initialized frames 0 through K-1, starting from the initial state
• Inductive Case
  – Assuming internal equivalences in uninitialized frames 0 through K-1, starting from a symbolic state
  – Proving internal equivalences in a topological order in frame K
[Figure: unrolled time-frames with inputs PI0, PI1, …, PIk and SAT checks SAT-1 to SAT-4 on the candidate pairs]

Dynamic Partitioning (register correspondence)
Illustration for two candidate equivalence classes: {A,B}, {C,D}
[Figure: one time-frame of the design split into Partition 1 and Partition 2, checking A’ = B’ and C’ = D’ under the assumptions A = B and C = D]

Academic Benchmarks
Registers / Area / Delay
             Baseline   Reg Corr   Ratio   Sig Corr   Ratio
Registers       809.9      610.9    0.75      544.3    0.67
6-LUTs           2141       1725    0.80       1405    0.65
Delay             6.8       6.33    0.93       5.83    0.86
Runtime
             Reg Corr   Sig Corr      SEC   Synt & Map     Total
Geomean         7.186     29.846   81.583       16.760   135.376
Percentage       0.05       0.22     0.60         0.12      1.00
Columns “Baseline”, “Reg Corr” and “Sig Corr” show geometric means.

Industrial Benchmarks
Registers / Area / Delay
             Baseline   St Seq Sw   Ratio   Reg Corr   Ratio   Sig Corr   Ratio
Registers        5500        5248   0.954       4826   0.877       4788   0.871
6-LUTs          11497       11100   0.965      10421   0.906       9989   0.869
Depth            7.47        7.39   0.989      0.999   0.999       7.35   0.985
Runtime
             St Seq Sw   Reg Corr   Sig Corr      SEC   Synt & Map
Geomean           0.84      11.81     143.51   223.10        62.72
Ratio             0.01       0.19       2.29     3.58         1.00
In case of multiple clock domains, optimization was applied only to the domain with the largest number of registers.

Reasons for Large Improvements
• Redundancy introduced by HDL compilers
• Early logic duplication by the designer
• Accidental sequential redundancies
• Sequential redundancies present due to reuse of design components that had more functionality than needed
Motivation
• Fewer pin-to-pin connections should make the design easier to place and route
• Newer FPGAs allow two outputs per LUT
  – Thus fewer pin-to-pin connections should produce a mapping that “packs” better into dual-output LUTs

Area Recovery Overview
1. Perform delay-optimal mapping
2. Recover area off critical paths
  – Area-flow (global view)
    • Chooses cuts with better logic sharing
  – Exact local area (local view)
  Both are important
3. New idea: Cut-based area recovery algorithms can be extended to minimize edges (pin-to-pin connections)

WireMap Algorithm
1. Perform delay-optimal mapping
2. Recover area off critical paths
  – Area-flow (global view)
    • Break ties with minimum edge flow
  – Exact local area (local view)
    • Break ties with exact local edge count

Experimental Setup
• WireMap implemented in ABC
• Compared WireMap against two algorithms in ABC
  – Baseline: basic mapping with area recovery
  – Mapping with Structural Choices: mapping with area recovery for several netlists produced by synthesis
• WireMap was implemented on top of mapping with choices
• Used VPR to place/route design for wirelength and critical path delays
• Used maximum cardinality matching to pack single-output LUTs into dual-output LUTs using

Results Summary
• Comparing WireMap against the best mapping with structural choices in ABC
• WireMap results:
  – Reduction in edges by 9.3%
  – Reduction in dual-output LUT count by 9.4%, compared to mapping with choices
    • Single-output LUT count only reduced by 1.3%
  – Reduction in wire length by 8.5%
  – Reduction in power by 20%
Comb and Seq Boxes
[Figure: example design with inputs a, b, c, flip-flops (FF), internal nodes, outputs o1 to o4, a combinational box, and sequential boxes]

Treating Boxes as Black
[Figure: the same example design with the boxes treated as black]
For simplicity, boxes can be treated as “black”. Thus box outputs become inputs to the rest of the logic and box inputs become outputs. Delay and logic information is lost.

Treating Boxes as White
[Figure: the same example design with the boxes treated as white]
Example: Nodes o1 and o3 may be equivalent in the design, but this equivalence cannot be detected if the boxes are treated as black.
Solution: Consider logic inside white boxes for synthesis, but keep it unchanged during synthesis and mapping.

Future Work
• Integrating synthesis/mapping/retiming
• Improving AIG-based synthesis and mapping
• Co-developing synthesis and verification
• Creating special configurable design flows
• Integrating synthesis with place and route
• Supporting emerging technologies

To Learn More
• Visit ABC webpage http://www.eecs.berkeley.edu/~alanmi/abc
• Read recent papers http://www.eecs.berkeley.edu/~alanmi/publications
• Send email
  – alanmi@eecs.berkeley.edu
  – brayton@eecs.berkeley.edu