Alan Mishchenko
UC Berkeley
Motivation
Big picture
Problem representation
Algorithms
Sequential synthesis
Combinational synthesis with choices
Technology mapping
Minimum-perturbation retiming
Experimental results
Future work
[Figure: design size (gate count, axis from 10 to 1,000,000) versus time (years, 1950 to 2010). Representations along the time axis: truth tables (1950-1970), sum-of-products (1980), binary decision diagrams (1990), And-Inverter Graphs and conjunctive normal forms (2000, 2010). Tools shown: Espresso, MIS, SIS; SIS, VIS, MVSIS; ABC, Magic.]
ABC is a public-domain system for logic synthesis and formal verification under development at Berkeley since 2005
A successor of Espresso, MIS, SIS, VIS, MVSIS
The baseline version of ABC is not applicable to industrial designs because it does not support:
Complex flops
Multiple clock domains
Special objects (adders, RAMs, DSPs, etc.)
Standard-cell libraries
A fresh start, called Magic, was taken in Fall 2009
Includes a new design database that supports these features
Integrates application packages for better memory/runtime
Achieves better scalability
[Figure: Magic architecture. A design database with a file / code interface (Verilog, EDIF, BLIF; programmable APIs) and application packages: AIG rewriting, computing choices, tech mapping, structuring for delay, sequential synthesis, retiming, post-place resynthesis, verification.]
A. Mishchenko, N. Een, R. K. Brayton, S. Jang, M. Ciesielski, and T. Daniel,
"Magic: An industrial-strength logic optimization, technology mapping, and formal verification tool". Proc. IWLS'10.
Framework
Design database
File input / output
Programmable APIs
Combinational optimization
AIG rewriting
Choice computation
Technology mapping
Sequential optimization
Retiming
Merging equivalent nodes
Technology mapping
Mapping with choices
Speedup
Verification
Simulation
Comb equivalence checking
Seq equivalence checking
Netlist
Original / current / resulting design with “industrial stuff”
AIG: The main data-structure of ABC / Magic
Represents local / global functions
Gets synthesized / mapped / verified
Logic network
Represents the result of technology mapping
AIG is a Boolean network composed of two-input ANDs and inverters

Karnaugh map of F (columns ab, rows cd):
cd\ab  00  01  11  10
00      0   0   1   0
01      0   0   1   1
11      0   1   1   0
10      0   0   1   0

F(a,b,c,d) = ab + d(ac' + bc)
6 nodes, 4 levels

F(a,b,c,d) = ac'(b'd')' + bc(a'd')' = ac'(b+d) + bc(a+d)
7 nodes, 3 levels
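The node and level counts of the two structures can be checked with a small sketch. The `Aig` class below is a hypothetical, minimal model written for this illustration (it is not the ABC data structure); it stores two-input ANDs with complemented edges and expresses OR through De Morgan's law.

```python
from itertools import product

class Aig:
    """Tiny illustrative AIG: a literal is 2 * node_id + complement_bit."""

    def __init__(self, inputs):
        self.inputs = list(inputs)
        self.fanins = {}                                  # AND node id -> (lit0, lit1)
        self.level = {i: 0 for i in range(len(self.inputs))}
        self.table = {}                                   # structural hashing table

    def var(self, name):
        return 2 * self.inputs.index(name)

    def and_(self, x, y):
        key = (min(x, y), max(x, y))
        if key not in self.table:                         # create each AND only once
            node = len(self.inputs) + len(self.fanins)
            self.fanins[node] = key
            self.level[node] = 1 + max(self.level[x >> 1], self.level[y >> 1])
            self.table[key] = 2 * node
        return self.table[key]

    def or_(self, x, y):
        return self.and_(x ^ 1, y ^ 1) ^ 1                # x + y = (x'y')'

    def evaluate(self, lit, values):
        node = lit >> 1
        if node < len(self.inputs):
            result = values[node]
        else:
            l0, l1 = self.fanins[node]
            result = self.evaluate(l0, values) and self.evaluate(l1, values)
        return result != bool(lit & 1)                    # apply the inverter, if any

def build_factored():
    """F = ab + d(ac' + bc)."""
    g = Aig("abcd")
    a, b, c, d = (g.var(n) for n in "abcd")
    inner = g.or_(g.and_(a, c ^ 1), g.and_(b, c))
    return g, g.or_(g.and_(a, b), g.and_(d, inner))

def build_balanced():
    """F = ac'(b + d) + bc(a + d)."""
    g = Aig("abcd")
    a, b, c, d = (g.var(n) for n in "abcd")
    left = g.and_(g.and_(a, c ^ 1), g.or_(b, d))
    right = g.and_(g.and_(b, c), g.or_(a, d))
    return g, g.or_(left, right)

def stats(g, f):
    return len(g.fanins), g.level[f >> 1]                 # (AND nodes, levels)

g1, f1 = build_factored()
g2, f2 = build_balanced()
same_function = all(
    g1.evaluate(f1, v) == g2.evaluate(f2, v)
    for v in product([False, True], repeat=4)
)
```

Under these assumptions, `stats(g1, f1)` and `stats(g2, f2)` give the node and level counts of the two structures, and `same_function` confirms that both compute the same F.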
An underlying data structure for various computations
Representing both local and global functions
Used in rewriting, resubstitution, simulation, SAT sweeping, induction, etc
A unifying representation for the whole flow
Synthesis, mapping, verification pass around AIGs
Stores multiple structures for mapping (‘AIG with choices’)
The main functional representation in ABC
Foundation of ‘contemporary’ logic synthesis
Source of ‘signature features’ (speed, scalability, etc)
Inputting the design
Sequential synthesis
Comb synthesis with choices
Tech mapping
Retiming and resynthesis
Outputting the design
The design is entered from file or through programmable APIs
Internal representation is based on a light-weight data-structure for improved memory and runtime
Sequential synthesis is applied to detect and merge seq equiv objects
Combinational synthesis and mapping are iterated several times, while saving the best result
Optionally, min-perturbation retiming and resynthesis are applied to reduce delay/area after mapping
The design is saved into file or through programmable APIs
Verification is performed between any two points in the flow
Combinational equivalence
Two functions, F and G, produce the same output for all input combinations
Sequential equivalence
Two functions, F and G, produce the same value for all reachable states
[Figure: Karnaugh maps of F and G. Complete Boolean space is shown by highlighting.]
[Figure: Karnaugh maps of F and G. Reachable state space of 1-hot encoding is shown by highlighting.]
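The difference between the two notions can be illustrated with a small sketch. The functions `f` and `g` and the three-bit one-hot state encoding below are assumptions chosen for this example (they are not the functions from the slide), and the reachable set is simply taken to be the one-hot states rather than computed.

```python
from itertools import product

def f(state):
    # F is the first state bit
    return state[0]

def g(state):
    # G is true when neither of the other two state bits is set
    return not (state[1] or state[2])

all_states = list(product([False, True], repeat=3))
reachable = [s for s in all_states if sum(s) == 1]   # assumed: one-hot states only

def agree_on(states):
    return all(bool(f(s)) == bool(g(s)) for s in states)

comb_equivalent = agree_on(all_states)   # compare on the complete Boolean space
seq_equivalent = agree_on(reachable)     # compare on reachable states only
```

Here F and G differ on unreachable states such as 000 and 110, so they are not combinationally equivalent, yet they agree on every one-hot state.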
Detect, prove, and merge sequentially equivalent nodes
Seq equiv nodes are equivalent on reachable states
Special case: Comb equiv nodes are equivalent for any state
Observations
Can be done using simulation and SAT (without BDDs)
Leads to substantial reduction for large designs (> 10% in area)
Works for large designs (10-15 minutes for 1M gates)
A. Mishchenko, M. L. Case, R. K. Brayton, and S. Jang, "Scalable and scalably-verifiable sequential synthesis", Proc. ICCAD'08.
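The detect-and-prove idea can be sketched for the combinational special case: random simulation groups nodes with identical signatures into candidate classes, and each candidate pair is then proved. This is only an illustration under stated assumptions: the node functions and helper names are hypothetical, exhaustive enumeration stands in for the SAT solver, and the induction over reachable states needed for true sequential equivalence is not shown.

```python
import random
from collections import defaultdict
from itertools import product

def candidate_classes(nodes, n_inputs, n_patterns=64, seed=0):
    """Group nodes whose outputs agree on all random simulation patterns."""
    rng = random.Random(seed)
    patterns = [
        tuple(rng.random() < 0.5 for _ in range(n_inputs))
        for _ in range(n_patterns)
    ]
    classes = defaultdict(list)
    for name, fn in nodes.items():
        signature = tuple(bool(fn(*p)) for p in patterns)
        classes[signature].append(name)
    return [members for members in classes.values() if len(members) > 1]

def prove_equal(fn0, fn1, n_inputs):
    """Stand-in for a SAT call: exhaustive check over all input combinations."""
    return all(
        bool(fn0(*v)) == bool(fn1(*v))
        for v in product([False, True], repeat=n_inputs)
    )

# Hypothetical nodes over inputs (a, b, c); n1 and n2 compute the same function.
nodes = {
    "n1": lambda a, b, c: a and b,
    "n2": lambda a, b, c: not ((not a) or (not b)),
    "n3": lambda a, b, c: a or b,
    "n4": lambda a, b, c: a != b,
}

classes = candidate_classes(nodes, n_inputs=3)
proved = [
    (members[0], other)
    for members in classes
    for other in members[1:]
    if prove_equal(nodes[members[0]], nodes[other], 3)
]
```

Each pair in `proved` can be merged by redirecting the fanouts of the second node to the first, which is where the area reduction comes from.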
Results
LUT: -13%
Register: -13.10%
Level: -1.50%
Results collected using a suite of 20 industrial designs
Restructures AIG by applying the following transforms:
Rewriting/refactoring/redecomposition
Tree-balancing
Resubstitution
Minimization with don't-cares, etc
Case study: AIG rewriting
Pre-compute AIG subgraphs for F = abc
[Figure: three pre-computed subgraphs for F = abc: Subgraph 1 over leaves a, b, a, c; Subgraph 2 over leaves a, b, c; Subgraph 3 over leaves b, a, c. Rewriting node A replaces Subgraph 1 with Subgraph 2.]
A. Mishchenko, S. Chatterjee, and R. Brayton, "DAG-aware AIG rewriting:
A fresh look at combinational logic synthesis", Proc. DAC '06.
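The effect of sharing on the choice of subgraph can be sketched as follows. The nested-tuple encoding, the three candidate structures for F = abc, and the cost function (number of AND nodes not already present in the network) are assumptions made for this illustration; they are not taken from the ABC implementation.

```python
def canon(expr):
    """Canonical form of a nested-tuple AND expression (AND is commutative)."""
    if isinstance(expr, str):
        return expr
    return tuple(sorted((canon(expr[0]), canon(expr[1])), key=repr))

def and_nodes(expr, acc=None):
    """Collect the distinct AND nodes of an expression into a set."""
    acc = set() if acc is None else acc
    expr = canon(expr)
    if isinstance(expr, tuple):
        and_nodes(expr[0], acc)
        and_nodes(expr[1], acc)
        acc.add(expr)
    return acc

# Assumed candidate structures for F = abc
candidates = {
    "subgraph1": (("a", "b"), ("a", "c")),   # (ab)(ac)
    "subgraph2": (("a", "b"), "c"),          # (ab)c
    "subgraph3": ("b", ("a", "c")),          # b(ac)
}

def added_nodes(existing):
    """Number of new AND nodes each candidate would add to the network."""
    return {
        name: len(and_nodes(expr) - existing)
        for name, expr in candidates.items()
    }

cost_alone = added_nodes(set())                  # nothing to share
cost_shared = added_nodes(and_nodes(("a", "c"))) # node ac already exists elsewhere
best_shared = min(cost_shared, key=cost_shared.get)
```

With no sharing, both two-node structures are equally good; once the node ac already exists in the network, the structure that reuses it adds a single node, which is the sense in which the rewriting is DAG-aware.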
Perform synthesis and keep track of changes
Iterate fast local AIG rewriting with a global view (via hash table)
Collect AIG snapshots and prove equivalences across them
Use equivalences (choices) during technology mapping
Observations
Leads to improved QoR after technology mapping
Successfully applied to 1M gate designs
[Figure: traditional synthesis shown as a row of designs D1, D2, D3, D4; synthesis with choices shown as designs D1, D2, D3, D4 arranged around a HAIG.]
Customizable structural mapping with priority cuts
Computes a small subset of cuts without impacting the QoR
Uses structural choices
Observations
Controls QoR tradeoffs
Minimizes delay/area, wire count, switching activity, etc
Successfully applied to 1M gate designs
[Figure: an AIG of f containing a choice node, and the mapped network of f built from LUTs, both drawn between primary inputs a, b, c, d, e and primary outputs.]
A. Mishchenko, S. Cho, S. Chatterjee, R. Brayton,
"Combinational and sequential mapping with priority cuts", Proc. ICCAD '07.
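The idea of keeping only a small subset of cuts at each node can be sketched as below. This is a simplified illustration, not the published algorithm: the function name, the network encoding, and the ranking rule (smaller cuts first) are assumptions, whereas a real mapper would rank cuts by delay, area, or another cost, and structural choices are not modeled.

```python
def enumerate_cuts(fanins, order, k=4, max_cuts=8):
    """Bottom-up cut enumeration that stores at most max_cuts non-trivial cuts per node.

    fanins: dict mapping each AND node to its two fanin names
    order:  all nodes (inputs and ANDs) in topological order
    """
    cuts = {}
    for node in order:
        trivial = frozenset([node])
        if node not in fanins:                    # primary input
            cuts[node] = [trivial]
            continue
        x, y = fanins[node]
        merged = {
            cx | cy
            for cx in cuts[x]
            for cy in cuts[y]
            if len(cx | cy) <= k                  # keep k-feasible cuts only
        }
        # Assumed priority: fewer leaves first, ties broken by leaf names
        ranked = sorted(merged, key=lambda c: (len(c), sorted(c)))
        cuts[node] = [trivial] + ranked[:max_cuts]
    return cuts

# Hypothetical AIG: n3 = (a AND b) AND (c AND d)
fanins = {"n1": ("a", "b"), "n2": ("c", "d"), "n3": ("n1", "n2")}
order = ["a", "b", "c", "d", "n1", "n2", "n3"]

all_cuts = enumerate_cuts(fanins, order, k=4, max_cuts=8)
few_cuts = enumerate_cuts(fanins, order, k=4, max_cuts=2)
```

With `max_cuts=8` node n3 keeps all four of its non-trivial cuts, including {a, b, c, d}; with `max_cuts=2` only the two highest-ranked ones survive, which bounds memory and runtime at the price of a restricted search.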
Reduces delay, while minimizing the number of flops moved
Produces a trade-off: delay gain vs. the number of flops moved
Handles “industrial stuff”; retimes over white boxes such as adders!
Computes new initial state after backward retiming
Allows the user to control the resources
Desired delay gain
Maximum allowed number of flops moved
Maximum area increase after retiming
Observations
Can be useful before and after placement
Can be implemented efficiently
Runs in less than a minute for 1M gates
[Figure: plot of delay versus flops moved.]
S. Ray, A. Mishchenko, R. K. Brayton, S. Jang, and T. Daniel, "Minimum-perturbation retiming for delay optimization". Proc. IWLS'10.
Property checking
Takes design and property and makes a miter (AIG)
Equivalence checking
Takes two designs and makes a miter (AIG)
The goal is to transform AIG until the output can be proved const 0
Equivalence checking in Magic is based on the model checker that won the Hardware Model Checking Competition in 2008 and 2010 (http://fmv.jku.at/hwmcc10/results.html)
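[Figure: property checking miter built from design D1 and property p; equivalence checking miter built from designs D1 and D2; each miter output is checked against 0.]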
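An equivalence-checking miter can be sketched in a few lines. The two designs below are hypothetical, and exhaustive simulation stands in for the AIG transformations and SAT-based proof used in practice.

```python
from itertools import product

def make_miter(design1, design2):
    """Return a function that is 1 whenever any pair of matching outputs differs."""
    def miter(*inputs):
        outs1 = design1(*inputs)
        outs2 = design2(*inputs)
        return any(bool(x) != bool(y) for x, y in zip(outs1, outs2))
    return miter

def is_const0(fn, n_inputs):
    """Stand-in proof: the miter output is 0 for every input combination."""
    return not any(fn(*v) for v in product([False, True], repeat=n_inputs))

# Hypothetical designs with two outputs each; d2 is a De Morgan restructuring of d1.
def d1(a, b):
    return (a and b, a or b)

def d2(a, b):
    return (not ((not a) or (not b)), not ((not a) and (not b)))

# Hypothetical buggy variant: second output uses XOR instead of OR.
def d3(a, b):
    return (a and b, a != b)

equivalent = is_const0(make_miter(d1, d2), 2)
buggy = is_const0(make_miter(d1, d3), 2)
```

A miter that cannot be proved constant 0 yields a counter-example: here d1 and d3 differ for a = b = 1.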
Box IOs are treated as PI/POs in synthesis
Losing the correlation of box outputs/inputs
Restricting synthesis due to broken logic paths
Not being able to propagate delays through the boxes
Sequential synthesis doesn’t work well
Clock domains
Represent clock signal in the data-base
Annotate flops with their clock-domain number in the AIG
Separate clock domains in sequential transforms
Complex controls of the flops
Use parametrized flop model
Perform elaboration of control signals if needed
Handle asynchronous reset carefully!
Industrial primitives (adders, RAMs, DSPs, etc)
Use boxes (black/white, comb/seq, merge/no_merge, etc)
Currently propagates timing information, improves quality of synthesis
Elaborate boxes for seq synthesis, but do not map them
Need better support for user-specified attributes (don’t-touch, etc.)
Integrated Magic into an industrial FPGA synthesis flow
Experimented with the full flow, including P&R
Did not use retiming
Did not use post-placement re-synthesis
Verified by running Magic and in-house simulation tools
Experimented with 20 designs, from 175K to 648K LUT4
Two experimental runs:
“Reference” stands for the typical industrial flow without Magic
“Magic” stands for the new flow with Magic
Frontend
Design entry, high-level synthesis, quick mapping
Magic
Seq and comb synthesis, mapping, legalization
Backend
Placement, routing, design rule checking, etc
          Profile     Reference                               Magic
Circuit   PI    PO    LUT      FF       Lev   fMAX     Time   LUT      FF       Lev   fMAX
C1        736   369   174972   113157   12    128.53   1.05   173561   100398   10    133.87
C2        150   67    187037   112991   18    91.32    0.53   161303   93930    16    95.69
C3        4     80    199097   53954    27    68.49    0.69   137126   36190    20    75.59
C4        517   253   206725   132416   11    105.37   1.31   197029   114745   8     129.20
C5        4     280   212124   64120    26    68.82    0.65   152799   49513    19    77.70
C6        803   258   255415   166644   11    113.25   2.08   255026   148445   8     123.00
C7        24    10    296152   133704   17    89.93    0.72   246908   114002   14    120.48
C8        124   58    323818   86712    32    40.68    1.99   346516   86662    25    47.08
C9        268   132   413017   195150   18    81.50    1.40   375481   174306   15    79.81
C10       205   94    439963   134139   20    63.17    3.55   445950   133575   15    69.06
C11       148   456   455429   160450   96    27.53    2.23   398428   149126   56    33.11
C12       4     3     455630   20277    6     66.67    0.78   152414   19446    6     100.40
C13       4     240   470436   230811   28    53.59    3.30   462010   225676   18    57.34
C14       218   69    522988   311436   17    68.78    1.83   448426   257996   15    69.40
C15       377   183   575355   351911   10    136.05   2.59   575672   349715   8     136.99
C16       73    33    599413   216051   4     202.02   1.07   599413   216051   4     209.21
C17       136   66    618377   259844   56    47.66    2.75   562367   243084   34    53.53
C18       136   66    621875   249327   27    45.68    4.60   606135   247825   27    52.58
C19       146   391   630918   275871   55    46.36    2.50   572834   259336   36    50.76
C20       135   32    648849   353940   7     127.71   2.45   645501   353616   5     136.43

Magic Time: 1.61 2.64 1.90 0.41 6.18 2.19 2.95 1.79 0.70 0.67 0.77 0.67 0.74 1.00 0.90 1.94 2.61 4.03 2.51 2.91

          Reference                               Magic
          LUT      FF       Lev     fMAX     Time    LUT      FF       Lev     fMAX     Time
Geomean   377883   150015   18.54   74.768   1.591   329751   135972   14.40   83.572   1.541
Ratio     1        1        1       1        1       0.873    0.906    0.777   1.118    0.969
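The ratio row follows directly from the geomean row, and the same arithmetic gives the relative changes in percent. The short check below recomputes both; the dictionary names are chosen for this example.

```python
# Geomean values as reported for the reference flow and the flow with Magic
reference = {"LUT": 377883, "FF": 150015, "Lev": 18.54, "fMAX": 74.768, "Time": 1.591}
magic = {"LUT": 329751, "FF": 135972, "Lev": 14.40, "fMAX": 83.572, "Time": 1.541}

# Ratio = Magic / Reference, rounded to three decimals as in the table
ratio = {key: round(magic[key] / reference[key], 3) for key in reference}

# Relative change in percent, rounded to one decimal
change_percent = {
    key: round(100.0 * (magic[key] / reference[key] - 1.0), 1)
    for key in reference
}

for key in reference:
    print(f"{key:5s} ratio {ratio[key]:.3f}  change {change_percent[key]:+.1f}%")
```

For example, the LUT ratio is 329751 / 377883, i.e. about 0.873, which corresponds to a reduction of about 12.7%.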
QoR
fMAX: 11.80%
LUT count: -12.70%
Registers: -9.40%
Levels: -22.30%
Total runtime: -3.10%
P&R runtime: -50%
Continue to improve application packages
AIG rewriting, tech-mapping, sequential synthesis, etc
Improve integration of logic and physical synthesis
Synthesis/mapping/retiming before placement
Retiming/restructuring after placement
Extend the flow to work for other technologies
Macro cells
Standard cells