Synthesis: Verilog → Gates

// 2:1 multiplexer
always @(sel or a or b) begin
  if (sel) z <= a;
  else z <= b;
end

[Figure: the 2:1 mux described in Verilog above is synthesized into an implementation built from cells in a gate library. Figures by MIT OCW.]

6.884 - Spring 2005, 02/14/05, L05 – Synthesis
Some History ...
In late 70’s Mead-Covway showed how to lay out transistors
systematically to build logic circuits.
Tools:
Layout editors, for manual design;
Design rule checkers, to check for legal layout
configurations;
Transistor-level simulators;
Software generators to create dense transistor layouts;
1980 : Circuits had 100K transistors
In 80’s designers moved to the use of gate arrays and
standardized cells, pre-characterized modules of circuits, to
increase productivity.
Tools:
To automatically place and route a netlist of cells from
a predefined cell library
The emphasis in design shifted to gate-level schematic
entry and simulation
History continued...
By late 80’s designers found it very tedious to move a gate-level
design from one library to another because libraries could be
very different and each required its own optimizations.
Tools:
Logic Synthesis tools to go from Gate netlists to a
standard cell netlist for a given cell library.
Powerful optimizations!
Simulation tools for gate netlists, RTL;
Design and tools for testability, equivalence checking, ...
IBM and other companies had internal tools that emphasized top
down design methodology based on logic synthesis.
Two groups of designers came together in the 90's: those who
wanted to quickly simulate their designs expressed in some HDL
and those who wanted to map a gate-level design in a variety of
standard cell libraries in an optimized manner.
Synthesis Tools
Idea: once a behavioral model has been finished, why not use
it to automatically synthesize a logic implementation in much
the same way as a compiler generates executable code from a
source program?
a.k.a. “silicon compilers”
Synthesis programs process the HDL then
1 infer logic and state elements
2 perform technology-independent optimizations
(e.g., logic simplification, state assignment)
3 map elements to the target technology
4 perform technology-dependent optimizations
(e.g., multi-level logic optimization, choose
gate strengths to achieve speed goals)
Simulation vs Synthesis
In an HDL like Verilog or VHDL, not everything that can be
simulated can be synthesized.
There is a difference between simulation and synthesis
semantics. Simulation semantics are based on sequential
execution of the program with some notion of concurrent
synchronous processes. Not all such programs can be
synthesized, and it is not easy to specify the synthesizable
subset of an HDL.
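For a concrete (hedged) illustration — not from the original slides; the module and signal names are ours — here is a fragment that any simulator accepts but that falls outside the typical synthesizable subset, because delays, initial blocks and $display describe simulator behavior rather than hardware:

module sim_only(input a, b, output reg z);
  initial begin
    z = 1'b0;                       // simulation-time initialization, no gate equivalent
    $display("simulation starting"); // console I/O has no hardware meaning
  end
  always @(a or b)
    #5 z = a & b;                   // the #5 delay is meaningful only to the simulator
endmodule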
So in today’s lecture we will gloss over 1, briefly discuss 2
and emphasize 3 and 4.
1 infer logic and state elements
2 perform technology-independent optimizations
(e.g., logic simplification, state assignment)
3 map elements to the target technology
4 perform technology-dependent optimizations
(e.g., multi-level logic optimization, choose
gate strengths to achieve speed goals)
Logic Synthesis
assign z = (a & b) | c;

[Figure: gate-level implementations of z = (a & b) | c.]

// dataflow
assign z = sel ? a : b;

[Figure: 2:1 mux implementation of z = sel ? a : b and its gate-level equivalent.]
Logic Synthesis (II)
wire [3:0] x,y,sum;
wire cout;
assign {cout,sum} = x + y;
[Figure: 4-bit ripple-carry adder built from four full adders; inputs x[3:0], y[3:0], outputs sum[3:0] and cout.]

By default, + is implemented as a ripple-carry adder.
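To make that inference concrete, here is a hedged structural sketch — not from the original slides; module and signal names are ours — of the ripple-carry network a synthesizer might produce for the 4-bit + above:

module full_adder(input a, b, cin, output s, cout);
  assign s    = a ^ b ^ cin;
  assign cout = (a & b) | (a & cin) | (b & cin);
endmodule

// Structural equivalent of  assign {cout,sum} = x + y;  for WIDTH = 4.
module ripple_adder #(parameter WIDTH = 4)
                    (input  [WIDTH-1:0] x, y,
                     output [WIDTH-1:0] sum,
                     output             cout);
  wire [WIDTH:0] c;
  assign c[0] = 1'b0;                  // no carry in
  genvar i;
  generate
    for (i = 0; i < WIDTH; i = i + 1) begin: stage
      full_adder fa(.a(x[i]), .b(y[i]), .cin(c[i]), .s(sum[i]), .cout(c[i+1]));
    end
  endgenerate
  assign cout = c[WIDTH];
endmodule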
Logic Synthesis (III)
module parity(in,p);
  parameter WIDTH = 2;          // default width is 2
  input [WIDTH-1:0] in;
  output p;

  // simple approach: assign p = ^in;
  // here's another, more general approach
  reg p;
  always @(in) begin: loop
    integer i;
    reg parity;
    parity = 0;
    for (i = 0; i < WIDTH; i = i + 1)
      parity = parity ^ in[i];
    p <= parity;
  end
endmodule

[Figure: XOR tree computing the parity of word[3:0]. The XOR with a constant "0" input has been optimized away…]

…
wire [3:0] word;
wire parity;
parity #(4) ecc(word,parity);   // specify WIDTH = 4
Synthesis of Sequential Logic
reg q;
// D-latch
always @(g or d) begin
  if (g) q <= d;
end

[Figure: D latch with data input d, gate g, output q.]
If q were simply a combinational function of d, the synthesizer could just
create the appropriate combinational logic. But since there are times when
the always block executes but q isn't assigned (e.g., when g = 0), the
synthesizer has to arrange to remember the "old" value of q even if d
is changing → it will infer the need for a storage element (latch, register,
…). Sometimes this inference happens even when you don't mean it to – you
have to be careful to ensure an assignment happens each time
through the block if you don't want storage elements to appear in your
design.
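A minimal sketch of the usual precaution (not from the original slides): assign the output on every path through the block, e.g. with a default assignment, so the synthesizer sees a purely combinational function and infers no latch:

// Combinational version: q gets a value on every evaluation, so no latch is inferred.
always @(g or d) begin
  q = 1'b0;          // default assignment covers the g == 0 case
  if (g) q = d;      // overrides the default when the latch would have been transparent
end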
reg q;
// this time we mean it!
// D-register
always @(posedge clk) begin
  q <= d;
end

[Figure: D register with data d, clock clk, output q.]
Sequential Logic (II)
reg q;
// register with synchronous clear
always @(posedge clk) begin
  if (!reset)      // reset is active low
    q <= 0;
  else
    q <= d;
end

reg q;
// register with asynchronous clear
always @(posedge clk or negedge reset) begin
  if (!reset)      // reset is active low
    q <= 0;
  else             // implicit posedge clk
    q <= d;
end
// warning! async inputs are dangerous!
// there's a race between them and the
// rising edge of clk.

[Figure: D registers with synchronous and asynchronous clear; inputs d, clk, reset, output q.]
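One widely used way to tame that race — a hedged sketch, not from the original slides; the rst_sync and reset_n_sync names are invented — is to assert reset asynchronously but release it synchronously through two flip-flops:

// Async assertion, synchronous deassertion of an active-low reset.
reg [1:0] rst_sync;
always @(posedge clk or negedge reset) begin
  if (!reset) rst_sync <= 2'b00;                 // reset asserts immediately
  else        rst_sync <= {rst_sync[0], 1'b1};   // deassertion takes two clock edges
end
wire reset_n_sync = rst_sync[1];   // use this, not the raw reset, inside the clock domain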
Technology-independent* optimizations
• Two-level boolean minimization: based on the assumption
that reducing the number of product terms in an equation
and reducing the size of each product term will result in a
smaller/faster implementation.
• Optimizing finite state machines: look for equivalent FSMs
(i.e., FSMs that produce the same outputs given the same
sequence of inputs) that have fewer states.
• Choosing FSM state encodings that minimize implementation
area (= size of state storage + size of logic to implement
next state and output functions).
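As a hedged illustration of the state-encoding point — not from the original slides; the states and inputs are invented — the encoding is just a set of constants that the designer or the synthesis tool can trade off, e.g. binary vs. one-hot:

// Binary encoding: 2 flip-flops, denser next-state logic.
parameter [1:0] S_IDLE = 2'b00, S_RUN = 2'b01, S_DONE = 2'b10;
// One-hot alternative: 3 flip-flops, but often simpler/faster next-state logic.
// parameter [2:0] S_IDLE = 3'b001, S_RUN = 3'b010, S_DONE = 3'b100;

reg [1:0] state;
always @(posedge clk)
  if (!reset)              state <= S_IDLE;      // synchronous, active-low reset
  else case (state)
    S_IDLE:  state <= start ? S_RUN  : S_IDLE;
    S_RUN:   state <= done  ? S_DONE : S_RUN;
    default: state <= S_IDLE;
  endcase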
* None of these operations is completely isolated from the
target technology. But experience has shown that it's
advantageous to reduce the size of the problem as much as
possible before starting the technology-dependent
optimizations. In some places (e.g. the ratio of the size
of storage elements to the size of logic gates) our assumptions
will be valid for several generations of the technology.
Two-Level Boolean Minimization
Two-level representation for a multiple-output Boolean function:
- Sum-of-products
Optimization criteria:
- number of product terms
- number of literals
- a combination of both
Minimization steps for a given function:
1. Generate the set of prime product-terms for the
function
2. Select a minimum set of prime terms to cover the
function.
State-of-the-art logic minimization algorithms are all based
on the Quine-McCluskey method and follow the above two steps.
Prime Term Generation
0-terms: Express your Boolean function using 0-terms (product
terms with no don't-care entries). Include only those entries
where the output of the function is 1 (label each entry with
its decimal equivalent).

  label   W X Y Z
    0     0 0 0 0
    5     0 1 0 1
    7     0 1 1 1
    8     1 0 0 0
    9     1 0 0 1
   10     1 0 1 0
   11     1 0 1 1
   14     1 1 1 0
   15     1 1 1 1

1-terms: Look for pairs of 0-terms that differ in only
one bit position and merge them into a 1-term
(i.e., a term that has exactly one '-' entry).
Next, 1-terms are examined in pairs to
see if they can be merged into 2-terms,
etc. Mark k-terms that get merged into
(k+1)-terms so we can discard them later.

   0, 8    -000 [A]
   5, 7    01-1 [B]
   7,15    -111 [C]
   8, 9    100-
   8,10    10-0
   9,11    10-1
  10,11    101-
  10,14    1-10
  11,15    1-11
  14,15    111-

2-terms:
   8, 9,10,11    10-- [D]
  10,11,14,15    1-1- [E]

3-terms: none!

Label unmerged terms: these terms are prime!

Example due to Srini Devadas
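Each merge above is just the combining theorem applied to a pair of adjacent terms; for example (notation ours), the pair (0, 8) → -000 [A] corresponds to:

  W'X'Y'Z' + WX'Y'Z'  =  (W' + W)X'Y'Z'  =  X'Y'Z'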
Prime Term Table
An “X” in the prime term table in row R and column C signifies
that the 0-term corresponding to row R is contained by the
prime corresponding to column C.
Goal: select the minimum set of primes (columns) such that
there is at least one "X" in every row. This is the classical
minimum covering problem.

          A  B  C  D  E
  0000    X  .  .  .  .
  0101    .  X  .  .  .
  0111    .  X  X  .  .
  1000    X  .  .  X  .
  1001    .  .  .  X  .
  1010    .  .  .  X  X
  1011    .  .  .  X  X
  1110    .  .  .  .  X
  1111    .  .  X  .  X

A is essential
B is essential
D is essential
E is essential
Each row with a single X signifies an essential prime term
since any prime implementation will have to include that prime
term because the corresponding 0-term is not contained in any
other prime.
In this example the essential primes “cover” all the 0-terms.
Dominated Columns
Some functions may not have essential primes (Fig. 1), so make an
arbitrary selection of the first prime in the cover, say A (Fig. 2). A
column U of a prime term table dominates a column V if U contains every
row contained in V. Delete the dominated columns (Fig. 3).

1. Prime table

          A  B  C  D  E  F  G  H
  0000    X  .  .  .  .  .  .  X
  0001    X  X  .  .  .  .  .  .
  0101    .  X  X  .  .  .  .  .
  0111    .  .  X  X  .  .  .  .
  1000    .  .  .  .  .  .  X  X
  1010    .  .  .  .  .  X  X  .
  1110    .  .  .  .  X  X  .  .
  1111    .  .  .  X  X  .  .  .

2. Table with A selected

          B  C  D  E  F  G  H
  0101    X  X  .  .  .  .  .
  0111    .  X  X  .  .  .  .
  1000    .  .  .  .  .  X  X
  1010    .  .  .  .  X  X  .
  1110    .  .  .  X  X  .  .
  1111    .  .  X  X  .  .  .

  C dominates B, G dominates H

3. Table with B & H removed

          C  D  E  F  G
  0101    X  .  .  .  .
  0111    X  X  .  .  .
  1000    .  .  .  .  X
  1010    .  .  .  X  X
  1110    .  .  X  X  .
  1111    .  X  X  .  .

  C is essential, G is essential

Selecting C and G shows that only E is needed to complete the cover.

This gives a prime cover of {A, C, E, G}. Now backtrack to
our choice of A and explore a different (arbitrary) first
choice; repeat, remembering the minimum cover found during the
search.
The Quine-McCluskey Method
The input to the procedure is the prime term table T.

1. Delete the dominated primes (columns) in T. Detect essential primes in T by checking to see if
any 0-term is contained by a single prime. Add these essential primes to the selected set. Repeat
until no new essential primes are detected.

2. If the size of the selected set of primes equals or exceeds the best solution thus far, return from
this level of recursion. If there are no elements left to be contained, declare the selected set the
best solution recorded thus far.

3. Heuristically select a prime.

4. Add the chosen prime to the selected set and create a new table by deleting the prime and all
0-terms that are contained by this prime in the original table. Set T to this new table and go to Step 1.
Then, create a new table by deleting the chosen prime from the original table without adding it to the
selected set. No 0-terms are deleted from the original table. Set T to the new table and go to Step 1.

The good news: this technique generalizes to multi-output functions.
The bad news: the search time grows as 2^(2^N), where N is
the number of inputs, so most modern minimization systems use heuristics to dramatically cut the processing time.
Mapping to target technology
• Once we've minimized the logic equations, the next step
  is mapping each equation to the gates in our target gate
  library.

Popular approach: DAG covering (K. Keutzer)
Mapping Example
Problem statement: find an “optimal” mapping of this circuit:
Into this library:
Example due to
Kurt Keutzer
DAG Covering
• Represent the input netlist in normal form (the "subject DAG"):
  2-input NAND gates + inverters.
• Represent each library gate in normal form ("primitive DAGs").
• Goal: find a minimum cost covering of the subject DAG
  by the primitive DAGs.
  • If the subject and primitive DAGs are trees, there
    is an efficient algorithm (dynamic programming) for
    finding the optimum cover.
  • So: partition the subject DAG into a forest of trees
    (each gate with fanout > 1 becomes the root of a new
    tree), generate optimal solutions for each tree, then
    stitch the solutions together.
Primitive DAGs for library gates
[Figure: primitive DAGs (NAND/inverter decompositions) of the library gates, each annotated with its cost (8, 10, 11, 13, 13).]
Possible Covers
Hmmm. Seems promising
but is there a systematic
and efficient way to
arrive at the optimal
answer?
Use dynamic programming!
Principle of optimality: Optimal cover for a tree consists of a
best match at the root of the tree plus the optimal cover for
the sub-trees starting at each input of the match.
[Figure: two candidate matches at the root of a tree. The best cover for one match uses the best covers for the subtrees P, X & Y; the best cover for the other match uses the best covers for P & Z.]

Complexity: To determine the optimal cover for a tree we only need to consider a
best-cost match at the root of the tree (constant time in the number of matched
cells), plus the optimal cover for the subtrees starting at each input to the
match (constant time in the fanin of each match) → O(N).
Optimal tree covering example
Example (II)
Example (III)
Yes Regis, this is our final
answer.
This matches our earlier intuitive
cover, but accomplished
systematically.
Refinements: timing optimization
incorporating load-dependent
delays, optimization for low
power.
Technology-dependent optimizations
• Additional library components: more complex cells may be
slower but will reduce area for logic off the critical path.
• Load buffering: adding buffers/inverters to improve load-induced delays along the critical path
• Resizing: Resize transistors in gates along critical path
• Retiming: change placement of latches/registers to
minimize overall cycle time
• Increase routability over/through cells: reduce routing
congestion.
You are here!

Verilog → [Logic Synthesis] → Gate netlist → [Place & route] → Mask

Logic Synthesis:
• HDL → logic
• map to target library
• optimize speed, area

Place & route:
• create floorplan blocks
• place cells in block
• route interconnect
• optimize (iterate!)