Week 10 - Michael G. Morrow

ECE 551
Digital System Design & Synthesis
Lecture 10
Synthesis Techniques
Lecture 10 Topics



Synthesis Process Revisited
Optimization Stages in Synthesis
Advanced Synthesis Strategies
Synthesis


Verilog files aren’t hardware yet!
Need to “synthesize” them
 Tool reads hardware descriptions
 Figures out what hardware to make

Done automatically
 Faster!
 Easier!

Designers still have to understand hardware!
 Avoid pre- vs. post-synthesis discrepancies
 Describe EFFICIENT hardware
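One common source of pre- vs. post-synthesis discrepancies is an incomplete sensitivity list. A minimal sketch, with hypothetical module names that are not from the lecture:

```verilog
// mux_bad and mux_good are illustrative names, not part of the course code.
module mux_bad (input wire a, b, sel, output reg y);
  // Incomplete sensitivity list: RTL simulation only re-evaluates y when
  // sel changes, while synthesis typically builds an ordinary 2:1 mux
  // whose output also follows a and b.
  always @(sel)
    if (sel) y = a;
    else     y = b;
endmodule

module mux_good (input wire a, b, sel, output reg y);
  // @(*) (Verilog-2001) is sensitive to every signal the block reads,
  // so RTL simulation and the synthesized gates agree.
  always @(*)
    if (sel) y = a;
    else     y = b;
endmodule
```

Simulating mux_bad before and after synthesis can give different waveforms for y even though both versions map to the same gates.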
Useful Documentation

Fairly complete documentation is available for the
Synopsys tools using:
/afs/engr.wisc.edu/apps/eda/synopsys/syn_Y-2006.06SP1/sold

See especially (through Design Compiler link)
 Design Vision User Guide
 Design Compiler User Guide
 Design Compiler Reference Manuals
 HDL Compiler (Presto Verilog) Reference Manual
 HDL Compiler for Verilog Reference Manual

Use as references
 HDL Compiler for Verilog Reference Manual, pg. 1-5
 HDL Compiler is called by Design Compiler and Design Vision
 Why do we need to compare synthesized code to initial code?
 Design Compiler User Guide, pg. 2-17
 Design Vision is the GUI for Design Compiler: use design_vision
 Can also run Design Compiler directly using dc_shell
 To compile using a synthesis script, use dc_shell -tcl_mode -f file_name
Synthesis Script Example [1]
# To run, place in the directory with all the Verilog files
# and type: dc_shell -tcl_mode -f script.tcl
#Analyze input files.
analyze -library WORK -format verilog {./prob5.v ./prob1.v ./prob2.v}
#Elaborate the design.
elaborate GF_multiplier_mword -architecture verilog -library WORK
#Sets clock constraint of 2ns with 50% duty cycle on signal "clock".
create_clock -name "clk" -period 2 -waveform {0 1} {clock}
set_dont_touch_network [ find clock clk ]
#Sets the area constraint for the design
set_max_area 50000
Synthesis Script Example [2]
#Check and compile the design
check_design > check_design.txt
uniquify
compile -map_effort medium
#Export netlist for post-synthesis simulation into synth_netlist.v
change_names -rule verilog -hierarchy
write -format verilog -hierarchy -output synth_netlist.v
#Generate reports
report_resources > resource_report.txt
report_area > area_report.txt
report_timing > timing_report.txt
report_constraint -all_violators > violator_report.txt
report_register -level_sensitive > latch_report.txt
exit
Internal Synthesizer Flow (Synopsys)
HDL Description
 -> Syntax Checking
 -> Synthesizer Policy Checking
 -> Elaboration & Translation
 -> Structural Representation
 -> Architectural Optimization
 -> Multi-Level Logic Optimization
 -> Technology Mapping (uses Technology Library)
 -> Technology-Based Implementation
Initial Steps

Parsing for Syntax and Semantics Checking
 Gives error messages and warnings to user
 User may modify the HDL description in response

Synthesizer Policy Checking (“Check Design”)
 Check for adherence to allowable language constructs
 Are you using unsupported operators or constructs? Combinational feedback? Multiple drivers to non-tristate?

This is where you find out you can’t use certain Verilog constructs

This is synthesizer-dependent
 Example: Advanced DesignWare library allows modulo with any value; most other tools only allow modulo with powers of 2.

Certain things common to MOST synthesizers
See HDL Compiler for Verilog Reference Manual for constructs
Elaboration & Translation



Unrolls loops, substitutes macros & parameters,
computes constant functions, evaluates generate
conditionals
Builds a structural representation of the design
Like a netlist, but includes larger components
 Not just gate-level, may include adders, etc.

Gives additional errors or warnings to the user
 Issues in initial transformation to hardware.
 For example, port sizes do not match

Affects quality achieved by optimization steps
 Structural representation depends on HDL quality
 Poor HDL can prevent optimization
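As an illustration of what elaboration resolves, here is a minimal sketch of a parameterized parity generator (the module and its names are hypothetical, not from the lecture):

```verilog
// Illustrative module: WIDTH is fixed at elaboration time.
module parity #(parameter WIDTH = 8) (
  input  wire [WIDTH-1:0] d,
  output reg              p
);
  integer i;
  always @(*) begin
    p = 1'b0;
    // The loop bound is a constant once WIDTH is substituted, so the
    // loop can be unrolled into WIDTH XOR operations.
    for (i = 0; i < WIDTH; i = i + 1)
      p = p ^ d[i];
  end
endmodule
```

After parameter substitution and unrolling, the structural representation contains only XOR logic; the loop itself never becomes hardware.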
Importance of Translation

It is important for the tool to recognize the sort of
logic structures you are trying to describe.

If it sees a 32-bit full adder, the tool has built-in
solutions for optimizing adders
 Ripple-carry, carry-save, carry look-ahead, etc.

If it just sees a Boolean function with 65 inputs, it
has to work a lot harder to achieve the same
results
 Do you think it can invent a CLA on the fly?
Implications of Translation

Writing clear, easy to understand code not only
benefits other engineers, but may give you better
synthesis results.

Another reason for standard coding guidelines
 Brush up on the list in “Verilog Styles That Kill”

If you have a decent synthesis tool, it’s usually
better to use Verilog’s built-in arithmetic operators
rather than trying to build them from gates or
Boolean equations
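For example, a 32-bit adder with carry-in and carry-out can be described with the built-in operator in one line. A minimal sketch (the module name add32 is illustrative):

```verilog
// Illustrative module: the "+" operator lets the tool recognize an adder
// and choose an implementation for it.
module add32 (
  input  wire [31:0] a, b,
  input  wire        cin,
  output wire [31:0] sum,
  output wire        cout
);
  // The 33-bit left-hand side widens the addition, so the carry out
  // of bit 31 lands in cout.
  assign {cout, sum} = a + b + cin;
endmodule
```

The same function written as gate-level Boolean equations would present the tool with an unstructured function of 65 inputs.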
Optimization in Synthesis

None of these are guaranteed!
 Most synthesizers will make at least some attempt
 Detect and eliminate redundant logic
 Detect combinational feedback loops
 Exploit don't-care conditions
 Try to detect unused states
 Detect and collapse equivalent states
 Make state assignments if not made already
 Synthesize multi-level logic equations subject to:
 constraints on area and/or speed
 available technology (library)
Optimization Process

Optimization modifies the generic netlist resulting
from elaboration and translation.
 Uses cells from the technology library (mapping)
 Attempts to meet all specified constraints

The process is divided into major phases
 All or some selection of the major phases may be performed during optimization
 Phase selection can be controlled by the user
 Some optimizations can be disabled (ex: set_structure) or forced (ex: set_flatten)
Optimization Phases

Major Optimization Stages
 Architectural
 Logic-Level
 Gate-Level

Architectural optimization
 High-level optimizations that occur before the design is mapped to the logic-level
 Based on constraints and high-level coding style
 After optimization circuit function is represented by a generic, technology-independent netlist (GTECH)
Architectural Optimization
 In Synopsys, optimizations include:
 Sharing common mathematical subexpressions
 Sharing resources
 Selecting DesignWare* implementations
 Replacing the generic representation from Translation with pre-built, optimized circuits
 Reordering operators
 Identifying arithmetic expressions for datapath
synthesis
*DesignWare is Synopsys’s library of pre-designed circuit
implementations
Architectural Optimization

Examples:
 Replace an adder used as a counter with incrementer
 count = count + 1;
 Replace adder and separate subtractor with adder/subtractor if not used simultaneously
 if (~sub) z = a + b; else z = a - b;
 Performs selection of pre-designed components (Synopsys DesignWare)
 adders, multipliers, shifters, comparators, muxes, etc.

Need good code for synthesizer to do this
Designer knows more about the project than the tool does! It can only do so much on its own.
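The adder/subtractor case can be written out as follows. This is a sketch under assumptions: the module names and the width parameter W are illustrative, and the second module only shows one possible merged structure, not what any particular tool emits.

```verilog
// Source-level description: add and subtract are never needed at the
// same time, so they are candidates for one shared unit.
module addsub #(parameter W = 16) (
  input  wire         sub,
  input  wire [W-1:0] a, b,
  output reg  [W-1:0] z
);
  always @(*)
    if (~sub) z = a + b;
    else      z = a - b;
endmodule

// One possible merged form: a - b = a + ~b + 1 in two's complement,
// so a single adder with conditional inversion covers both cases.
module addsub_merged #(parameter W = 16) (
  input  wire         sub,
  input  wire [W-1:0] a, b,
  output wire [W-1:0] z
);
  assign z = a + (b ^ {W{sub}}) + sub;
endmodule
```

Writing the first form is usually enough; it keeps the intent readable and leaves the choice of structure to the tool.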
Logic/Gate-Level Optimization



Works on the generic netlist created by logic
synthesis
Produces a technology-specific netlist.
In Synopsys, it consists of four stages:
 Mapping
 Delay optimization
 Design rule fixing
 Area optimization

This phase often runs in multiple iterations if
constraints are not met on the first try
Logic/Gate-Level Optimization

Mapping
 Generates a gate level implementation using tech library
 Tries to meet timing and area goals

Delay optimization
 Tries to fix delay violations from mapping phase.
 Does not fix design rule violations or meet area constraints.

Design rule fixing
 Tries to correct design rule violations
 Inserting buffers or resizing existing cells
 If necessary, violates optimization constraints

Area optimization
 Tries to meet area constraints, which have lowest priority
Combinational Optimization
Gate-Level Optimization
Boolean Logic-Level Optimizations
Verilog Description
 -> TRANSLATION ENGINE
 -> Two-level Logic Functions
 -> OPTIMIZATION ENGINE
 -> Optimized Multi-level Logic Functions
 -> MAPPING ENGINE (uses Technology Libraries)
 -> Technology Implementation
Logic Optimizations

Area
 Number of gates: fewer == smaller
 Size of gates (# inputs): fewer == smaller

Delay
 Number of logic levels: fewer == faster
 Size of gates (# inputs): fewer == faster

Note that the examples that follow ignore NOT gates for gate count / levels of circuits
This is because many libraries offer gate cells with one or more inputs already inverted.
Logic Optimizations





Decomposition
Extraction
Factoring
Substitution
Elimination


You don’t have to remember the names of these
But should understand logic optimization
 Different techniques targeting area vs. delay
Decomposition


Find common expressions in a single function
Reduce redundancy
 Reduce area (number/size of gates)

May increase delay
 More levels of logic

Define a G(x) cost function to compare expressions
 G(inverter) = 0
 G(basic gate) = #inputs to the gate

 Basic gates: AND, OR, NAND, NOR
Based on the concept that the size of a gate is
proportional to the number of inputs
Decomposition Example



F = abc + abd + a’c’d’ + b’c’d’
F = ab(c + d) + c’d’(a’ + b’)
F = ab(c + d) + (c + d)’(ab)’

X = ab              1 gate, 1 level
Y = c + d           1 gate, 1 level
F = XY + X’Y’       3 gates, 2 levels
(5 gates, 3 levels total)

G(Original) = 16 (four 3-input, one 4-input gates)
G(Decomposed) = 10 (five 2-input gates)
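The algebra above can be checked exhaustively in simulation. A minimal self-checking testbench sketch (module and signal names are illustrative, not from the lecture):

```verilog
// Compares the original and decomposed forms of F for all 16 inputs.
module decomposition_tb;
  reg [3:0] v;                                  // v = {a, b, c, d}
  wire a = v[3], b = v[2], c = v[1], d = v[0];

  // Original two-level form: abc + abd + a'c'd' + b'c'd'
  wire f_orig = (a & b & c) | (a & b & d) | (~a & ~c & ~d) | (~b & ~c & ~d);

  // Decomposed form: X = ab, Y = c + d, F = XY + X'Y'
  wire x = a & b;
  wire y = c | d;
  wire f_dec = (x & y) | (~x & ~y);

  integer i, errors;
  initial begin
    errors = 0;
    for (i = 0; i < 16; i = i + 1) begin
      v = i;                                    // low 4 bits of i
      #1;                                       // let the wires settle
      if (f_orig !== f_dec) begin
        errors = errors + 1;
        $display("mismatch at abcd=%b", v);
      end
    end
    $display("mismatches: %0d", errors);
    $finish;
  end
endmodule
```

Note that F = XY + X’Y’ is an XNOR of X and Y, which a mapper may implement as a single library cell if one is available.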
Extraction



Find common sub-expressions between functions
Like decomposition, but across more than one
function
Reduce redundancy
 Reduce area (number/size of gates)

May increase delay if more logic levels introduced
Extraction Example
Before:
 F = (a + b)cd + e      3 gates, 3 levels
 G = (a + b)e’          2 gates, 2 levels
 H = cde                1 gate, 1 level
 (3) 2-input ORs, (2) 3-input ANDs, (1) 2-input AND
 G(original) = 6 + 6 + 2 = 14

Common subexp: X = a + b, Y = cd      1 gate, 1 level (each)

After:
 F = XY + e             4 gates, 3 levels
 G = Xe’                2 gates, 2 levels
 H = Ye                 2 gates, 2 levels
 (2) 2-input ORs, (4) 2-input ANDs
 G(extracted) = 4 + 8 = 12
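Written as Verilog, the extracted form simply names the shared sub-expressions as wires. A minimal sketch (the module name is illustrative):

```verilog
// Extracted form of the example: x and y are each built once and reused.
module extraction_example (
  input  wire a, b, c, d, e,
  output wire f, g, h
);
  wire x = a | b;            // shared by f and g
  wire y = c & d;            // shared by f and h

  assign f = (x & y) | e;    // F = XY + e
  assign g = x & ~e;         // G = Xe'
  assign h = y & e;          // H = Ye
endmodule
```

A synthesis tool is free to restructure this again, so the wires document intent but do not guarantee the final gate structure.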
Factoring


Traditional two-level logic is sum-of-products
Sometimes better expressed by product-of-sums
 Fewer literals => less area

May increase delay if logic equation not completely
factored (becomes multi-level)
Factoring Example

Definitely good:
 F = ac + ad + bc + bd      7 gates, 3 levels*
 F = (a + b)(c + d)         3 gates, 2 levels

Maybe good:
 F = ac + ad + e            3 gates, 2 levels (G=7)
 F = a(c + d) + e           3 gates, 3 levels (G=6)
 This one might improve area...
 But will likely increase delay (tradeoff)

*Assuming 2-input gates
Substitution



Similar to Extraction
When one function is a sub-function of another
Reduce area
 Fewer gates

Can increase delay if more logic levels
Substitution Example


Before:
 G = a + b          1 gate, 1 level
 F = a + b + c      1 gate, 1 level
 (1) 2-input OR, (1) 3-input OR

After:
 F = G + c          2 gates, 2 levels
 (2) 2-input ORs (better area but increased levels)

With compile_ultra, the sub-expressions do not have to explicitly match, i.e. a + b would still be identified if F = b + c + a
Elimination (Flattening)


Opposite of previous optimizations
Goal is to reduce delay
 Make signals travel through as few logic levels as possible

But will likely increase area
 Gate replication / redundant logic

Can force/disable this step using set_flatten true /
set_flatten false
Elimination Example


Before:
 G = c + d                  1 gate, 1 level
 F = Ga + G’b               3 gates, 3 levels
 (2) 2-input ORs, (2) 2-input ANDs

After:
 G = c + d                  1 gate, 1 level
 F = ac + ad + bc’d’        4 gates, 2 levels
 (1) 2-input OR, (1) 3-input OR, (2) 2-input ANDs, (1) 3-input AND (worse area, but fewer levels)
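To confirm that flattening preserves the function, the two forms of F can be compared over all input combinations. A minimal testbench sketch (names are illustrative, not from the lecture):

```verilog
// Compares the multi-level and flattened forms of F for all 16 inputs.
module elimination_tb;
  reg [3:0] v;                                  // v = {a, b, c, d}
  wire a = v[3], b = v[2], c = v[1], d = v[0];

  wire g       = c | d;
  wire f_multi = (g & a) | (~g & b);                 // F = Ga + G'b
  wire f_flat  = (a & c) | (a & d) | (b & ~c & ~d);  // F = ac + ad + bc'd'

  integer i;
  initial begin
    for (i = 0; i < 16; i = i + 1) begin
      v = i;                                    // low 4 bits of i
      #1;                                       // let the wires settle
      if (f_multi !== f_flat) $display("mismatch at abcd=%b", v);
    end
    $display("check finished");
    $finish;
  end
endmodule
```

In the flattened form, F no longer waits on G, which is where the saved logic level comes from.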
compile_ultra Optimizations


Ultra-high mapping effort, 2-pass Compilation
Automatic hierarchical ungrouping
 Ungroups small modules before mapping
 Ungroups critical path based on delay

Automatic datapath extraction *
 E.g. carry-save adders, sharing/unsharing

Boundary optimization
 Propagates logic across hierarchical boundaries (constants, NC inputs/outputs, NOT)

Sequential inversion *
 Sequential elements can have their outputs inverted
Datapath Extraction Optimizations

Uses carry-save adders where beneficial
 Carry-propagate adders only when result is needed
Datapath Extraction Optimizations

Comparator sharing
 A>B, A=B, A<B use a single subtractor with multiple outputs

Optimization of parallel constant multipliers
SOP to POS transformation
Operand reordering
Explores trade-offs of common sub-expression sharing and mutually exclusive resource sharing
Sharing and Unsharing

Expression sharing may be overridden later due to
timing
 Z1 <= A + B + C
 Z2 <= A + B + D
 Arrival time is A < B < D < C
Sharing and Unsharing

Mutually exclusive operations can share resources
 if(SEL) Z = A + B
 else Z = C + D

When would this kind of sharing be a bad idea?
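One way to picture the shared implementation is as two operand muxes feeding a single adder. A minimal sketch (the module name and width parameter W are assumptions for illustration):

```verilog
// Shared form of: if (SEL) Z = A + B; else Z = C + D;
// One adder with muxed operands instead of two adders and an output mux.
module shared_add #(parameter W = 8) (
  input  wire         sel,
  input  wire [W-1:0] a, b, c, d,
  output wire [W-1:0] z
);
  wire [W-1:0] op1 = sel ? a : c;
  wire [W-1:0] op2 = sel ? b : d;
  assign z = op1 + op2;
endmodule
```

In this form sel must settle before the addition can start, which is worth keeping in mind when thinking about the question above.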
Sequential Inversion
 set compile_seqmap_enable_output_inversion true
 Useful if the available flip-flops do not have the
same asynchronous input (preset or clear) as
required in the design
Register Retiming


At the HDL level, determining the optimal
placement of registers is difficult and tedious at
best, or just plain impossible at worst
The register retiming tool moves registers through
the synthesized combinational logic network to
improve timing and/or area
 Equalize delay (i.e. reduce critical path delay by increasing delay in other paths)
 Reduce the number of flip-flops if timing criteria are met
 Usually propagate registers forward
Be aware that this may change the values of some internal signals compared to pre-synthesis.
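As a sketch of the kind of design retiming targets, consider a multiply-add with all of its logic between two register ranks (the module and its names are hypothetical, not from the lecture):

```verilog
// All combinational logic sits between the input registers and q,
// so the multiply-add path sets the clock period.
module mac_pipe #(parameter W = 8) (
  input  wire           clk,
  input  wire [W-1:0]   a, b,
  input  wire [2*W-1:0] c,
  output reg  [2*W-1:0] q
);
  reg  [W-1:0]   a_r, b_r;
  reg  [2*W-1:0] c_r;
  wire [2*W-1:0] prod = a_r * b_r;   // internal signal

  always @(posedge clk) begin
    a_r <= a;                        // input register rank
    b_r <= b;
    c_r <= c;
    q   <= prod + c_r;               // output register rank
  end
endmodule
```

If a retiming step moves the input registers forward into the multiplier logic, q keeps the same cycle-by-cycle behavior, but signals such as a_r, b_r, and prod may no longer exist in the netlist with their pre-synthesis values.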
Register Retiming Example (1)
Register Retiming Example (2)
DC Topographical Mode

When optimizing for delay, the synthesis engine is
not aware of the net delays, since place-and-route has not yet been performed
 Delays can be back-annotated and synthesis repeated
after place-and-route, until closure is reached

Layout-aware synthesis attempts to get faster
timing closure by predicting the physical design and
using that information in synthesis and
optimization, particularly with respect to delay
 Estimates the placement and routing
 Predicts and uses net capacitances in synthesis and
optimization
Further Reading

There are many more commands out there to give
you greater control over the synthesis process if
you want it.

See:
 Synopsys Online Documentation (SOLD)
 Design Compiler man pages