Week 8

ECE 551
Digital System Design & Synthesis
Lecture 08
The Synthesis Process
Constraints and Design Rules
High-Level Synthesis Options
Of course, things are not so simply divided.
Pre-Synthesis Steps
- Syntax Check
  - Makes sure your HDL code follows the syntax rules of the Standard.
  - Finds errors like typos, missing semicolons, “begin” without “end”, assigning to a net in a behavioral block, etc.
  - Only a surface-level check
  - Checks each module in isolation; doesn’t look at how they fit together

Pre-Synthesis Steps
- Elaboration
  - “Elaborates” HDL statements
    - Unrolls FOR loops
    - Computes values of constant functions
    - Replaces parameters with their values
    - Substitutes macro text
    - Evaluates generate conditionals and loops
  - Checks to make sure instantiated modules are defined
  - Checks inter-module connections for mismatched input/output connections (i.e. module port width not the same as connected net/variable width)

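As a small illustration of what elaboration has to resolve, the sketch below combines a parameter, a generate loop, and a port connection. The module names (bit_reverse, top) and the widths are hypothetical and not taken from the lecture.

```verilog
// Hypothetical example: names and widths are illustrative only.
module bit_reverse #(parameter WIDTH = 8) (
  input  [WIDTH-1:0] in,
  output [WIDTH-1:0] out
);
  genvar g;
  generate
    // Elaboration evaluates this loop WIDTH times, producing WIDTH
    // separate continuous assignments with constant indices.
    for (g = 0; g < WIDTH; g = g + 1) begin : rev
      assign out[g] = in[WIDTH-1-g];
    end
  endgenerate
endmodule

module top (input [3:0] a, output [3:0] y);
  // Elaboration replaces WIDTH with 4 for this instance and can then
  // check that the 4-bit nets match the 4-bit ports.
  bit_reverse #(.WIDTH(4)) u_rev (.in(a), .out(y));
endmodule
```

If `a` were declared 8 bits wide here, the width mismatch on the `in` port is the kind of inter-module problem that would typically surface at this stage rather than at the syntax check.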
Pre-Synthesis Steps
- Design Check
  - Checks design for issues that may make it unsynthesizable, but are otherwise legal HDL
  - Detects multiple drivers to non-tristates
  - Detects combinational loops
  - Gives errors or warnings about unsynthesizable constructs like delays, unsupported operators, etc.
  - Warns about unconnected or constant-value ports
  - May give warnings about inferred latches
- Many of these produce warnings rather than errors; make sure you read the warnings when synthesizing!

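The inferred-latch warning is easiest to see on a tiny case. The two modules below are hypothetical (not from the lecture): the first is legal Verilog that describes a latch, and the second is one common way to keep the same logic purely combinational.

```verilog
// Hypothetical modules for illustration.
// Legal Verilog, but q must hold its value when en is 0,
// so synthesis infers a level-sensitive latch.
module latch_inferred (input en, input d, output reg q);
  always @(*) begin
    if (en)
      q = d;
    // no else branch: q is not assigned on every path
  end
endmodule

// Assigning q on every path keeps the logic combinational.
module no_latch (input en, input d, output reg q);
  always @(*) begin
    q = 1'b0;      // default assignment
    if (en)
      q = d;
  end
endmodule
```

Whether the latch is reported as a warning or only as a note in the log depends on the tool, which is one more reason to read the full output.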
Synthesis Process
- Inputs
  - Functional hardware description in HDL
  - List of design constraints and design rules
    - Desired clock frequency / maximum delay
    - Limits on area, power, capacitance
  - Technology library (logic cells, wire models, etc.)
  - User-specified synthesis options/strategies
- Output
  - Ideally: A netlist that uses the specified technology library, produces the same behavior as the functional description, and meets the design constraints
  - Reports that summarize the area and timing of the implementation

Logic Synthesis Steps
- Translation
  - The synthesis tool identifies the behavior of high-level constructs and replaces them with a structural representation from a generic technology library.
  - Examples: “adder”, “multiplier”, “flip-flop”, “latch”
- High-Level Optimizations
  - The tool performs optimizations at the Boolean equation level
  - The types of optimizations depend on your strategies
  - Examples: Reducing the number of logic levels, minimizing the number of Boolean operations, eliminating redundant computations

Logic Synthesis Steps
- Mapping
  - The synthesis tool replaces the generic representations of gates and logic structures with equivalent hardware representations from the provided technology library
  - The netlist now consists of a structural representation of logic cells (Standard Cell) or LUTs/CLBs (FPGA)
- Low-Level Optimizations
  - The tool performs optimizations at the logic cell level, either to reduce delay or reduce area
  - Examples: Duplicating logic, re-ordering operations to minimize delay, re-timing registers

A Brief Aside on Mapping
- People commonly say that when using Structural Verilog, you know exactly what gates you are getting.
  - Is this true?
  - It actually depends on what’s in your Tech Library
  - If your library contains an XOR gate, then an XOR primitive will be mapped to that gate
  - But what if your Tech Library only contains NAND gates? Or only Look-up Tables?

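To make the NAND-only case concrete, here is a sketch of one well-known way an XOR can be expressed with four 2-input NAND gates. The module and net names are hypothetical, and a real mapper is free to choose a different structure.

```verilog
// Hypothetical sketch: XOR built only from 2-input NAND primitives.
module xor_from_nand (input a, b, output y);
  wire n1, n2, n3;
  nand g1 (n1, a, b);    // n1 = ~(a & b)
  nand g2 (n2, a, n1);   // n2 = ~(a & ~b)
  nand g3 (n3, b, n1);   // n3 = ~(~a & b)
  nand g4 (y, n2, n3);   // y  = (a & ~b) | (~a & b) = a ^ b
endmodule

// Small self-checking testbench over all four input combinations.
module xor_from_nand_tb;
  reg a, b;
  wire y;
  integer i;
  xor_from_nand dut (a, b, y);
  initial begin
    for (i = 0; i < 4; i = i + 1) begin
      {a, b} = i[1:0];
      #1 if (y !== (a ^ b))
        $display("MISMATCH a=%b b=%b y=%b", a, b, y);
    end
    $finish;
  end
endmodule
```

So a single XOR primitive in structural Verilog may become four cells after mapping, or a single LUT on an FPGA.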
Why require Constraints & Strategies?
- Synthesis is hard (NP-hard!)
  - For a circuit of any useful size, the number of possible implementations is enormous
  - It is too computationally intensive to try them all
  - Need to know when a solution is good enough to stop
  - We usually give the tool hints on how to proceed
- Often there is no universally “best” solution
  - Area vs. delay
  - Throughput vs. latency
  - Power vs. frequency
  - Constraints & strategies allow us to manage tradeoffs to find the solution that meets our needs

Constraint Examples
- Minimize area
module mac(input clk, rst,
           input [31:0] in, output [63:0] out);
  reg [31:0] constreg;
  reg [63:0] mult, add, result;
  reg [2:0] count;

  assign out = result;

  always @(*) mult = constreg * in;
  always @(*) add = mult + result;

  always @(posedge clk) begin
    if (rst) begin
      constreg <= in;
      result <= 0;
      count <= 0;
    end else if (count > 0) begin
      result <= add;
      count <= count - 1;
    end else begin
      result <= 0;
      count <= 4;
    end
  end
endmodule

Setting Design Constraints
- set_max_area 20000
  - Sets maximum area to 20,000 cell units
- set_max_delay 4 -to [all_outputs]
  - Sets maximum delay of 4 to any output
- set_max_dynamic_power 10mW
  - Sets maximum dynamic power to 10 mW
- create_clock "clk" -period 10
  - Specifies that port clk is a clock with a period of 10 ns
- create_clock -name "my_clk" -period 12
  - Creates a virtual clock called my_clk with a period of 12 ns; use with combinational logic

Constraint Examples
CLK_PERIOD = 4 (250 MHz)
MAX_AREA = 80000
Arrival: 3.73
Slack: 0.01
Area: 68122
Slack = CLK_PERIOD - (Arrival + Library Setup Time)
Library Setup Time is approximately 0.25-0.26 ns for these examples

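Plugging this slide's numbers into the slack formula gives the reported value, assuming a library setup time of 0.26 ns (the upper end of the stated 0.25-0.26 ns range):

```latex
\begin{aligned}
\text{Slack} &= \text{CLK\_PERIOD} - (\text{Arrival} + \text{Library Setup Time}) \\
             &= 4 - (3.73 + 0.26) \\
             &= 0.01\ \text{ns}
\end{aligned}
```

A positive slack means the path meets timing with that much margin; a negative slack means the clock period is violated by that amount.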
Constraint Examples
CLK_PERIOD = 4
MAX_AREA = 65000
Arrival: 3.75
Slack: 0.00
Area: 64758

Constraint Examples
CLK_PERIOD = 4
MAX_AREA = 60000
Arrival: 3.75
Slack: 0.00
Area: 63377

Constraint Examples
- Maximize speed
module mac(input clk, rst,
           input [31:0] in, output [63:0] out);
  reg [31:0] constreg;
  reg [63:0] mult, add, result;
  reg [2:0] count;

  assign out = result;

  always @(*) mult = constreg * in;
  always @(*) add = mult + result;

  always @(posedge clk) begin
    if (rst) begin
      constreg <= in;
      result <= 0;
      count <= 0;
    end else if (count > 0) begin
      result <= add;
      count <= count - 1;
    end else begin
      result <= 0;
      count <= 4;
    end
  end
endmodule

Constraint Examples
CLK_PERIOD = 4 (250 MHz)
MAX_AREA = 80000
Arrival: 3.73 (+ 0.26 = 3.99)
Slack: 0.01
Area: 68122

Constraint Examples
CLK_PERIOD = 3.6 (278 MHz)
MAX_AREA = 80000
Arrival: 3.46 (+ 0.26 = 3.68)
Slack: -0.08
Area: 73131

Constraint Examples
CLK_PERIOD = 3.7 (270 MHz)
MAX_AREA = 90000
Arrival: 3.45 (+ 0.25 = 3.7)
Slack: 0.00
Area: 75673

Optimization Priorities
- Design rules have priority over timing goals
- Timing goals have priority over area goals
  - Design rules have highest priority
- To prioritize area constraints:
  - Use the ignore_tns (total negative slack) option when you specify the area constraint:
    set_max_area -ignore_tns 10000
- To change priorities use set_cost_priority
  - Example: set_cost_priority -delay
- To remove all optimization constraints use remove_constraint

Constraints Default Cost Vector
Compiling the Design
- Once optimization specifications are set, the design is compiled
- The compile command
  - Logic-level and gate-level synthesis
  - Optimizations of the design
- The compile_ultra command
  - Two-pass high effort compile of the design
  - May want to compile normally first to get a ballpark figure (higher effort == longer compilation)
- What is the purpose of doing multiple passes?

Synthesis Strategies
- Even after supplying HDL code, Tech Library, and Constraints, the designer is still responsible for the Synthesis Strategy.
- Why do we use Strategies?
  - The amount of CPU time and memory we devote to synthesis are still limited resources
  - The designers may already have a good idea about what sort of hardware they want

Compiling the Design
- Useful compile options include:
    -map_effort low | medium | high (default is medium)
    -area_effort low | medium | high (default same as map_effort)
    -incremental_mapping (may improve an already-mapped design)
    -verify (compares initial and synthesized designs)
    -ungroup_all (collapses all levels of design hierarchy)

Top-Down Compilation
- Use the top-down compile strategy when compile time and synthesizer memory are not limiting factors
- Synthesizes each design unit separately and uses top-level constraints
- Basic steps are:
  - Read in the entire design using analyze/elaborate or:
    acs_read_hdl -recurse $TOP_DESIGN
  - Resolve multiple instances of any design references with uniquify
  - Apply attributes and constraints to the top level
  - Compile the design using compile or compile_ultra

Example Top-Down Script
# read in the entire design
analyze -library WORK -format verilog {E.v D.v C.v B.v A.v TOP.v}
# elaborate takes the top-level design name, not the file list
elaborate TOP -architecture verilog -library WORK
current_design TOP
# link TOP to the libraries and modules it references
link
# set design constraints
set_max_area 2000
# resolve multiple references
uniquify
# compile the design
compile

Bottom-Up Compile Strategy
- The bottom-up compile strategy
  - Compile the subdesigns separately and then incorporate them
  - Top-level constraints are applied and the design is checked for violations.
- Advantages:
  - Compiles large designs more quickly (divide-and-conquer)
  - Requires less memory than top-down compile
- Disadvantages:
  - Need to develop local constraints as well as global constraints
  - May need to repeat process several times to meet design goals
- Might use if memory or CPU time are limited

Compile-Once-Don’t-Touch Method
- The compile-once-don’t-touch method uses the set_dont_touch command to preserve the compiled subdesign
    current_design top
    characterize U2/U3
    current_design C
    compile
    current_design top
    set_dont_touch {U2/U3 U2/U4}
    compile
- What are advantages and disadvantages?

Resolving Multiple References
- In a hierarchical design, subdesigns are often referenced by more than one cell instance

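A minimal sketch of what a multiply-referenced subdesign looks like in the HDL. The modules half_adder and full_adder are hypothetical examples, not designs from the lecture; the instance names U1 and U2 simply follow the naming style used in the scripts.

```verilog
// Hypothetical design: half_adder is referenced by two cell instances.
module half_adder (input a, b, output sum, carry);
  assign sum   = a ^ b;
  assign carry = a & b;
endmodule

module full_adder (input a, b, cin, output sum, cout);
  wire s1, c1, c2;
  // U1 and U2 both reference the same subdesign, half_adder.
  half_adder U1 (.a(a),  .b(b),   .sum(s1),  .carry(c1));
  half_adder U2 (.a(s1), .b(cin), .sum(sum), .carry(c2));
  assign cout = c1 | c2;
endmodule
```

Because U1 and U2 see different surrounding logic and timing, the tool needs to be told whether they must share one implementation or may be optimized separately.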
Uniquify Method
- The uniquify command creates a uniquely named copy of the design for each instance.
    current_design top
    uniquify
    compile
  - Each design optimized separately
  - What are advantages and disadvantages?

Ungroup Method (“Flattening”)
- The ungroup command makes unique copies of the design and removes levels of the hierarchy
    current_design B
    ungroup {U3 U4}
    current_design top
    compile
- What are advantages and disadvantages?

Benefits of Ungrouping Hierarchy
module logic1(input a, c, e, output reg x);
  always @(a, c, e)
    x = ((~a|~c) & e) | (a&c);
endmodule
module logic2(input a, b, c, d, output reg y);
  always @(a, b, c, d)
    y = ((((~a|~c)&b) | ((a|~b)&c))&d) | ((a|~b)&~d);
endmodule
module logic(input a, b, c, d, e, f, output reg z);
  wire x, y;
  // module instances require instance names (u1 and u2 are illustrative)
  logic1 u1(a, c, e, x);
  logic2 u2(a, b, c, d, y);
  always @(x, y, f)
    z = (~f&x) | (f&y);
endmodule
With Hierarchy
  Area: 36.15
  Delay: 0.25
Without Hierarchy
  Area: 34.15
  Delay: 0.25

Ungrouping versus Boolean Flattening
- Ungrouping is commonly referred to as “Flattening the Hierarchy”, even by tool vendors
- Because of this, many people incorrectly think the “set_flatten true” option in Synopsys is the same as “ungroup”
- set_flatten true tells Design Vision to flatten the Boolean equations describing your logic down to a two-level expression. That is, to create a Sum of Products expression.
- Flattening Boolean equations is a way of reducing delay at the cost of increased area; we’ll talk about it more in a later lecture.

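To illustrate the delay/area tradeoff of Boolean flattening, the sketch below writes one function in two equivalent forms. The modules and the function are hypothetical and chosen only because the algebra is easy to check by hand: (a|b) & (c | d&e) expands to a&c | a&d&e | b&c | b&d&e.

```verilog
// Hypothetical example: the same function in two forms.
// Factored (multi-level) form: three logic levels, four gates.
module factored (input a, b, c, d, e, output y);
  assign y = (a | b) & (c | (d & e));
endmodule

// Flattened two-level sum-of-products form: two logic levels,
// but four AND terms feeding a 4-input OR.
module flattened (input a, b, c, d, e, output y);
  assign y = (a & c) | (a & d & e) | (b & c) | (b & d & e);
endmodule
```

The flattened form has fewer levels between input and output but more product terms and literals, which is the delay-for-area trade described above. Whether a tool actually produces this structure depends on its settings and the library.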
Dealing with Structured Logic
- Sometimes we do not want the synthesis tool to try to optimize our Boolean equations.
- Structured Logic refers to Boolean logic operations that are structured in a certain way to achieve a goal, such as reduced delay or fault tolerance.
- Examples: Carry-Lookahead Adder, Wallace Multiplier, duplicated logic
- set_structure true (default): tells the tool it can reorder, factor, or decompose the logic equations
- set_structure false: tells the tool to leave the logic alone

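As an example of logic whose structure is the point, here is a sketch of a 4-bit carry-lookahead adder. The module name cla4 and the testbench are hypothetical; the carry equations are the standard generate/propagate expansion.

```verilog
// Hypothetical 4-bit carry-lookahead adder, for illustration only.
module cla4 (input [3:0] a, b, input cin,
             output [3:0] sum, output cout);
  wire [3:0] g = a & b;   // generate
  wire [3:0] p = a ^ b;   // propagate
  wire [4:0] c;
  // Every carry is written directly in terms of g, p and cin,
  // so no carry waits on the previous carry.
  assign c[0] = cin;
  assign c[1] = g[0] | (p[0] & c[0]);
  assign c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c[0]);
  assign c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
              | (p[2] & p[1] & p[0] & c[0]);
  assign c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
              | (p[3] & p[2] & p[1] & g[0])
              | (p[3] & p[2] & p[1] & p[0] & c[0]);
  assign sum  = p ^ c[3:0];
  assign cout = c[4];
endmodule

// Exhaustive self-checking testbench (2^9 input combinations).
module cla4_tb;
  reg [3:0] a, b;
  reg cin;
  wire [3:0] sum;
  wire cout;
  integer i;
  cla4 dut (a, b, cin, sum, cout);
  initial begin
    for (i = 0; i < 512; i = i + 1) begin
      {a, b, cin} = i[8:0];
      #1 if ({cout, sum} !== a + b + cin)
        $display("MISMATCH a=%d b=%d cin=%b", a, b, cin);
    end
    $finish;
  end
endmodule
```

An optimizer that is allowed to factor these equations could share terms until the result resembles a ripple-carry chain again, which is smaller but gives up the delay advantage the designer coded for.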
Checking your Design
- Use the check_design command to verify design consistency.
  - Usually run both before and after compiling a design
  - Gives a list of warning and error messages
    - Errors will cause compiles to fail
    - Warnings indicate a problem with the current design
    - Try to fix all of these, since later they can lead to problems
  - Use check_design -summary or check_design -no_warnings to limit the number of warnings given
  - Use check_timing to locate potential timing problems

Analyzing your Design [1]
- There are several commands to analyze your design
  - report_design
    - displays characteristics of the current design
    - operating conditions, wire load model, output delays, etc.
    - parameters used by the design
  - report_area
    - displays area information for the current design
    - number of nets, ports, cells, references
    - area of combinational logic, non-combinational, interconnect, total

Analyzing Your Design [2]
- report_hierarchy
  - displays the reference hierarchy of the current design
  - tells modules/cells used and the libraries they come from
- report_timing
  - reports timing information about the design
  - default shows one worst case delay path
- report_resources
  - lists the resources and datapath blocks used by the current design
- Can send reports to files
  - report_resources > cmult_resources.rpt
- Lots of other report commands available

Synthesis Scripts
- Synthesis scripts provide a convenient method for performing synthesis multiple times
- To run the script, enter the directory which contains the Verilog code and type:
  - dc_shell -tcl_mode -f script.tcl
  - dc_shell -tcl_mode -f script.tcl > log.txt &
    - This will start the script and store its output to log.txt

Example Synthesis Script
analyze -library WORK -format verilog {./register_file_behave.v}
elaborate reg_file_behave -architecture verilog -library WORK
create_clock -name "clk" -period 2 -waveform {0 1} {clk}
set_dont_touch_network [ find clock clk ]
set_max_area 30000
check_design
uniquify
compile -map_effort medium
report_area > area_report.txt
report_timing > timing_report.txt
report_constraint -all_violators > violator_report.txt

Design Optimization: FIR Filter
- Used in signal processing
- Passes through some data but not all (filter!)
- Example: Remove noise from image/sound
- Uses multipliers and adders
- Multiply constant “tap” value against time-delayed input value

  $$ y[n] = \sum_{k=0}^{M} b_k \, x[n-k] $$

- In the Verilog, y is out, b_k is taps, and x is data

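FIR Filter Design
[Figure: FIR filter block diagram. The input x[n] passes through a chain of z^-1 delay elements to give x[n-1], x[n-2], ..., x[n-M]. Each sample is multiplied by its filter tap (b0, b1, b2, ..., bM) and the products are summed to produce yFIR[n].]
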
Design Optimization: FIR Filter
- We’ll look at three different approaches to implementing this filter
  - “Initial”
  - “Small”
  - “Fast”
- We’ll revisit the idea of re-architecting algorithms for better area, latency, and throughput later.
- As an exercise, you should take some time on your own to try to understand exactly what is happening in each of the following code segments.
- Learning to read and understand someone else’s (confusing) code is an extremely valuable skill

Initial Design: Code [1]
module fir_init(clk, rst, in, out);
  parameter bitwidth = 8;
  parameter ntaps = 4;
  parameter logntaps = 2;
  input clk, rst;
  input [bitwidth-1:0] in;
  output reg [bitwidth-1:0] out;
  reg [bitwidth-1:0] taps [0:ntaps-1];
  reg [bitwidth-1:0] data [0:ntaps-1];
  reg [logntaps:0] count;
  integer i;

Initial Design: Code [2]
  always @(posedge clk) begin
    if (rst) begin
      // indicate we need to load all the tap values
      count <= 0;
      // reset the data and taps
      for (i = 0; i < ntaps; i = i + 1) begin: resetloop
        data[i] <= 0; taps[i] <= 0;
      end
    end
    else if (count < ntaps) begin
      // we need to load the tap values before filtering
      for (i = ntaps-1; i > 0; i = i - 1) begin: loadtaps
        taps[i] <= taps[i-1];
      end
      // load the new value at tap[0]
      taps[0] <= in;
      count <= count+1;
    end

Initial Design: Code [3]
    else begin
      // ready to do the filtering
      // first shift in the new input data value
      for (i = ntaps-1; i > 0; i = i - 1) begin: shiftdata
        data[i] <= data[i-1];
      end
      // load the new value at data[0]
      data[0] <= in;
    end // else: !if(count < ntaps)
  end // always @ (posedge clk)

  // compute the filtered result
  always @(*) begin
    out = 0;
    for (i = 0; i < ntaps; i = i + 1) begin: filterloop
      out = out + (data[i] * taps[ntaps-1 - i]);
    end
  end
endmodule

Initial Design: Synthesis
- Constraints
  - CLK_PERIOD: 4
  - INPUT_DELAY: 0.2
  - OUTPUT_DELAY: 0.2
  - MAX_AREA: 8000
- Results
  - Arrival Time: 3.13
  - Slack: 0.67 (MET)
  - Area: 7335
- Should we make our constraints more aggressive?

Initial Design: Schematic
Small Design: Code [1]
module fir_area(clk, rst, in, out);
  parameter bitwidth = 8;
  parameter ntaps = 4;
  parameter logntaps = 2;
  input clk, rst;
  input [bitwidth-1:0] in;
  output reg [bitwidth-1:0] out;
  reg [bitwidth-1:0] taps [0:ntaps-1];
  reg [bitwidth-1:0] data [0:ntaps-1];
  reg [bitwidth-1:0] partial;
  reg [logntaps:0] count;
  reg [logntaps-1:0] step;
  reg ready; // indicates ready to filter
  integer i;

Small Design: Code [2]
  always @(posedge clk) begin
    if (rst) begin
      // indicate we need to load all the tap values
      count <= 0; ready <= 0;
      // reset the data and taps
      for (i = 0; i < ntaps; i = i + 1) begin: resetloop
        data[i] <= 0; taps[i] <= 0;
      end
    end
    else if (count < ntaps && ~ready) begin
      // we need to load the tap values before filtering
      for (i = ntaps-1; i > 0; i = i - 1) begin: loadtaps
        taps[i] <= taps[i-1];
      end
      // load the new value at tap[0]
      taps[0] <= in;
      count <= count+1;
      // when the last tap is loaded, mark the filter ready; this later
      // nonblocking assignment to count overrides the increment above.
      // (The test must be reachable inside the count < ntaps branch.)
      if (count == ntaps-1) begin ready <= 1; count <= 0; end
    end

Small Design: Code [3]
    else begin
      // ready to do the filtering
      // first shift in the new input data value
      for (i = ntaps-1; i > 0; i = i - 1) begin: shiftdata
        data[i] <= data[i-1];
      end
      // load the new value at data[0]
      data[0] <= in;
    end // else: !if(count < ntaps)
  end // always @ (posedge clk)

Small Design: Code [4]
  // compute the filtered result
  always @(posedge clk) begin
    if (rst || ~ready) begin step <= 0; partial <= 0; end
    else begin
      if (step == 0) begin
        out <= partial;
        partial <= (data[0] * taps[ntaps-1]);
      end
      else begin
        out <= out;
        partial <= partial + (data[step] * taps[ntaps - 1 - step]);
      end
      if (step < ntaps-1) step <= step + 1;
      else step <= 0;
    end
  end
endmodule

Small Design: Synthesis
- Constraints
  - CLK_PERIOD: 4
  - INPUT_DELAY: 0.2
  - OUTPUT_DELAY: 0.2
  - MAX_AREA: 8000
- Results
  - Arrival Time: 2.76 (vs. 3.13)
  - Slack: 0.92 (MET) (4 clock cycles)
  - Area: 5754 (vs. 7335)
- What are the tradeoffs?

Small Design: Schematic
Fast Design: Code [1]
module fir_fast(clk, rst, in, out);
  parameter bitwidth = 8;
  parameter ntaps = 4;
  parameter logntaps = 2;
  input clk, rst;
  input [bitwidth-1:0] in;
  output [bitwidth-1:0] out;
  reg [bitwidth-1:0] taps [0:ntaps-1];
  reg [bitwidth-1:0] mult [0:ntaps-1];
  reg [bitwidth-1:0] partial [0:ntaps-1];
  reg [logntaps:0] count;
  reg ready; // indicates ready to filter
  integer i;
  assign out = partial[ntaps-1];

Fast Design: Code [2]
  always @(posedge clk) begin
    if (rst) begin
      // indicate we need to load all the tap values
      count <= 0;
      // reset the taps
      for (i = 0; i < ntaps; i = i + 1) begin: resetloop
        taps[i] <= 0;
      end
    end
    // ready is never assigned in this module, so testing ~ready here
    // would evaluate to X in simulation; count alone gates tap loading
    else if (count < ntaps) begin
      // we need to load the tap values before filtering
      for (i = ntaps-1; i > 0; i = i - 1) begin: loadtaps
        taps[i] <= taps[i-1];
      end
      // load the new value at tap[0]
      taps[0] <= in;
      count <= count+1;
    end

Fast Design: Code [3]
    else begin
      // taps stay the same
    end // else: !if(count < ntaps)
  end // always @ (posedge clk)

  // compute the filtered result (pipelined)
  always @(posedge clk) begin
    // get the product of the input with each of the tap values
    for (i = 0; i < ntaps; i = i + 1)
      mult[i] <= in * taps[i];
    // special case at front
    partial[0] <= mult[0];
    // get the partial sums for the rest
    for (i = 1; i < ntaps; i = i + 1)
      partial[i] <= partial[i-1] + mult[i];
  end
endmodule

Fast Design: Synthesis
- Constraints
  - CLK_PERIOD: 4
  - INPUT_DELAY: 0.2
  - OUTPUT_DELAY: 0.2
  - MAX_AREA: 8000
- Results
  - Arrival Time: 1.92 (vs. 3.13)
  - Slack: 1.82 (MET) (1 clock cycle!*)
  - Area: 7311 (vs. 7335)
- What are the tradeoffs?

Fast Design: Schematic
Optimization Strategies
- Area vs. Delay: often only really optimize for one
  - “Fastest given an area constraint”
  - “Smallest given a speed constraint”
- Design Compiler Reference Manual has several pointers on synthesis settings for these goals
- In some ways, synthesis is as much an art as it is a science
- Experiment with different options to see how they interact with each other

Design Examples
- All using same constraints
- No special synthesis options
- Can get even more dramatic results by combining:
  - Coding style
  - Tight constraints
  - Synthesis optimization options

Some More “Small Design” Results

                                ----------- constraints -----------   --- results ---
command                         area    input    output   clock       area    slack
                                        delay    delay    period
compile -area_effort medium     8000    0.2      0.2      4           5797    2.05
compile -area_effort high       5500    0.2      0.2      4           5778    2.05
compile_ultra                   5500    0.2      0.2      4           5242    1.42
compile_ultra                   5000    0.2      0.2      4           5242    1.42
compile + compile_ultra         5000    0.2      0.2      4           6562    1.78
compile_ultra                   5500    0.2      0.2      2           5274    0.01
compile_ultra                   5500    0.2      0.2      1.8         5391    0.00
compile_ultra                   5500    0.2      0.2      1.7         5519    0.00
compile_ultra (rst no delay)    5500    0.2      0.2      1.7         5414    0.00
compile_ultra                   5500    0.1      0.1      1.7         5636    0.01
compile_ultra (rst no delay)    5500    0.1      0.1      1.7         5414    0.00
compile_ultra                   5500    0.5      0.5      1.7         5923    0.00
compile_ultra (rst no delay)    5500    0.5      0.5      1.7         5414    0.00

Script
analyze -library WORK -format verilog {fir_area.v}
elaborate fir_area -architecture verilog -library WORK
create_clock -name "clk" -period 4 {clk}
set_dont_touch_network [ find clock clk ]
set_max_area 5000
set NORM_INPUTS [remove_from_collection [all_inputs] "clk rst"]
#set NORM_INPUTS [remove_from_collection [all_inputs] "clk"]
set_input_delay 0.2 -max -clock clk $NORM_INPUTS
set_output_delay 0.2 -max -clock clk [all_outputs]
check_design > check_design.txt
uniquify
#compile -map_effort medium -area_effort medium
compile -map_effort high -area_effort high
compile_ultra
report_area > area_report.txt
report_timing > timing_report.txt
report_constraint -all_violators > violator_report.txt
exit

Want more information about any of the Design
Vision commands listed in these lectures?
Log in to a CAE computer and type:
dc_shell
man command_name