Innovus CCOpt Concepts Update
Training
May 2015
Agenda
Flow overview & terminology
Concepts
Graph based CTS
Clock trees & skew groups
Automatic Clock spec creation
Chains
Latency modification
Configuration – Quick start, Tuning
Debugging – Overview, CTD, Insertion delay
© 2015 Cadence Design Systems, Inc. All rights reserved.
Flow overview & terminology
What is clock tree synthesis?
• Clocks are used to synchronize data communication
– Ideal mode
Equal virtual delay from clock source to sinks
– Global skew balancing
Delay from clock source to sinks equal within some tolerance
– Useful skew
Clock path delays adjusted to assist timing slack optimization.
Strict balancing not required – can be exploited to save clock area &
power.
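A minimal Python sketch of the global skew check described above. The sink names, delay values and tolerance are hypothetical and are not taken from any tool output.

```python
def global_skew(insertion_delays_ps):
    """Return (skew, earliest, latest) for a mapping of sink -> insertion delay in ps."""
    earliest = min(insertion_delays_ps.values())
    latest = max(insertion_delays_ps.values())
    return latest - earliest, earliest, latest


def is_balanced(insertion_delays_ps, tolerance_ps):
    """Global skew balancing: all source-to-sink delays equal within a tolerance."""
    skew, _, _ = global_skew(insertion_delays_ps)
    return skew <= tolerance_ps


# Hypothetical sinks and insertion delays (ps).
sinks = {"f1/CK": 812, "f2/CK": 798, "f3/CK": 845}
skew, earliest, latest = global_skew(sinks)
print(skew, is_balanced(sinks, tolerance_ps=50))
```

Useful skew deliberately relaxes this check: a sink may sit outside the tolerance when that helps timing slack.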
Transition from ideal to propagated mode timing
• Add buffering & size/place clock cells for drive and delay
• Sounds straightforward, but now design timing is in
propagated clock mode
• OCV/AOCV impact of non-common clock path
– TNS degrades
• Clock gate enable timing
– Clock gates capture their enable input early
• Other
– Inter clock timing depends on achievable insertion delays
– Clock generator control logic timing
• Pre-CTS timing often optimistic
• Single CTS problem typically destroys entire design timing
• CTS & design routing are moments of truth
User command flow
• CTS is not just buffering nets
• Transition from ideal mode to propagated mode timing
Command sequence
1) place_opt_design
2) Either
– Old flow: ccopt_design -cts, then optDesign -postCTS (CTS: global skew balancing only, no optimization)
– Innovus flow: ccopt_design (Clock Concurrent Optimization: CTS + optional useful skew + post-CTS opt)
3) optDesign -postCTS -hold
4) routeDesign
5) ccopt_pro (post-route CCOpt: limited access, DRV and skew fixing)
6) optDesign -postRoute
7) optDesign -postRoute -hold
New skew controls
• setOptMode -usefulSkew true|false
– Global switch, overrides those below
• setOptMode -usefulSkewPreCTS true|false
– Enables/disables pre-CTS useful skew in place_opt_design
• setOptMode -usefulSkewCTS true|false
– Enables/disables CCOpt useful skew
• setOptMode -usefulSkewPostCTS true|false
– Enables/disables post-CTS useful skew in ccopt_design or optDesign -postCTS
• setOptMode -usefulSkewPostRoute true|false
– Enables/disables useful skew in optDesign -postRoute
• Changing the global switch does not change individual
switches.
• Normally configured by flow settings - see next slides.
CCOpt effort level
set_ccopt_effort -high|-medium|-low
ccopt_design
• High
– Enables CCOpt useful skew (setOptMode -usefulSkewCTS)
– Configures private settings for high effort
• Medium
– Enables CCOpt useful skew
– Configures private settings for CCOpt medium effort - faster TAT
• Low
– Disables CCOpt useful skew – uses CTS global skew balancing
– Otherwise same as medium.
• The set_ccopt_effort command sets the setOptMode -usefulSkew* settings and some private settings. It holds no other internal state.
Innovus flows and CCOpt
setDesignMode -flowEffort extreme|standard*
ccopt_design (without -cts)
• Extreme flow
– Configures CCOpt high effort
– No need to do “set_ccopt_effort -high”
– Most of this presentation will discuss CTS & CCOpt useful skew
• Standard flow
– Configures CCOpt low effort
– No need to do “set_ccopt_effort -low”
• Post-CTS & post-route useful skew use CCOpt settings
• * Standard flow is default flow
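The relationship between flow effort, CCOpt effort and CCOpt useful skew can be summarised as a small lookup. This is a Python sketch of the slide content only, not tool behaviour; in particular, letting an explicit set_ccopt_effort value take precedence over the flow effort is an assumption made for illustration.

```python
# Mapping stated on the slides: flow effort -> CCOpt effort.
FLOW_EFFORT_TO_CCOPT_EFFORT = {"extreme": "high", "standard": "low"}

# Mapping stated on the slides: CCOpt effort -> CCOpt useful skew enabled?
CCOPT_EFFORT_USES_USEFUL_SKEW = {"high": True, "medium": True, "low": False}


def ccopt_behaviour(flow_effort="standard", explicit_effort=None):
    """Return (ccopt_effort, ccopt_useful_skew_enabled).

    Assumption: an explicit set_ccopt_effort choice wins over the flow effort.
    """
    effort = explicit_effort or FLOW_EFFORT_TO_CCOPT_EFFORT[flow_effort]
    return effort, CCOPT_EFFORT_USES_USEFUL_SKEW[effort]


print(ccopt_behaviour("extreme"))
print(ccopt_behaviour("standard"))
```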
CCOpt high effort flow
1) Trial CTS
– Cluster (DRV buffering), virtual delay balance, switch to propagated clocks
2) Pre-skew opt
– Global opt
– Area reclaim
– DRV opt
3) CCOpt useful skew
– Prepare clock tree for skewing
– WNS opt transforms
– WNS reg2reg opt transforms
– Skew sinks
– TNS opt transforms
– refinePlace + opt (repeated between the steps above)
– Update clock tree
– Final slack polishing
4) Implement + route
– Implementation + NanoRoute
– Clock nets are routed with NanoRoute here
5) Post-CTS opt (post-CTS useful skew included)
– Final WNS opt transforms
– refinePlace + opt
– Final slack polishing
CCOpt low effort flow
• CTS
– Similar to ccopt_design -cts
• Post-CTS optimization with useful skew
– Essentially optDesign -postCTS
• Same ccopt_design command invocation as for high effort
• Post-CTS useful skew respects CCOpt settings (cells, DRV)
Concepts - Graph based CTS
Graph based CTS
Example circuit
[Figure: example circuit with clock sources ck1 and clk2 and muxes controlled by sel_clk and sel_div.]
Mode A: sel_clk = 0, sel_div = 1
Mode B: sel_clk = 1, sel_div = 0
Graph based CTS
Physical buffering
[Figure: the example circuit (ck1, clk2, sel_clk and sel_div muxes) after physical buffering.]
• First perform physical buffering for DRV - Clustering
– Plus other steps to reduce insertion delay and power
• Don’t worry about skew
Graph based CTS
Constraint analysis
[Figure: the example circuit with each clock tree graph fragment annotated "min".]
Mode A: sel_clk = 0, sel_div = 1
Mode B: sel_clk = 1, sel_div = 0
• Physically buffered graph determines minimum possible delays for
each clock tree graph fragment
Graph based CTS
Constraint analysis
[Figure: the example circuit with the graph fragment delays labelled d1 to d8.]
Mode A: sel_clk = 0, sel_div = 1
Mode B: sel_clk = 1, sel_div = 0
• Represent each fragment delay as a variable in a set of simultaneous equations
• Solving equations gives delays to achieve ideal balancing
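A toy Python sketch of the idea: each fragment starts at its minimum delay (from physical buffering) and delay is added until every path in each skew group has equal total delay. The graph, fragment names and values are hypothetical and do not correspond to d1 to d8 in the figure. The padding loop is a simplification chosen for readability; it is not the solver CCOpt uses.

```python
def balance(min_delay_ps, skew_groups, max_iters=100):
    """Pad fragment delays until all paths in every skew group are equal.

    min_delay_ps: fragment name -> minimum achievable delay (ps)
    skew_groups:  group name -> list of paths, each path a list of fragments
    Delay is only ever added, and only to the last fragment of a short path.
    """
    delay = dict(min_delay_ps)
    for _ in range(max_iters):
        changed = False
        for paths in skew_groups.values():
            totals = [sum(delay[f] for f in path) for path in paths]
            target = max(totals)
            for path, total in zip(paths, totals):
                if total < target:
                    delay[path[-1]] += target - total
                    changed = True
        if not changed:
            return delay
    raise RuntimeError("balancing did not converge")


# Hypothetical fragments: two roots (r1, r2), a shared middle fragment (m)
# and three leaf fragments (x, y, z). One skew group per mode.
min_delay = {"r1": 100, "r2": 300, "x": 200, "m": 150, "y": 200, "z": 250}
groups = {
    "modeA": [["r1", "x"], ["r1", "m", "y"]],
    "modeB": [["r2", "m", "y"], ["r2", "z"]],
}
balanced = balance(min_delay, groups)
print(balanced)
```

Here x and z are padded to 350ps while the shared fragments keep their minimum delays, so both modes balance on the same physical graph.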
Graph based CTS
Virtual delay balance then implement
[Figure: the example circuit with the eight solved fragment delays annotated: 1, 0.1, 0.35, 0.2, 0.15, 0.2, 0.5 and 0.25.]
Mode A: sel_clk = 0, sel_div = 1
Mode B: sel_clk = 1, sel_div = 0
• Implement using real buffers and wire
• Detailed refinement step – add/remove buffer, cell sizing, cell
placement
• CCOpt useful skew & optimization takes place before implementation
Concepts - Clock trees & skew groups
CTS Constraints
• Physical constraints
– Max transition (slew) time, max capacitance, max wire length
– Non-default routing rules (NDR)
– Physical constraints come from the user and cell library
• Balancing constraints
– Delay along paths from sources to sinks
– Traverse generators, clock gates – “nodes”
– Sources, sinks and paths derived from SDC timing constraints
– Multi-mode – Multiple constraint modes
Clock trees & skew groups
• Clock trees (create_ccopt_clock_tree)
– Specify subset of circuit which CTS can operate on
– Graph in all except simplest of cases
– Physical constraints
– Max transition, max capacitance, max wire length
– Non-default routing rules (NDR)
– Buffer/inverter cell sets
– Single physical graph even with multiple constraint modes
• Skew groups (create_ccopt_skew_group)
– Balancing constraints – skew, insertion delay
– CTS : Global skew balancing
– CCOpt : Initial balancing
– Each skew group is superimposed on the clock tree graph
– Any pins in a clock tree can be sources and sinks
Clock trees & skew groups
Clock tree graph with skew groups
[Figure: clock tree graph with roots ck1 and ck2 and clock gates (G). Regions of the graph are marked as belonging to skew group 1, skew group 2, or both (skew group 1&2).]
Clock trees & skew groups
Clock tree graph
[Figure: the same graph with roots ck1 and ck2 and clock gates (G), outlined as the clock graph. Everything else is datapath.]
Clock trees & skew groups
Distinction
• Important to remember three separate entities:
1) Clock
- clocks for timing analysis
2) Clock tree
- determines physical subset of circuit for CTS
3) Skew group
- determines balancing / initial balancing
• Everyone misuses the terminology in conversation
Concepts - Automatic clock spec creation
Concepts – Automatic spec creation
Overview
Inputs to the Innovus session
– V, DEF, libraries, …
– func1.sdc, func2.sdc, test.sdc
create_ccopt_clock_tree_spec
– Automatically done by ccopt_design if not done by user
1. Analyses multi-mode timing graph
2. Creates clock trees and skew groups and configures them
Output
– Clock trees, skew groups and property settings (optional Tcl output with -file <filename>)
Clock tree specification
Key SDC commands
SDC command                          Typical clock-related purpose
create_clock                         Define a clock
create_generated_clock               Define a generated clock
set_case_analysis                    Specify clock mux direction, typically mode specific (e.g. func, test, shift)
set_clock_latency                    Target insertion delay, macro pin internal clock delays, pre-CTS skew
set_clock_sense -stop_propagation    Stop clock propagating beyond a pin
set_disable_timing                   Disables arcs in the clock timing graph
• Not direct translation from SDC or timing constraints
– Analysis of timing graph to determine which clocks propagate where
– Clock trees created to cover the subset of the circuit CTS needs to operate on
– Skew groups created to represent balancing constraints
Automatic spec creation
Clock trees and skew groups
• Clock trees
– Defined for primary clocks (input ports)
– Defined at generator (e.g. divider flop) outputs
– Global ignore pins at edge of clock tree graph
– Example – clock path to flop D input
– You can often ignore the precise clock tree definitions
• Skew groups
– A skew_group is created per SDC clock per constraint mode
– 1-1 mapping between skew groups and SDC clocks per mode
– Examples: test_clk/shift , clk1500m/func
– Generated SDC clocks map to non-constraining skew group
– Sinks of generated clock are balanced with sinks of master
• Example coming up
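A small Python sketch of the naming and mapping rule above: one skew group per SDC clock per constraint mode, named clock/mode, with generated clocks mapped to non-constraining skew groups. The input data structure is hypothetical; the clock and mode names are taken from the examples on this slide.

```python
def skew_groups(clocks_by_mode):
    """Build one skew group per SDC clock per constraint mode.

    clocks_by_mode: mode name -> list of (clock name, is_generated)
    Generated clocks get a non-constraining (reporting only) skew group.
    """
    groups = []
    for mode, clocks in clocks_by_mode.items():
        for clock, is_generated in clocks:
            groups.append({"name": f"{clock}/{mode}", "constrains": not is_generated})
    return groups


# Hypothetical constraint modes; gck stands for a generated clock.
spec = skew_groups({
    "func": [("clk1500m", False), ("gck", True)],
    "shift": [("test_clk", False)],
})
for group in spec:
    print(group)
```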
Automatic spec creation
Single mode example – circuit
create_clock [get_ports {ck1}]
create_clock [get_ports {ck2}]
create_generated_clock -name gck1 -divide_by 2 [get_pins {d1/Q}] -source [get_pins {d1/CK}] -master_clock [get_clocks {ck1}]
create_generated_clock -name gck2 -divide_by 2 [get_pins {m3/Y}] -source [get_pins {d2/CK}]
[Figure: circuit with clock ports ck1 and ck2, cells d1, d2 and m3, clock gate G, generated clocks gck1 and gck2, flops f1 to f4, and labels w1 to w4.]
Automatic spec creation
Single mode example
Annotations on the circuit:
– ck1: clock tree ck1, skew group ck1
– ck2: clock tree ck2, skew group ck2
– Generated clock tree ck2_generator_for_gck2<1>
– d1/CK ignored in skew group ck2
– gck1: generated clock tree gck1, reporting only skew group
– gck2: no clock tree, reporting only skew group
Note: Skew groups and clock trees often have the same name.
Automatic spec creation
Single mode example – skew groups
[Figure: the same circuit showing the extent of skew group ck1 and skew group ck2. d1/CK is ignored in skew group ck2.]
Automatic spec creation
Multi mode example
[Figure: clock ck drives divider d1 and a mux (inputs 0 and 1) selected by sel_div.]
mode0.sdc
create_clock [get_ports {ck}]
create_generated_clock -name gck -divide_by 2 [get_pins {d1/Q}] -source [get_pins {d1/CK}]
set_case_analysis 0 [get_ports {sel_div}]
mode1.sdc
create_clock [get_ports {ck}]
create_generated_clock -name gck -divide_by 2 [get_pins {d1/Q}] -source [get_pins {d1/CK}]
set_case_analysis 1 [get_ports {sel_div}]
Skew group ck/mode0
modify_ccopt_skew_group ck/mode0 -add_ignore_pins mux/I1
Skew group ck/mode1
modify_ccopt_skew_group ck/mode1 -add_ignore_pins mux/I0
• Two skew groups with source at ‘ck’ input – one per clock per mode
• One ignored at mux ‘0’ input and other ignored at mux ‘1’ input
• Paths through the two sides of the mux are not balanced with one another
Concepts – Chains
Concepts - Chains
Equal period between register stages
[Figure: a chain of register stages driven by a common clock; every stage has the full period T.]
Concepts - Chains
Reallocate spare time from adjacent stages
[Figure: the same chain with two clock pins skewed by d1 and d2. Stage times shown: T, T-d1, T+d1+d2, T-d2.]
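The stage times in the figure follow from simple arithmetic: the time available to a stage is the period plus the capture flop's clock offset minus the launch flop's clock offset. A minimal Python sketch, with hypothetical values for T, d1 and d2:

```python
def stage_budgets(period_ps, clock_offsets_ps):
    """Time available to each register-to-register stage of a chain.

    clock_offsets_ps[i] is how much later (+) or earlier (-) flop i is clocked
    relative to the balanced case. Stage i runs from flop i to flop i+1.
    """
    return [
        period_ps + capture - launch
        for launch, capture in zip(clock_offsets_ps, clock_offsets_ps[1:])
    ]


T, d1, d2 = 1000, 50, 80  # hypothetical values in ps
# Second flop clocked d1 earlier, third flop clocked d2 later.
budgets = stage_budgets(T, [0, -d1, d2, 0])
print(budgets)  # [T-d1, T+d1+d2, T-d2]
```

The budgets always sum to the number of stages times T when the end flops are unskewed: skewing moves time between stages, it does not create any.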
Concepts - Chains
Reallocate time from further away stages
[Figure: a longer chain where time is taken from stages further away. Stage times shown: T, T-d1, T, T+d1+d2, T, T-d2.]
Concepts - Chains
Where does it stop?
[Figure: the same chain extended by further stages marked "??". Stage times shown: ??, ??, T-d1, T, T+d1+d2, T, T-d2.]
Concepts - Chains
I/O Chain
[Figure: I/O chain diagram (clock, period T).]
Concepts - Chains
Looping chain
[Figure: looping chain diagram (clock, period T).]
Concepts - Chains
Clock gate, clock divider
[Figure: chain through a clock gate G (clock, period T).]
Scheduling the clock gate with more clock delay might impact the flop under the clock gate.
• Also variant with loop through a single flop and clock gate
Concepts - Chains
Macro
[Figure: chain through a macro (clock, period T).]
Internal timing path not required. Sufficient for capturing data input and launching data output to have same related clock pin.
Concepts - Worst chain analysis
What you want to see
In the log file:
• Find worst chain report just after last occurrence of “After useful skew”
• Start at “*WNS*” and look at the stages in the chain before and after it
• What you want to see: equal slacks
Worst chain (Setup):
====================
,-o cell:uls/flopA
| | pin:.../clk @+878ps constraint=(-226ps,+379ps) chosen=(-0ps,+2ps)
| | location=(910.335,1230.822) slew=(launch: 28ps, capture: 28ps)
| | slack -68ps pin:.../q -> pin:.../d (distance: 453.123um)
| | delays=(launch: 290ps, datapath: 556ps, capture: 64ps)
`-o cell:uls/flopB
| pin:.../clk @+652ps constraint=(-50ps,+589ps) chosen=(-0ps,+0ps)
| location=(862.335,1250.822) slew=(launch: 28ps, capture: 28ps)
| *WNS* -68ps pin:.../q -> pin:.../d (distance: 1232.345um)
| delays=(launch: 64ps, datapath: 782ps, capture: 290ps)
o cell:uls/flopC
| pin:.../clk @+867ps constraint=(-226ps,+381ps) chosen=(-0ps,+3ps)
| location=(910.335,1230.822) slew=(launch: 28ps, capture: 28ps)
| slack -68ps pin:.../q -> pin:.../d (distance: 372.312um)
| delays=(launch: 290ps, datapath: 556ps, capture: 64ps)
...
Concepts - Worst chain analysis
Explanation of report
,-o cell:uls/flopA
| | pin:.../clk @+878ps constraint=(-226ps,+379ps) chosen=(-0ps,+2ps)
| | location=(910.335,1230.822) slew=(launch: 28ps, capture: 28ps)
| | slack -68ps pin:.../q -> pin:.../d (distance: 453.123um)
| | delays=(launch: 290ps, datapath: 556ps, capture: 64ps)
`-o cell:uls/flopB
Fields:
– @+878ps: delay under fragment parent node (not ‘global’ insertion delay)
– constraint=: constraint window. Insertion delay change which is possible, limited by min DRV buffering delay, auto max insertion delay limit and optionally user skew constraints
– chosen=: chosen window. Skew scheduling wants the sink within this range; delays are relative to current sink insertion delay
– location=: pin location
– slew=: clock pin slew
– slack: slack
Concepts - Worst chain analysis
Explanation of report
,-o cell:uls/flopA
| | pin:.../clk @+878ps constraint=(-226ps,+379ps) chosen=(-0ps,+2ps)
| | location=(910.335,1230.822) slew=(launch: 28ps, capture: 28ps)
| | slack -68ps pin:.../q -> pin:.../d (distance: 453.123um)
| | delays=(launch: 290ps, datapath: 556ps, capture: 64ps)
`-o cell:uls/flopB
Fields:
– distance: total path length
– launch: clock tree launch path delay
– datapath: data path delay
– capture: clock tree capture path delay
• Clock launch and capture CTE clock delays
– Source latency included – can be negative
– Launch and capture clocks can be different – are shown in real log output
Concepts - Worst chain analysis
Sink at bottom of constraint window
,-o cell:uls/flopA
| | pin:.../clk @+600ps constraint=(-226ps,+200ps) chosen=(-0ps,+2ps)
| | location=(910.335,1230.822) slew=(launch: 28ps, capture: 28ps)
| | *WNS* -68ps pin:.../q -> pin:.../d (distance: 453.123um)
| | delays=(launch: 64ps, datapath: 556ps, capture: 290ps)
`-o cell:uls/flopB
| pin:.../clk @+800ps constraint=(-50ps,0ps) chosen=(-0ps,+0ps)
| location=(862.335,1250.822) slew=(launch: 28ps, capture: 28ps)
| slack -45ps pin:.../q -> pin:.../d (distance: 1232.345um)
Annotations:
– flopB constraint upper bound of 0ps: sink at the bottom of the constraint window
– slack -45ps: next path has better slack
• flopB can not be scheduled any later
– Usual cause is auto insertion delay limit
Concepts - Worst chain analysis
Sink at bottom of constraint window – diagram
[Figure: clock path with buffering & virtual delay reaching sink A at 600ps and sink B at 800ps.]
– flopB: constraint=(-50ps,0ps)
– 800-50=750ps is the minimum delay to B with current clustering
– 800ps is max insertion delay
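The window arithmetic from the diagram (800-50=750ps) can be applied to any sink line of the worst chain report. The Python sketch below parses a line in the format shown on these slides; the regular expression is written against the slide samples only, and the real log format may differ.

```python
import re

# Matches "@+800ps constraint=(-50ps,0ps) chosen=(-0ps,+0ps)" as shown on the
# slides. The "ps" suffix is optional because one sample shows "(+0,+379ps)".
SINK_LINE = re.compile(
    r"@(?P<at>[+-]?\d+)ps\s+"
    r"constraint=\((?P<c_lo>[+-]?\d+)(?:ps)?,(?P<c_hi>[+-]?\d+)(?:ps)?\)\s+"
    r"chosen=\((?P<w_lo>[+-]?\d+)(?:ps)?,(?P<w_hi>[+-]?\d+)(?:ps)?\)"
)


def describe_sink(line):
    """Turn the relative constraint window into absolute delays and flags."""
    match = SINK_LINE.search(line)
    if match is None:
        raise ValueError("not a sink line: " + line)
    f = {key: int(value) for key, value in match.groupdict().items()}
    return {
        "earliest_ps": f["at"] + f["c_lo"],
        "latest_ps": f["at"] + f["c_hi"],
        "can_move_earlier": f["c_lo"] < 0,
        "can_move_later": f["c_hi"] > 0,
    }


flop_b = describe_sink("pin:.../clk @+800ps constraint=(-50ps,0ps) chosen=(-0ps,+0ps)")
print(flop_b)
```

For flopB this gives a window of 750ps to 800ps with no room to move later, which is the "bottom of the constraint window" case.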
Concepts - Worst chain analysis
Sink at top of constraint window
| slack -20ps pin:.../q -> pin:.../d (distance: 1232.345um)
| delays=(launch: 64ps, datapath: 782ps, capture: 290ps)
,-o cell:uls/flopA
| | pin:.../clk @+400ps constraint=(+0,+379ps) chosen=(-50ps,-0ps)
| | location=(910.335,1230.822) slew=(launch: 28ps, capture: 28ps)
| | *WNS* -68ps pin:.../q -> pin:.../d (distance: 453.123um)
| | delays=(launch: 290ps, datapath: 556ps, capture: 64ps)
`-o cell:uls/flopB
Annotations:
– constraint lower bound of 0ps: sink can not be earlier
– chosen lower bound of -50ps: desired chosen window above the constraint window
– slack -20ps: previous path has better slack
• flopA can not be scheduled any earlier
Concepts - Latency Modification
Latency Modification
• Why is latency modification required?
– Compensate for difference between pre-CTS clock latency estimates
and the latency (insertion delay) CTS can actually achieve
– Typically at a block level network latencies are zero
Latency Modification
Pre-CTS
set_clock_latency -source 1 [get_clocks ck]
– 1ns source latency, from time 0 to the clock root pin
set_clock_latency 3 [get_clocks ck]
– 3ns network latency, from the clock root pin to the sinks
– Average clock arrival time 4ns
Latency Modification
CTS without source latency update
set_clock_latency -source 1 [get_clocks ck]
– 1ns source latency, from time 0 to the clock root pin
set_clock_latency 3 [get_clocks ck]
– 3ns ideal network latency, giving the pre-CTS arrival time of 4ns
– CTS insertion delay of 3.5ns moves the real clock arrival time to 4.5ns
– The 0.5ns mismatch is annotated in the figure as optimistic setup slack and pessimistic setup slack
Latency Modification
CTS with source latency update
set_clock_latency -source 1 [get_clocks ck]
– Original source latency of 1ns
set_clock_latency -source 0.5 [get_pins ck]
– Updated source latency of 0.5ns at the clock root pin
set_clock_latency 3 [get_clocks ck]
– 3ns network latency
– CTS insertion delay of 3.5ns
– Average clock arrival time 4ns
– Correct setup slack
Latency Modification
Alternative version with zero initial latency
set_clock_latency -source 0 [get_clocks ck]
– Original source latency of 0ns
set_clock_latency -source -3.5 [get_pins ck]
– Updated source latency of -3.5ns at the clock root pin
set_clock_latency 0 [get_clocks ck]
– 0ns network latency
– CTS insertion delay of 3.5ns
– Average clock arrival time 0ns
– Correct setup slack
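Both worked examples above follow the same arithmetic: the source latency is adjusted so that the average clock arrival time stays where the pre-CTS constraints put it. A minimal Python sketch (function and parameter names are illustrative, not tool names):

```python
def updated_source_latency(source_latency_ns, network_latency_ns, achieved_insertion_delay_ns):
    """Source latency that keeps the average clock arrival time unchanged.

    Pre-CTS arrival  = source latency + ideal network latency.
    Post-CTS arrival = updated source latency + achieved insertion delay.
    Setting the two equal gives the updated source latency.
    """
    return source_latency_ns + network_latency_ns - achieved_insertion_delay_ns


# First example: source 1ns, network 3ns, CTS achieves 3.5ns.
print(updated_source_latency(1.0, 3.0, 3.5))
# Zero initial latency example: source 0ns, network 0ns, CTS achieves 3.5ns.
print(updated_source_latency(0.0, 0.0, 3.5))
```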
Latency Modification
Common questions
• Why is it called update_io_latency?
– Good question
– IO latencies are not updated, only real clock pin source latencies.
• I switched it off as the customer does not understand it!
– Results in incorrect IO timing during CCOpt optimization
• Do I need this for plain CTS or just CCOpt?
– Both
– For CTS only it is possible to do the update after CTS
– For CCOpt it is done after balancing skew groups but before CCOpt
useful skew starts
Latency Modification
Common questions
• What about multiple clocks?
– Source adjustment is performed per clock root pin
• What about virtual clocks?
– Virtual clocks do not need latency modification
• What happens at top level CTS?
– Those negative source latencies represent an early clock skew to be
synthesized at the top level
– .lib models – use per pin insertion delay offset
– ILM – CTS/CCOpt will see inside the ILM
• What about full chip top level design?
– Turn off latency modification
– set_ccopt_property update_io_latency false
– May need to set specific CTS target insertion delay values
Configuration - Quick Start
Configuration – General setup
1) Start from the pre-CTS DB
– Avoid set_propagated_clock!
2) Load post-CTS timing constraints
– Not needed for ccopt_design -cts
3) Make preferred setOptMode settings
4) Configure NDR
create_route_type -name CLK_NDR ...
set_route_type -net_type {leaf, trunk, top} CLK_NDR
5) Configure cells
set_ccopt_property cts_buffer_cells ...
set_ccopt_property cts_inverter_cells ...
set_ccopt_property cts_gating_cells ...
6) Configure targets
set_ccopt_property target_max_trans ...
set_ccopt_property target_skew ...
– Skew target recommended for CTS only and for low effort CCOpt
7) Create spec from SDC
[create_ccopt_clock_tree_spec]
– Spec creation automatically run if not done
8) ccopt_design [-cts]
Configuration - Timing
• Generally, use post-CTS timing configuration settings
– Uncertainties, de-rates, (A)OCV, active views
• CCOpt will set all ideal clocks (with clock trees) to
propagated mode, and adjust source latencies
– CCOpt will not adjust latency of propagated mode clocks
– To disable latency modification for top level chip:
set_ccopt_property update_io_latency false
Configuration – Clock net types
Clock net types: top, trunk, leaf
• Top net type applies to nets with transitive fanout sink count greater than user set threshold.
set_ccopt_property routing_top_min_fanout -clock_tree <name>
• By default each sink counts as ‘1’ but user can override:
set_ccopt_property routing_top_fanout_count <N> -pin <pin>
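The top net rule can be illustrated with a short Python sketch that counts transitive fanout sinks, with an optional per-pin count override. The tree, net names and threshold are hypothetical, and the data structure is only an illustration of the rule as stated on this slide.

```python
def fanout_count(net, fanout, sink_count):
    """Transitive fanout sink count of a clock net.

    fanout:     net name -> list of loads (child net names or sink pin names)
    sink_count: sink pin name -> count override (each sink counts as 1 by default)
    """
    total = 0
    for load in fanout[net]:
        if load in fanout:  # the load is another clock net further down the tree
            total += fanout_count(load, fanout, sink_count)
        else:               # the load is a sink pin
            total += sink_count.get(load, 1)
    return total


def top_nets(fanout, threshold, sink_count=None):
    """Nets whose transitive fanout sink count is greater than the threshold."""
    sink_count = sink_count or {}
    return sorted(n for n in fanout if fanout_count(n, fanout, sink_count) > threshold)


# Hypothetical clock tree: root drives two nets, which drive sinks.
tree = {
    "root": ["n1", "n2"],
    "n1": ["f1/CK", "f2/CK", "f3/CK"],
    "n2": ["f4/CK", "macro/CLK"],
}
print(top_nets(tree, threshold=4))
print(top_nets(tree, threshold=4, sink_count={"macro/CLK": 10}))
```

Raising the count of the macro pin pulls the net that drives it into the top category, which is the effect of the per-pin override.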
Configuration - NDRs
• Route type object used to specify
– Clock NDR
– Layer Range – top/bottom preferred layer
– Shielding – net + bottom preferred layer
• Create a route type
create_route_type [-help] -name <string> [[-non_default_rule <ndr_name>] [-shield_net <net_name>] [-bottom_shield_layer <layer>] [-top_preferred_layer <layer>] [-bottom_preferred_layer <layer>] [-preferred_routing_layer_effort {low medium high}]] | [-table <string> [-table_length_type {total source_to_sink}]]
• Assign it to a clock net type
set_ccopt_property route_type -net_type {trunk leaf top} [-clock_tree <name>]
Configuration - Cells
set_ccopt_property -name inverter_cells <list | lib_cell> [-clock_tree <name>]
also buffer_cells, clock_gating_cells, logic_cells, delay_cells
• Should specify all buffers, inverters, clock gating cells
To use only inverters: set_ccopt_property use_inverters true
• Include different power domains/library sets in single list
Configuration – Clock spec
• Customize clock network boundary sinks before tracing (optional)
set_ccopt_property sink_type {stop, ignore…} -pin <name>
• CCOpt will automatically derive clock trees, skew groups from SDCs
create_ccopt_clock_tree_spec [-file <filename>]
if using -file, must source <filename> to take effect
Editing spec file not recommended.
• Alter skew groups after tracing (optional)
create_ccopt_skew_group -name <name>…
modify_ccopt_skew_group -skew_group <name> {-add_sinks | -remove_sinks | -add_ignore_pins | -remove_ignore_pins}
create_ccopt_clock_tree…
delete_ccopt_skew_group…
delete_ccopt_clock_tree…
Configuration – Common settings
Offsetting a pin
• Pin insertion delay – offsetting a pin
set_ccopt_property insertion_delay 1.4ns -pin <pin name>
– Frequently used for .lib partition or macro clock input
– Value is delay CTS should assume exists inside the partition/macro
– Positive values pull the sink earlier in the tree
– Negative values push the sink later in the tree
• create_ccopt_clock_tree_spec will translate SDC
set_clock_latency –pin to these property settings
• For a black box (no .lib model) also need to specify input pin cap
set_ccopt_property –capacitance_override …
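The sign convention for pin insertion delay can be checked with one line of arithmetic. In this Python sketch the skew group insertion delay of 2000ps is a hypothetical value; 1400ps corresponds to the 1.4ns example above.

```python
def clock_path_target_ps(group_insertion_delay_ps, pin_insertion_delay_ps=0):
    """Delay CTS should build up to the pin so that the delay to the pin plus
    the delay assumed inside the partition/macro matches the other sinks."""
    return group_insertion_delay_ps - pin_insertion_delay_ps


# Positive pin insertion delay: the pin is pulled earlier in the tree.
print(clock_path_target_ps(2000, 1400))
# Negative pin insertion delay: the pin is pushed later in the tree.
print(clock_path_target_ps(2000, -200))
```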
Configuration – Common settings
Limiting useful skew range for CCOpt useful skew
• Limit maximum insertion delay increase from useful skew
– Initial trial CTS has insertion delay X
– Late skewing max ID limited to X * auto_limit_insertion_delay_factor
– Does not limit early skewing
– set_ccopt_property auto_limit_insertion_delay_factor 1.3
– Default is 1.5
• Place hard skew limit on all skew groups
– Regardless of impact on setup timing
– Why? Just don’t want it! Or cross corner scaling and timing correlation concerns
foreach sg [get_ccopt_skew_groups *] {
    set_ccopt_property target_skew 0.400ns -skew_group $sg
    # all = constrain all of ccopt_design internal flow
    set_ccopt_property constrains all -skew_group $sg
}
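The automatic late-skew limit described at the top of this slide is a simple product. A Python sketch with a hypothetical trial CTS insertion delay of 1000ps:

```python
def max_late_insertion_delay_ps(trial_insertion_delay_ps, factor=1.5):
    """Late skewing limit: trial CTS insertion delay X times the factor.

    The default of 1.5 is the default quoted on the slide. Early skewing is
    not limited by this value.
    """
    return trial_insertion_delay_ps * factor


def allowed_late_shift_ps(current_insertion_delay_ps, trial_insertion_delay_ps, factor=1.5):
    """How much later a sink may still be scheduled before hitting the limit."""
    limit = max_late_insertion_delay_ps(trial_insertion_delay_ps, factor)
    return max(0.0, limit - current_insertion_delay_ps)


print(max_late_insertion_delay_ps(1000))        # default factor
print(max_late_insertion_delay_ps(1000, 1.3))   # tighter factor from the example
print(allowed_late_shift_ps(1200, 1000, 1.3))
```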
Configuration – Tips
• Historically multiple schemes for configuration
– Due to migration from standalone ‘scripted’ CCOpt and FE-CTS
– setCTSMode
– set_ccopt_mode / setCCOptMode
– set_ccopt_property
• set_ccopt_property is strongly preferred
• Help
ENC> help *ccopt*
– Get a list of CCOpt commands
ENC> get_ccopt_property -help *
– Get help for all CCOpt properties
Configuration - Tuning
Tuning – Transition targets & Cells
• Transition target
– Trade off between insertion delay, area/power
– Experiment with ccopt_design –cts
– (Do not use cluster or trial runs for this)
– May still need some margin compared to sign off target
– Used to say 20%...
– ccopt_pro beta feature (in next presentation)
• Cell selection
– Largest cells are bad for power (and EM)
– Really small cells often have poor cross-corner scaling
– Do not remove all smaller cells – balancing will use a lot of area
– Use inverters – better for insertion delay and power
– R&D working on improvements here
Tuning – NDRs
• Top nets
– Consider extra fast top net rules to reduce delay at top 1/3 of tree
– Higher layers, more spacing, wider
– Shield to reduce SI problems
• Trunk nets
– High layers if not too blocked by power grid
– 2X width for lower resistance
– Shield to reduce SI problems
– 2X spacing to reduce cap to shield
• Leaf nets
– 2X width for lower resistance
– Middle layers
– Extra spacing desirable, shield probably uses too much resource
• All clock nets
– Ensure bar shaped vias in use all the way down to M1 pins
– Avoid clock nets on double pattern layers
Tuning – Early sinks
• Important to tell CCOpt about very early sinks & gates
– Do not expect ccopt_design to automatically do this
– Better starting point means better optimization and less run-time
• Macro clock input pins
– Specify pin insertion delay X, the clock delay inside the macro
set_ccopt_property insertion_delay X -pin <macro clock pin name>
Tuning – Early sinks
• Very early architectural clock gates
– Add skew group to balance registers controlling very early
architectural clock gate with the clock gate
create_ccopt_skew_group -name balanceFlopWithGateSG -exclusive_sinks {flop4/clk gate/clkin} -sources <clock root pin>
Debugging – Overview
Debugging – Get CTS right before optimizing
Cluster -> (Trial) -> Full
• Cluster
– DRV buffering only
– Check max insertion delay, debug power management and floorplan issues
• (Trial)
– Cluster + virtual delay balancing
– Check balancing constraints setup
– Check initial balancing before optimization
• Full
– Full CTS or CCOpt run
• Get CTS right before optimizing timing
set_ccopt_property balance_mode cluster | trial | full
ccopt_design or ccopt_design -cts
Debugging – Checks, verbose log, reports
• Checking configuration
ccopt_check_prerequisites {private}
• Additional verbose log information
– Often of use to PE and R&D, less so for non-expert users
ccopt_internal_messages –on {private}
• Reporting clock tree area and stats
report_ccopt_clock_trees ...
• Reporting skew and skew group paths
report_ccopt_skew_groups
  -through ...
  -histograms ...
  -delay_corners ... -skew_groups ...
Debugging – Log file summary
Before summary
----------------------------------------------------------------------------------------
Label                           reg2reg WNS  reg2reg TNS  All WNS    All TNS     Runtime
----------------------------------------------------------------------------------------
Start of Optimizer WNS Pass 0   -0.268ns     -13.039ns    -1.122ns   -22.438ns   594s
End of Optimizer WNS Pass 0     -0.268ns     -13.039ns    -1.122ns   -22.438ns   595s
Before full cluster             -0.268ns     -13.039ns    -1.122ns   -22.438ns   596s
After full cluster              -0.507ns     -16.616ns    -0.979ns   -24.335ns   666s
Start of Optimizer WNS Pass 1   -0.507ns     -16.616ns    -0.979ns   -24.335ns   670s
Before useful skew              -0.453ns     -16.513ns    -0.979ns   -24.231ns   671s
Before useful skew              -0.163ns     -15.625ns    -0.979ns   -23.344ns   686s
After useful skew               -0.122ns     -13.217ns    -0.979ns   -20.935ns   724s
. . . .
Before useful skew              -0.037ns     -0.176ns     -0.971ns   -7.795ns    900s
After useful skew               -0.037ns     -0.173ns     -0.971ns   -7.789ns    929s
End of Optimizer WNS Pass 1     -0.037ns     -0.173ns     -0.971ns   -7.789ns    930s
Before full cluster             -0.037ns     -0.173ns     -0.971ns   -7.789ns    930s
After full cluster              -0.067ns     -0.667ns     -0.922ns   -7.496ns    1001s
Start of Optimizer WNS Pass 2   -0.067ns     -0.667ns     -0.922ns   -7.496ns    1004s
Before useful skew              -0.067ns     -0.667ns     -0.922ns   -7.496ns    1005s
After useful skew               -0.021ns     -0.272ns     -0.922ns   -7.100ns    1019s
Before useful skew              -0.021ns     -0.272ns     -0.922ns   -7.100ns    1020s
After useful skew               -0.008ns     -0.016ns     -0.922ns   -6.830ns    1057s
Before useful skew              -0.008ns     -0.016ns     -0.922ns   -6.830ns    1057s
After useful skew               -0.002ns     -0.005ns     -0.922ns   -6.819ns    1100s
Before useful skew              -0.002ns     -0.005ns     -0.922ns   -6.819ns    1101s
After useful skew               -0.001ns     -0.002ns     -0.922ns   -6.816ns    1129s
Before useful skew              -0.001ns     -0.002ns     -0.922ns   -6.816ns    1129s
After useful skew               0.004ns      0.000ns      -0.922ns   -6.814ns    1158s
End of Optimizer WNS Pass 2     0.004ns      0.000ns      -0.922ns   -6.814ns    1159s
Before mini cluster             0.004ns      0.000ns      -0.922ns   -6.814ns    1159s
After mini cluster              0.004ns      0.000ns      -0.922ns   -6.812ns    1215s
Start of Optimizer WNS Pass 3   0.004ns      0.000ns      -0.922ns   -6.812ns    1224s
End of Optimizer WNS Pass 3     0.004ns      0.000ns      -0.922ns   -6.812ns    1225s
Before calculating windows      0.004ns      0.000ns      -0.922ns   -6.812ns    1230s
Before implementation           0.004ns      0.000ns      -0.922ns   -6.812ns    1230s
Before post conditioning        -0.001ns     -0.009ns     -1.076ns   -7.814ns    1389s
After implementation            -0.001ns     -0.009ns     -1.076ns   -7.814ns    1423s
Start of Optimizer WNS Pass 0   -0.001ns     -0.009ns     -1.076ns   -7.814ns    1435s
End of Optimizer WNS Pass 0     -0.001ns     -0.009ns     -1.076ns   -7.811ns    1436s
Start of Optimizer WNS Pass 1   -0.001ns     -0.009ns     -1.076ns   -7.811ns    1441s
End of Optimizer WNS Pass 1     -0.001ns     -0.009ns     -1.076ns   -7.811ns    1442s
Start of Optimizer WNS Pass 2   -0.001ns     -0.009ns     -1.076ns   -7.811ns    1448s
End of Optimizer WNS Pass 2     -0.001ns     -0.009ns     -1.076ns   -7.811ns    1449s
Start of Optimizer TNS Pass     -0.001ns     -0.009ns     -1.076ns   -7.811ns    1468s
End of Optimizer TNS Pass       -0.001ns     -0.009ns     -1.076ns   -7.791ns    1470s
Start of Optimizer WNS Pass 0   -0.001ns     -0.001ns     -1.076ns   -7.794ns    1555s
End of Optimizer WNS Pass 0     -0.001ns     -0.001ns     -1.076ns   -7.794ns    1555s
----------------------------------------------------------------------------------------
Tip: It is normal for slack to degrade during cluster steps, and again during implementation / routing, provided it is recovered promptly.
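The tip can be checked mechanically. The following is a minimal Python sketch, assuming the first slack column of the log has already been parsed into (stage, slack) pairs. The four rows are copied from the log on this slide; the function name and the 0.005ns tolerance are illustrative assumptions, and the sketch only tests whether slack recovers at all, not how promptly.

```python
# Rows copied from the log above: (stage label, first slack column in ns).
ROWS = [
    ("Before full cluster", -0.037),
    ("After full cluster", -0.067),
    ("After useful skew", -0.021),
    ("End of Optimizer WNS Pass 2", 0.004),
]

def find_unrecovered_degradations(rows, tolerance=0.005):
    """Return stages where slack got worse and never recovered afterwards.

    A degradation counts as recovered if any later stage reaches at least
    the slack seen just before the drop (within the tolerance).
    """
    unrecovered = []
    for i in range(1, len(rows)):
        before = rows[i - 1][1]
        stage, after = rows[i]
        if after < before - tolerance:  # slack degraded at this stage
            later = [slack for _, slack in rows[i + 1:]]
            if not any(s >= before - tolerance for s in later):
                unrecovered.append(stage)
    return unrecovered

print(find_unrecovered_degradations(ROWS))
```

With these rows the full cluster step degrades slack, but a later useful skew step recovers it, so no stage is reported.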
Trial CTS
Global opt
Area reclaim
DRV opt
WNS reg2reg opt transforms
Prepare clock tree for skewing
WNS reg2reg opt transforms
Skew sinks
refinePlace + opt
(fully) update clock tree
WNS reg2reg opt transforms
Skew sinks
refinePlace + opt
(partially) update clock tree
Finalize clock tree
WNS opt transforms
refinePlace + opt
Final slack polishing
Debugging – Clock tree debugger GUI (CTD)
Clock tree debugger
Launching
• Can also use ctd_win command
• See user guide for ctd_win options and related commands
Clock tree debugger
Operation
Menu Bar
Control Panel
Key panel
Clock Tree Viewer
Path Browser
World Viewer
Clock tree debugger
Delay representation
Clock tree debugger
Right click menu
Clock tree debugger
Expanding & collapsing sub-trees
Clock tree debugger
Multiple input cells
Dotted line connects same cell in different trees
Clock tree debugger
Cross probing
Single select
Multiple select
Clock tree debugger
Unit delay
Insertion delay
Unit delay
• Each cell and wire has delay of 1
• Can also use ctd_win -unit_delay to avoid invoking extraction or delay calculation, e.g. on an unplaced design
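To illustrate the unit-delay view, here is a small Python sketch over a hypothetical clock tree. The node names and the dictionary layout are invented for illustration and are not CTD output. It assumes each wire between a driver and its load counts 1, each clock cell passed through counts 1, and the clock root output is at time 0.

```python
# Hypothetical clock tree: driver -> list of loads. Names are illustrative.
TREE = {
    "clk_root": ["buf_a", "buf_b"],
    "buf_a": ["ff1", "ff2"],
    "buf_b": ["icg_1"],
    "icg_1": ["ff3"],
}

def unit_delays(tree, root):
    """Unit-delay arrival at each sink (leaf), with the root output at 0."""
    arrivals = {}
    stack = [(root, 0)]
    while stack:
        node, at_output = stack.pop()
        for load in tree.get(node, []):
            at_input = at_output + 1          # one wire: delay 1
            if load in tree:                  # internal clock cell: delay 1
                stack.append((load, at_input + 1))
            else:                             # sink pin: stop here
                arrivals[load] = at_input
    return arrivals

print(unit_delays(TREE, "clk_root"))
```

In this view the arrival at a sink depends only on how many cells and wires lie on its path, which is why no extraction or delay calculation is needed.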
Clock tree debugger
CCOpt useful skew implementation timing windows
• Windows computed by implementation
• Non-critical sinks do not need exact skew balancing, which saves clock area and power
• Sinks outside of windows are likely due to clock routing miscorrelation
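A minimal Python sketch of the window check described above. The sink names, arrival times, and window bounds are invented for illustration; in practice they would come from the tool, whose report format is not shown here.

```python
# Hypothetical data: sink -> (clock arrival in ns, (window_min, window_max)).
SINKS = {
    "ff_critical": (0.82, (0.80, 0.85)),   # narrow window: timing critical
    "ff_relaxed": (0.60, (0.40, 1.10)),    # wide window: non-critical sink
    "ff_missed": (0.95, (0.70, 0.90)),     # arrives later than its window
}

def sinks_outside_windows(sinks):
    """Return {sink: signed distance outside its window in ns}."""
    violations = {}
    for name, (arrival, (lo, hi)) in sinks.items():
        if arrival < lo:
            violations[name] = round(arrival - lo, 3)   # negative: too early
        elif arrival > hi:
            violations[name] = round(arrival - hi, 3)   # positive: too late
    return violations

print(sinks_outside_windows(SINKS))
```

The wide window on the non-critical sink is what lets the tool skip exact balancing there; only the sink that lands outside its window is reported.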
Debugging – Insertion delay
Debugging – Cluster insertion delay
Run cluster only
• Run cluster CTS
– DRV buffering
– No timing slack measurement or optimization
– No NanoRoute of clock nets
set_ccopt_property balance_mode cluster
set_ccopt_property update_io_latency false
ccopt_design [-cts]
• Inspect maximum insertion delay path
– Floorplan problems
– Divider flops (treated as FIXED)
– FIXED clock gates or other clock cells
– Power domain crossing problems
– May want to check path per skew group
report_ccopt_skew_groups
ctd_win
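Checking the longest path per skew group can be sketched in Python as follows. The records are hypothetical (skew group, sink, insertion delay) tuples invented for illustration; they do not reflect the actual output format of report_ccopt_skew_groups.

```python
# Hypothetical (skew_group, sink, insertion_delay_ns) records.
PATHS = [
    ("sg_core", "u_core/ff_12", 0.71),
    ("sg_core", "u_core/div_ff", 1.34),
    ("sg_io", "u_io/ff_3", 0.48),
    ("sg_io", "u_io/ff_9", 0.52),
]

def longest_path_per_skew_group(paths):
    """Return {skew_group: (sink, delay)} for the largest insertion delay."""
    worst = {}
    for group, sink, delay in paths:
        if group not in worst or delay > worst[group][1]:
            worst[group] = (sink, delay)
    return worst

for group, (sink, delay) in longest_path_per_skew_group(PATHS).items():
    print(f"{group}: {sink} {delay}ns")
```

The sink reported for each skew group is the one to trace in the clock tree debugger when looking for floorplan, FIXED cell, or power domain causes.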
Debugging – Cluster insertion delay
Highlight longest path in CTD
Debugging – Cluster insertion delay
Highlight longest path in CTD
Path highlighted in clock tree debugger and layout
Debugging – Complex clock trees
Debugging – Complex clock trees
Trial CTS
• Trial balancing
– DRV buffering with real cells
– Skew balanced to zero by addition of virtual delays
– Approximate timing report is possible
set_ccopt_property balance_mode trial
ccopt_design [-cts]
• Inspect per skew group insertion delays
– Which ones have increased compared to cluster only run?
– Increase typically due to shared sink with larger skew group
– Experiment with importance property of skew groups
– Details beyond scope of this presentation
• Example conflict on next slide
Debugging - Skew group conflict example
[Diagram: clock sources sg1 and sg2, a logic block, and three sink groups (2000 sinks, 50 sinks, 100,000 sinks); annotation: create_generated_clock -master_clk sg2]
• User wants minimum insertion delay for skew group sg1. sg2 is
unimportant but needs to be balanced.
• After CTS sg1 has a large insertion delay, despite only 2050 sinks.
• Why?
Debugging - Skew group conflict example
[Diagram: clock sources sg1 and sg2, a logic block, and three sink groups (2000 sinks, 50 sinks, 100,000 sinks); annotation: create_generated_clock -master_clk sg2]
Diagram callouts:
• Delay to balance 50 sinks with 100,000 must go here
• Delay to match “red” delay ends up here
• Significant cluster buffering delay here due to large fanout
• sg1 insertion delay increased in order to balance sg2
– Problem if sg1 insertion delay is critical
– For example if sg1 is functional mode and sg2 is test mode
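The effect can be illustrated with a toy arithmetic model in Python. Every number is hypothetical, and the model assumes a simplified topology: the 50 sinks belong to both skew groups, sg2 also drives the 100,000 sinks, and the only question is whether the delay added to balance sg2 sits on a path that sg1 also uses.

```python
# All delays in picoseconds; every number here is hypothetical.
DELAY_100K = 1200   # cluster buffering delay to the 100,000 sinks
DELAY_50 = 300      # cluster buffering delay to the 50 shared sinks
DELAY_2000 = 400    # cluster buffering delay to the 2000 sinks

def sg1_insertion_delay(pad_on_shared_path):
    """sg1 insertion delay after sg2 and sg1 are each balanced."""
    # Balancing sg2: the 50 sinks must be padded up to the 100,000-sink delay.
    pad = DELAY_100K - DELAY_50
    # If the pad sits on the path sg1 also uses, sg1 sees it too.
    delay_to_50_via_sg1 = DELAY_50 + (pad if pad_on_shared_path else 0)
    # Balancing sg1: its 2000 sinks are padded to match its slowest sink.
    return max(DELAY_2000, delay_to_50_via_sg1)

print(sg1_insertion_delay(pad_on_shared_path=True))
print(sg1_insertion_delay(pad_on_shared_path=False))
```

Under these assumptions sg1 inherits the full 100,000-sink buffering delay whenever the balancing delay lands on the shared path, even though sg1 itself has only 2050 sinks.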
Debugging - Skew group conflict example
[Diagram: clock sources sg1 and sg2, a logic block, and three sink groups (2000 sinks, 50 sinks, 100,000 sinks); annotation: create_generated_clock -master_clk sg2]
Diagram callout:
• Delay to balance 50 sinks with 100,000 can go here
• Clone the mux
– Adds place for balancing delay to be added without impacting sg1
• Detection
– sg1 insertion delay in trial/full CTS would be significantly larger than in cluster-only CTS
– Impacts CCOpt initial balancing and ultimately timing
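The detection step amounts to comparing per skew group insertion delays between a cluster-only run and a trial (or full) CTS run. A minimal Python sketch, assuming the maximum insertion delay per skew group has already been collected from each run into a dictionary. The values and the 1.5x threshold are illustrative assumptions.

```python
# Hypothetical max insertion delay per skew group, in ns.
CLUSTER_ONLY = {"sg1": 0.40, "sg2": 1.20}
TRIAL = {"sg1": 1.20, "sg2": 1.22}

def flag_increases(cluster, trial, ratio=1.5):
    """Skew groups whose trial insertion delay exceeds ratio x cluster-only."""
    flagged = {}
    for group, base in cluster.items():
        now = trial.get(group)
        if now is not None and base > 0 and now > ratio * base:
            flagged[group] = (base, now)
    return flagged

print(flag_increases(CLUSTER_ONLY, TRIAL))
```

A skew group that is flagged here while another, larger group barely changes is a candidate for the shared-sink conflict described on these slides.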