Innovus 18.1 CTS
Innovus Product Engineering
2018-05-23

Copyright Statement
• © 2018 Cadence Design Systems, Inc. All rights reserved worldwide. Cadence and the Cadence logo are registered trademarks of Cadence Design Systems, Inc. All others are the property of their respective holders.

Topics
• CTS in the Innovus flow
• Setup
• TAT
• Concepts
• Clock power
• Flexible H-tree & multi-tap
• Common UI (CUI)

Key
• 18.1 : New in 18.1 (shown as [18.1] next to slide titles)
• common_ui_command
• legacyUICommand

CTS in the Innovus flow

CTS in the Innovus flow – Overview [18.1]
Flow: place_opt_design (optionally with early clock) → ccopt_design → route & post-route opt
• Early clock in place_opt_design (optional)
  – Clock congestion awareness
  – Clock gate enable timing optimization
  – Useful skew with estimate of clock delays
• CTS : ccopt_design
  – CTS & datapath optimization
  – Standard effort or extreme effort
  – Switch to propagated clock timing & source latency update
• Useful skew at all steps by default
  – place_opt_design: pre-cts useful skew or early clock useful skew
  – ccopt_design: standard effort or extreme effort CCOpt
  – route & post-route opt: post-route useful skew
• For CTS only without datapath optimization
    clock_design / ccopt_design -cts

CTS in the Innovus flow – Early clock flow (ECF)
[Figure: place_opt_design with early clock. Steps shown: placement, global opt, merge FF, cluster & virtual balance, power opt, timing opt, cong repair, split FF, useful skew; followed by ccopt_design]
• Awareness of clock routes & clock cell placement
  – Improved accuracy for congestion, RC, layer assignment, and other optimizations
• Optimization of clock gate enable timing
• Useful skew with estimate of clock delays
• Additional optimization transforms possible compared to post-CTS
• Setup CTS before place_opt_design
  – NDRs, cells, clock spec, ...
• Command to enable
    set_db design_early_clock_flow true
    setDesignMode -earlyClockFlow true

CTS in the Innovus flow – Early clock – How it works
[Figure: place_opt_design with early clock. Steps shown: placement, global opt, merge FF, cluster & virtual balance, power opt, timing opt, cong repair, split FF, useful skew; followed by ccopt_design]
• Clustering and balancing with virtual delays
• CTS timing is used to annotate clock latencies for ideal clock mode timing analysis
• Skewing adjusts the latencies
• Latencies are communicated to later CTS via pin insertion delays in the in-memory clock tree spec
Note: Do not use reset_cts_config / reset_ccopt_config before ccopt_design; doing so deletes the pin insertion delays.

CTS in the Innovus flow – Useful skew controls – Common UI
set_db opt_useful_skew_pre_cts true* | false
set_db opt_useful_skew_ccopt none | standard* | extreme
  – Determines useful skew flow inside ccopt_design
  – none : No useful skew
  – standard : Standard effort ccopt_design
  – extreme : Extreme effort ccopt_design
set_db opt_useful_skew_post_route true* | false
set_db opt_useful_skew true* | false
  – Master switch to disable all useful skew
  – Getting useful skew requires both opt_useful_skew AND opt_useful_skew_<FlowStep>
set_db design_flow_effort standard* | extreme
  – At start of ccopt_design, the opt_useful_skew_ccopt setting inherits this standard or extreme setting unless opt_useful_skew_ccopt has been explicitly set by the user
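The way the master switch, the per-step switches and the flow effort combine can be illustrated with a small model. This is a minimal Python sketch and not Innovus code: the function names are hypothetical, and it assumes that an opt_useful_skew_ccopt value the user never set is represented as None.

```python
from typing import Optional


def effective_ccopt_mode(design_flow_effort: str,
                         opt_useful_skew_ccopt: Optional[str] = None) -> str:
    """Useful skew mode inside ccopt_design.

    An explicit user setting wins; otherwise the mode is inherited from
    design_flow_effort (standard or extreme), as the slide describes.
    """
    if opt_useful_skew_ccopt is not None:
        return opt_useful_skew_ccopt
    return design_flow_effort


def useful_skew_active(step: str,
                       opt_useful_skew: bool = True,
                       pre_cts: bool = True,
                       ccopt: Optional[str] = None,
                       post_route: bool = True,
                       design_flow_effort: str = "standard") -> bool:
    """True if useful skew runs at the given flow step.

    Requires the master switch AND the per-step switch. Defaults mirror
    the starred defaults on the slide.
    """
    if not opt_useful_skew:          # master switch disables everything
        return False
    if step == "pre_cts":
        return pre_cts
    if step == "ccopt":
        return effective_ccopt_mode(design_flow_effort, ccopt) != "none"
    if step == "post_route":
        return post_route
    raise ValueError(f"unknown flow step: {step}")
```

For example, `useful_skew_active("ccopt", design_flow_effort="extreme")` is true and runs in extreme mode, while setting the master switch to false disables every step regardless of the per-step values.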
CTS in the Innovus flow – Useful skew controls – Legacy UI
setOptMode -usefulSkewPreCTS true* | false
setOptMode -usefulSkewCCOpt none | standard* | extreme
  – Determines useful skew flow inside ccopt_design
  – none : No useful skew
  – standard : Standard effort ccopt_design
  – extreme : Extreme effort ccopt_design
setOptMode -usefulSkewPostRoute true* | false
setOptMode -usefulSkew true* | false
  – Master switch to disable all useful skew
  – Getting useful skew requires both -usefulSkew true AND -usefulSkew<FlowStep>
setDesignMode -flowEffort standard* | extreme
  – At start of ccopt_design, the usefulSkewCCOpt setting inherits this standard or extreme setting unless usefulSkewCCOpt has been explicitly set by the user

Setup

Setup – Standard settings
Typical command sequence:
    create_route_type -name CLK_NDR ...
    set_db cts_route_type_top/trunk/leaf ...
    set_db cts_buffer_cells ...
    set_db cts_inverter_cells ...
    set_db cts_gating_cells ...
    (set_db cts_logic_cells ...)
    set_db cts_target_max_transition_time ...
    set_db cts_target_skew ...
    create_clock_tree_spec
    check_design [-type cts]
    (place_opt_design if using ECF)
    ccopt_design / clock_design
• Must configure
  – Route types with NDRs
  – Cell lists
  – Transition target
  – Skew target
  – Top net fanout threshold if using top nets
• Run check_design
  – Identifies common setup mistakes
  – Identifies issues in the design data
  – Identifies overly aggressive constraints
• Customizing the clock spec
  – Set stop & ignore pins before creating the spec
  – Use create_skew_group, update_skew_group and other spec manipulation commands instead of editing the clock spec file

Setup – Check Design [18.1]
• check_design -type cts
  – Many new checks added
  – Check basic settings are present
  – Check for missing or overly aggressive targets
  – Check routing configuration
  – Check don’t touch, fixed & related
  – Check multi-tap, e.g. uncloneable instances
  – ... 40+ individual checks in total
• check_design -type all
  – Includes CTS checks only if clock spec is loaded
Examples:
    > help CHKCTS-010
    Error/Warning message: CHKCTS-010: Route type(s) %s used for trunk or top clock nets do not have an nondefault rule (NDR) set.
    > help CHKCTS-051
    Error/Warning message: CHKCTS-051: The timing analysis type is not set to OCV (on chip variation).

Setup – Net types
[Figure: clock tree with top, trunk and leaf net types]
• Top net type applies to all nets with a sub-tree sink count greater than a user-set threshold
    set_db cts_top_fanout_threshold 1000
    set_ccopt_property routing_top_min_fanout 1000
  – Default is unset: top net type is not used by default
  – Each sink counts as 1, but user can override:
    set_db pin:name .cts_top_fanout_count_override 100
    set_ccopt_property routing_top_fanout_count 100 -pin name
• Leaf net type applies to any net directly connected to a sink
• Can force the net connected to a sink to be considered trunk
    set_db pin:name .cts_routing_trunk_override true
    set_ccopt_property trunk_override -pin name true
• Overriding the route type for a particular net
    create_route_type -name rt1 ...
    set_db pin:i0/ckout .cts_pin_route_type rt1
    set_ccopt_property pin_route_type rt1 -pin i0/ckout
  [Figure: nets between i0 and i1 before and after CTS; the route type set on i0/ckout propagates over buffering]
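The net type rules on the net types slide can be read as a small decision procedure. The following is a minimal Python sketch under stated assumptions, not the tool's implementation: the function names are hypothetical, and checking the leaf rule before the top rule is an assumption, since the slide does not say which wins when a net qualifies for both.

```python
from typing import Dict, Iterable, Optional


def subtree_sink_count(sinks: Iterable[str],
                       count_override: Optional[Dict[str, int]] = None) -> int:
    """Each sink counts as 1 unless a per-pin fanout count override is given."""
    overrides = count_override or {}
    return sum(overrides.get(sink, 1) for sink in sinks)


def classify_net(drives_sink: bool,
                 sink_count: int,
                 top_fanout_threshold: Optional[int] = None,
                 trunk_override: bool = False) -> str:
    """Return 'top', 'trunk' or 'leaf' for a clock net.

    drives_sink: the net is directly connected to a sink.
    top_fanout_threshold: None models the unset default (no top nets).
    """
    if drives_sink:
        # Assumption: the leaf rule is evaluated first; the trunk override
        # forces such a net to be treated as trunk.
        return "trunk" if trunk_override else "leaf"
    if top_fanout_threshold is not None and sink_count > top_fanout_threshold:
        return "top"
    return "trunk"
```

With a threshold of 1000, a net above 1500 sinks is classified as top, while the same net is trunk when the threshold is left unset. A macro pin with a count override of 100 contributes 100 to every sub-tree sink count above it.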
Setup – Sink offsets
[Figure: macro m0 with clock pin ck; the insertion delay inside the macro corresponds to cts_pin_insertion_delay]
• SDC / user command approach
  – Specify early or late clock arrival time
  – Early typically negative due to 0 network latency
    set_clock_latency -1 [get_pins {m0/ck}]
• CTS pin insertion delay approach
  – Specify the clock insertion delay inside the macro
    set_db pin:macro/CK .cts_pin_insertion_delay 1ns
    set_ccopt_property insertion_delay 1ns -pin m0/CK
• Library max clock tree path approach
  – Library specifies the clock insertion delay inside the macro
    convert_lib_clock_tree_latencies
• Recommendation
  – Specify in SDC or use library max clock tree path
  – Visible to slack driven placement and non-ECF pre-cts optimization
• create_clock_tree_spec converts SDC clock latency to CTS pin insertion delay (note the inverted sign)
• To delay sinks instead of advancing them, invert the sign
• Reporting – .logv or command
    report_pin_insertion_delays
    report_ccopt_pin_insertion_delays

Setup – Stop & ignore pins
• Stop & ignore pins
  – Clock spec creation stops tracing at the pin
  – Clock spec creation will trace to this pin, even if SDC clocks do not
• Stop pin
  – The pin is considered a sink to be balanced in any skew groups which reach it, even if the pin is not identifiable as a “clock pin” from the library model
    set_db pin:name .cts_sink_type stop
    set_ccopt_property sink_type stop -pin name
• Ignore pin
  – The pin is not balanced
    set_db pin:name .cts_sink_type ignore
    set_ccopt_property sink_type ignore -pin name
• Skew group specific ignore pin
  – A pin which is ignored in a skew group, but may be balanced in other skew groups
    update_skew_group ... -add_ignore_pins ...
    modify_ccopt_skew_group ... -add_ignore_pins ...
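Returning to the sink offset sign convention on the sink offsets slide: an SDC clock latency of -1 on a macro clock pin corresponds to a CTS pin insertion delay of 1ns. The following minimal Python sketch shows that conversion. The helper names are hypothetical, and the generated string only follows the Common UI pattern shown on the slide; it is not output from the tool.

```python
def pin_insertion_delay_ns(sdc_clock_latency_ns: float) -> float:
    """SDC pin clock latency and CTS pin insertion delay have opposite signs."""
    return -sdc_clock_latency_ns


def cui_insertion_delay_command(pin: str, sdc_clock_latency_ns: float) -> str:
    """Format a Common UI style setting for the equivalent pin insertion delay."""
    delay = pin_insertion_delay_ns(sdc_clock_latency_ns)
    # :g drops the trailing ".0" so 1.0 is written as 1
    return f"set_db pin:{pin} .cts_pin_insertion_delay {delay:g}ns"


# Early (negative) SDC latency of -1 on m0/ck, as in the slide's example
command = cui_insertion_delay_command("m0/ck", -1.0)
```

A positive SDC latency would produce a negative insertion delay, which is the "invert the sign" case for delaying a sink instead of advancing it.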
Setup – Cells & library trimming [17.1]
[Figure: drive strength and area versus buffer index (sorted by drive), shown for the 60 supplied cells and for the 11 cells remaining after filtering]
• CTS will filter the supplied cell lists to remove inferior cells
  – Improved clock QOR and run-time
  – Filtering based on cell drive versus area
  – Applies to buffers, inverters, clock gates
• Recommendation: Continue to specify only LVT cells for clock

Setup – Cell filtering report [18.1]
• Report on why CTS rejected cells which the user specified
• report_cts_cell_filtering_reasons (report_ccopt_cell_filtering_reasons)
• Report included in verbose log (innovus.logv) output near start of CTS
• Refer to man page for explanation of reasons
• Note: If run in the same Innovus session after CTS it will execute quickly, otherwise it will run CTS initialization to perform the filtering.

Turn Around Time (TAT)

TAT – 18.1 Improvement [18.1]
CTS Core + Services runtime per project:

    Inst count   CPU freq / #CPUs   Ref (mins)   18.1 (mins)
    1.4 M        2.3G/16            168          57
    2.3 M        2.3G/16            436          149
    4.8 M        2.3G/16            635          221
    4.2 M        3.0G/16            566          227
    2.3 M        2.3G/16            343          111
    1.8 M        2.3G/16            145          61
    2.3 M        3.0G/16            541          162
    2.2 M        3.0G/16            420          137
    5.6 M        2.3G/16            667          182

[Figure: bar chart of minutes per 1M instances for each project, 17.1 versus 18.1]
• Block level designs
• CPU, GPU, Automotive, Networking
• All ≤16nm
• 3.1x average speed up
• 1 hour / 1M instances average
• Core CTS & services (legalization, routing)
• Multi-threading in some steps & reporting
• Post-CTS opt excluded

Concepts
Concepts – Transition to propagated mode timing
Question – What happens at CTS?
✓ Add buffering & size/place clock cells for drive and delay
Sounds straightforward, but now timing is with propagated clocks:
✓ SOCV/AOCV/derates impact timing from non-common clock path
✓ Clock gate enable timing is no longer ideal
✓ Inter clock timing depends on achievable insertion delays
✓ Clock generator control logic timing is no longer ideal
✓ Single CTS unbufferable net can impact entire design timing
➢ More than just buffering clock nets!

Concepts – Clock trees & skew groups
[Figure: clocks ck1 and ck2 with clock gates (G); skew group 1, skew group 2, and a shared region belonging to skew groups 1&2]
• Clock trees
  – Physical constraints
  – DRV, NDR
• Skew groups
  – Balancing constraints
  – Auto spec 1:1 named clock/analysis_view
• Clocks
  – Timing reports
  – SDC / CTE

Concepts – Auto clock spec – Single mode example
SDC (green in the figure):
    create_clock [get_ports {ck1}]
    create_generated_clock -name gck1 -divide_by 2 [get_pins {d1/Q}] -source [get_pins {d1/CK}] -master_clock [get_clocks {ck1}]
    create_clock [get_ports {ck2}]
    create_generated_clock -name gck2 -divide_by 2 [get_pins {m0/Y}] -source [get_pins {d2/CK}]
Clock spec (blue in the figure), labels as shown:
  – clock_tree:ck1, skew_group:ck1
  – clock_tree:ck2, skew_group:ck2
  – d1/CK ignored in skew_group:ck2
  – generated clock_tree:gck1, reporting only skew group
  – generated clock tree ck2_generator_for_gck2<1>
  – no clock tree, reporting only skew group
[Figure: clock network with ports ck1 and ck2, dividers d1 and d2, clock gate G, instance m0, and flops f1 to f4]
Note: Skew groups and clock trees often have the same name

Concepts – Auto clock spec – Multi mode example
mode0.sdc:
    create_clock [get_ports {ck}]
    create_generated_clock -name gck -divide_by 2 [get_pins {d1/Q}] -source [get_pins {d1/CK}]
    set_case_analysis 0 [get_ports {sel_div}]
mode1.sdc:
    create_clock [get_ports {ck}]
    create_generated_clock -name gck -divide_by 2 [get_pins {d1/Q}] -source [get_pins {d1/CK}]
    set_case_analysis 1 [get_ports {sel_div}]
Resulting skew groups: skew_group:ck/mode0 (blue in the figure), skew_group:ck/mode1 (green in the figure)
    update_skew_group -skew_group ck/mode0 -add_ignore_pins mux/I1
    update_skew_group -skew_group ck/mode1 -add_ignore_pins mux/I0
[Figure: clock ck feeding divider d1 and a mux with inputs 0 and 1 selected by sel_div]
• Two skew groups with source at ‘ck’ input – one per clock per mode
• One ignored at mux ‘0’ input and other ignored at mux ‘1’ input
• Paths through the mux ‘0’ input are not balanced with paths through the mux ‘1’ input

Concepts – CTS internal flow – Standard effort
Steps inside ccopt_design:
  – Initialization: library trimming, identify placeable area, validate transition & skew targets, log settings
  – Construction: clustering, legalization, DRV repair
  – Implementation: optimize insertion delay and power, balancing
  – EGR Post-Conditioning: early global route, area reclaim, DRV repair
  – Clock Routing: detailed routing along early global route guides
  – NR Post-Conditioning: DRV and skew repair
  – Post-CTS Optimization: scan re-order, datapath optimization, useful skew
DAG Stats – Reported at each step and sub-steps: Clock area, cap, cell counts, transitions, insertion delay, skew, ...
Concepts – Clock routing [18.1]
Flow: ccopt_design (Initialization, Construction, Implementation, EGR Post-Conditioning, Clock Routing, NR Post-Conditioning, Post-CTS Optimization) → route & post-route opt
• Early global route post-conditioning
• Clock nets are detail routed with NanoRoute
• Post-conditioning may resize and move clock cells small distances leaving small opens in clock nets
• Optimization may modify or add clock cells, or re-size flops, also leaving small opens in clock nets
• Design routing repairs clock nets first, closing the opens
• Post-route optimization includes CTS PRO to further repair clock DRVs
18.1: Many improvements in EGR, EGR Post-Conditioning, NR Post-Conditioning

Concepts – DAG Stats
Example log output:
    Clock DAG stats after update timingGraph:
      cell counts : b=719, i=2653, icg=6495, nicg=0, l=824, total=10691
      cell areas : b=1668.215um^2, i=8067.257um^2, icg=54799.189um^2, nicg=0.000um^2, l=7370.830um^2, total=71905.491um^2
      cell capacitance : b=1.024pF, i=15.171pF, icg=15.285pF, nicg=0.000pF, l=3.642pF, total=35.122pF
      sink capacitance : count=104926, total=255.000pF, avg=0.002pF, sd=0.001pF, min=0.001pF, max=0.040pF
      wire capacitance : top=0.000pF, trunk=98.883pF, leaf=162.988pF, total=261.871pF
      wire lengths : top=0.000um, trunk=573904.625um, leaf=776348.715um, total=1350253.340um
    Clock DAG net violations after update timingGraph:
      Remaining Transition : {count=1, worst=[9.7ps]} avg=9.7ps sd=0.0ps sum=9.7ps
      Fanout : {count=3, worst=[20, 20, 6]} avg=15.333 sd=8.083 sum=46
      Capacitance : {count=4, worst=[1.714pF, 1.451pF, 1.166pF, 0.002pF]} avg=1.083pF sd=0.755pF sum=4.333pF
    Clock DAG primary half-corner transition distribution after update timingGraph:
      Trunk : target=300.0ps count=3849 avg=90.5ps sd=77.4ps min=0.0ps max=300.0ps {3135 <= 180.0ps, 459 <= 240.0ps, 174 <= 270.0ps, 44 <= 285.0ps, 37 <= 300.0ps}
      Leaf : target=300.0ps count=6980 avg=166.2ps sd=67.5ps min=16.7ps max=309.7ps {4178 <= 180.0ps, 1714 <= 240.0ps, 619 <= 270.0ps, 296 <= 285.0ps, 172 <= 300.0ps} {1 <= 315.0ps, 0 <= 330.0ps, 0 <= 360.0ps, 0 <= 450.0ps, 0 > 450.0ps}
    Clock DAG library cell distribution after update timingGraph {count}:
      Bufs: CTB_F4_SVT: 27 GLITCHGOBBLER_F4_DH_SVT: 2 CTB_F1_SVT: 690
      Invs: INV_B16_SVT: 67 INV_B14_SVT: 68 INV_B12_SVT: 137 INV_B10_SVT: 175 INV_B8_SVT: 122 INV_B6_SVT: 186 INV_B5_SVT: 117 CTI_F4_SVT: 123 INV_B4_SVT: 141 INV_B3_SVT: 189 INV_B2_SVT: 258 INV_B1_SVT: 1070
      ICGs: ICG_F6_SVT: 81 ICG_F5_SVT: 72 ICG_F4_SVT: 942 ICG_F3_SVT: 385 ICG_F2_SVT: 1792 ICG_F1_SVT: 3223
      Logics: CTNAND2_F4_AHP_SVT: 2 CTOR2_F4_AHP_SVT: 67 CTMUX2_F4_SVT: 685 CTEXOR2_F4_SVT: 5 CTENOR2_F4_AHP_DH_SVT: 5 CTAND2_F4_SVT: 58 NOR2IA_F8_4SR_75LL: 1 CTOR2_F4_SVT: 1
    Primary reporting skew group after update timingGraph:
      skew_group PLLCLK/DFT: Half-corner MAX_DELAY_CORNER:setup.late: insertion delay [min=7146.5, max=7596.1, avg=7428.6, sd=90.2], skew [449.6 vs 400.0*], 99.2% {7182.6, 7582.6} (wid=1060.4 ws=584.8) (gid=7052.3 gs=859.6)
    Skew group summary after update timingGraph:
      ...
Slide callouts on the log sections, in order: counts, area, cap, length; drv violations; transition times; library cell usage; primary reporting skew group; all skew groups.

Concepts – Clock tree debugger (CTD)
[Figure: CTD windows in insertion delay view and unit delay view]
• Coloring by clock tree, skew group, net type, cell type, transition time, activity, ...
• Cross probing with layout view
• Dotted line indicates multiple input cell present in more than one clock tree
• Open in unit delay view to avoid invoking RC extraction or delay calculation, e.g. on unplaced design, or at post-route
    gui_open_ctd -unit_delay
    ctd_win -unit_delay

Concepts – Don’t touch means don’t size [18.1]
• 18.1 CTS DOES NOT SIZE USER DON’T TOUCH INSTS
  – Previous tool versions resize don’t touch instances by default
  – New behavior is consistent with other parts of Innovus and other tools
• How to say don’t touch but permit resize
  – set_db inst:name .dont_touch size_ok
  – (dbSet [dbGetInstByName name].dontTouch sizeOk)
  – Note size_same_height_ok/size_same_footprint_ok not supported and do not enable sizing
• How to find clock tree instances which are fully don’t touch by user
  – get_db clock_trees .insts -expr {"user true" in $obj(.dont_touch_sources)}
  – See also report_preserves to report on the whole design

Concepts – Don’t touch means don’t size – Warnings & checks [18.1]
• CTS warning message
  – IMPCCOPT-1437: Found %d clock tree instances which are user dont_touch and are not resizable - this can impact clock QOR.
  – The CUI version of this message also suggests the command given on the previous slide.
• Check design warns about don’t touch instances
  – Includes checks for resizable instances and ones which are don’t touch from other sources
  – check_design [-type cts]
  – CHKCTS-052: Found %d instances which are user don't touch. Refer to CHKCTS-58 for a list of instances.
• Some existing users are already using “don’t touch means don’t size”
  – Setting was “set_ccopt_property allow_resize_of_dont_touch_cells false”.
  – This property, and the corresponding private CUI attribute, will be phased out in 18.2/19.1
Concepts – Get CTS right before optimizing timing
✓ Run check_design
✓ Check warnings and errors in the log
✓ Check maximum insertion delay
  – If too large consider cluster run, and check the maximum insertion delay path in the CTD & layout
✓ Check no unbuffered nets and DRV violations
  – Check warnings in the log, color options in the CTD
  – Check in CTD and layout view
  – Check MSV warnings & setup
  – Use cluster run to debug if needed
✓ Get CTS right before optimizing timing
  – No optimization: clock_design / ccopt_design -cts
  – Check skew group insertion delay & skew occupancy
  – Check report_clock_trees/report_skew_groups
  – Check DAG stats at CTS internal flow steps
  – Look at the clock tree debugger
• Cluster run
    set_db cts_balance_mode cluster
    set_ccopt_property cts_balance_mode cluster
• Disable detail routing to save time
    set_db cts_route_clock_tree_nets false
    set_ccopt_property use_estimated_routes_during_final_implementation true
• Measuring cost of balancing
  – Clustering depends on the skew target. Cluster run skips some optimizations.
  – Recommend to increase the skew target rather than use cluster run.

Concepts – Max clock tree path – Overview [18.1]
[Figure: macro whose internal insertion delay is given by max_clock_tree_path]
• Cell library optional max_clock_tree_path attribute
  – Specifies cell internal clock tree delay, indexed by input transition
  – Is NOT a timing arc, not included in any timing analysis/report
• Timing significance
  – Missing early offsets for memories are a common cause of bad reg2mem hold TNS
  – Optimistic/pessimistic setup timing for reg2mem/mem2reg
• Historical CTS support
  – Off by default, behaves like an additional pin insertion delay
  – Too late in the flow: no visibility at placement
• New command convert_lib_clock_tree_latencies
  – Convert MCTP data to per pin clock latencies
  – Aware that SDC pin latencies override SDC clock latencies
  – Updates in-memory timing constraints or export SDC text
Concepts – Max clock tree path – Flow [18.1]
Flow: unplaced initial DB → convert_lib_clock_tree_latencies → create_clock_tree_spec → place_opt_design → ccopt_design
  – convert_lib_clock_tree_latencies converts .lib max_clock_tree_path to set_clock_latency [get_pins]
  – create_clock_tree_spec converts set_clock_latency [get_pins] to set_db cts_pin_insertion_delay
Syntax:
    convert_lib_clock_tree_latencies
        [-views <views>]
        [-latency_file_prefix <string>]
        [-pins <pins>]
        [-override_existing_latencies[_pins <pins>]]
        [-sum_existing_latencies[_pins <pins>]]
• Default behavior is to generate and apply set_clock_latency commands for all clock pins with MCTP data over all views, excluding pins which have an existing latency in any view
• Utility command to get access to MCTP data
    get_lib_clock_tree_path_delay -base_pin <...> -edge rise -power_domain <...> -transition <...> -view <...>

Concepts – Clock gate merging
Log output:
    Merging duplicate siblings in DAG...
    Resynthesising clock tree into netlist...
    Reset timing graph...
    Reset timing graph done.
    Resynthesising clock tree into netlist done.
    Summary of the merge of duplicate siblings
Globally unique enables: The number of clock gates which have a unique enable input as per this definition: Imagine all buffers and logical pairs of inversions are deleted from the datapath and the clock tree. A set of 1 or more clock gates with their enable input driven by the same pin would be a “globally unique enable”.
    ---------------------------------------------------------------
    Description                              Number of occurrences
    ---------------------------------------------------------------
    Total clock gates                        4314
    Globally unique enables                  4263
    Potentially mergeable clock gates        51
    Actually merged                          1
    ---------------------------------------------------------------
    Cannot merge reason: UniqueUnderParent   49
    Cannot merge reason: IsDontTouch         1
    ---------------------------------------------------------------
Potentially mergeable clock gates: This is defined as the total number of clock gates minus the number of globally unique enables.
Cannot merge reason: UniqueUnderParent: This means two clock gates have the same ‘globally unique enable’ controlling their enable input, but are in the fanout of logically different clock gates further up the tree. Merging these would be logically inequivalent.

Concepts – Clock gate merging – Example
[Figure: clock and data network with clock gates G1 to G5 and enable logic instances i0, i1 and i2]
• Pretend all datapath buffers and matching pairs of inverters are removed.
• G4 and G5 can be merged because they are enabled by the same logic instance i2.
• G1 and G2 cannot be merged because they are enabled by different logic instances i0 and i1.
• G3 cannot be merged with G4/G5 because G1 and G2 have different enables.
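The counting rules in the merge summary can be reproduced for this example with a few lines of Python. This is a minimal sketch under stated assumptions, not the CTS algorithm: the function name is hypothetical, the topology (G3 under G1; G4 and G5 under G2) is inferred from the example text, and only the UniqueUnderParent reason is modeled.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

# gate name -> (enable driver, parent clock gate or None at the top of the tree)
# Assumed topology for the example: G3 sits under G1, G4 and G5 under G2.
EXAMPLE: Dict[str, Tuple[str, Optional[str]]] = {
    "G1": ("i0", None),
    "G2": ("i1", None),
    "G3": ("i2", "G1"),
    "G4": ("i2", "G2"),
    "G5": ("i2", "G2"),
}


def merge_summary(gates: Dict[str, Tuple[str, Optional[str]]]) -> Dict[str, int]:
    """Count mergeable clock gates using the definitions from the slides."""
    # enable driver -> parent gate -> gates sharing both
    by_enable = defaultdict(lambda: defaultdict(list))
    for name, (enable, parent) in gates.items():
        by_enable[enable][parent].append(name)

    total = len(gates)
    unique_enables = len(by_enable)
    # Gates sharing an enable AND a parent collapse into one gate.
    merged = sum(len(members) - 1
                 for parents in by_enable.values()
                 for members in parents.values())
    # Same enable but different parents: blocked as UniqueUnderParent.
    unique_under_parent = sum(len(parents) - 1 for parents in by_enable.values())
    return {
        "total": total,
        "unique_enables": unique_enables,
        "potentially_mergeable": total - unique_enables,
        "merged": merged,
        "unique_under_parent": unique_under_parent,
    }


summary = merge_summary(EXAMPLE)
```

In this model the merged count plus the UniqueUnderParent count always equals the potentially mergeable count; a real run can also lose candidates to other reasons such as IsDontTouch.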
Total clock gates: 5
Globally unique enables: 3
Potentially mergeable clock gates: 5-3=2
Cannot merge: UniqueUnderParent: 1
Number of clock gates actually merged: 2-1=1 (G4 is merged with G5 reducing the clock gate count by 1)

Concepts – Source latency update
[Figure: pseudo timing graph representation showing virtual time zero at -3.5ns, the ‘clock object’ at 0ns, the clock root pin, a CTS insertion delay of 3.5ns, and an average clock arrival time of 0ns. Commands shown on the figure: set_clock_latency -source 0 [get_clocks ck]; set_clock_latency -source -3.5 [get_pins ck]; set_clock_latency 0 [get_clocks ck]]
• Before CTS
  – Network latency: 0 / not set
  – Source latency: 0 / not set
  – Clock arrival time at flops: 0
  – Clock arrival time at IOs: 0
• After CTS & source latency update
  – Clock insertion delay: 3.5ns
  – Source latency change at clock input: -3.5ns
  – Clock arrival time at flops: 0
  – Clock arrival time at IOs: 0
• Source latency adjustment
  – Source latency is applied to clock inputs not clock objects
  – report_clocks still lists clocks as ‘ideal’, scope for future reporting improvement
    set_db cts_update_clock_latency true* | false
    set_ccopt_property update_io_latency true* | false

Concepts – CTS Skew vs Skew
CTS (global skew): CTS times using single half-corner – delay corner with late derates, max delays/caps
• Sink 1 : 2.0*1.1 = 2.2 including late derates
• Sink 2 : 1.9*1.1 = 2.09 including late derates
• Skew = 2.2-2.09 = 0.11
CTE (local skew): CTE times using delay corner with early & late derates, min & max delays/caps, and CPPR
• Launch : 2.0*1.1 = 2.2 including late derates
• Capture : 1.9*0.9 = 1.71 including early derates
• Skew = 2.2-1.71 - CPPR = 0.49 - CPPR
Note: Similar situation with SOCV delay sigma – CTS excludes, CTE includes
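The two skew calculations can be checked numerically. The following minimal Python sketch uses the derates and delays from the slide (late derate 1.1, early derate 0.9); the function names are hypothetical and CPPR is left as a parameter because the slide gives no value for it.

```python
def cts_global_skew(delay_a: float, delay_b: float, late_derate: float) -> float:
    """CTS view: both sinks timed in one half-corner with late derates."""
    return delay_a * late_derate - delay_b * late_derate


def cte_local_skew(launch: float, capture: float,
                   late_derate: float, early_derate: float,
                   cppr: float = 0.0) -> float:
    """CTE view: late launch against early capture, less CPPR credit."""
    return launch * late_derate - capture * early_derate - cppr


cts = cts_global_skew(2.0, 1.9, late_derate=1.1)           # about 0.11
cte = cte_local_skew(2.0, 1.9, late_derate=1.1,
                     early_derate=0.9)                     # about 0.49 before CPPR
```

The same pair of clock paths reports roughly 0.11 in CTS and 0.49 minus CPPR in timing analysis, which is why a CTS skew target and the skew seen in timing reports should not be compared directly.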
Concepts – Antenna diodes [18.1]
[Figure: block design with an input port and internal flops; the diode at the input port is preserved, the diode near the internal flops is deleted]
• Some flows have antenna diodes present pre-CTS
  – For example if added at clock input ports during top level partitioning
• Antenna diodes connected directly to a clock input port
  – Remain connected directly to that clock input port
  – Are not re-placed by CTS, expectation is the initial placement is sensible
  – Are treated as an ignore pin by CTS
• Antenna diodes found anywhere else in a clock tree
  – Deleted by CTS, unless they are dont_touch

Concepts – Port isolation buffers [18.1]
[Figure: block design with input port, internal flops and output port]
• ≤17.1 – Only input ports supported
  – set_ccopt_property add_driver_cell <lib_cell>
  – Add the specified cell at clock input ports, or two cells if inverter
  – Global or per clock tree
  – Now obsolete, will be discontinued in a future release
• 18.1 – Per port control, input & output ports
  – set_db port:ck .cts_add_port_driver base_cell:BUFX2
  – set_ccopt_property add_port_driver BUFX2 -pin ck
  – If using inverters, only one inverter placed near the port, other inversion can be freely combined with CTS and/or shared with other ports.

Concepts – Per power domain transition targets [18.1]
• ≤17.1 Does not support per power domain max transition targets
  – Transition target applied to entire design
  – Problem if some power domains are low voltage and the target for normal voltage power domains is not achievable in the lower voltage
• 18.1 Permits indexing by power domain
  – set_db cts_target_max_transition_time ... -index power_domain:<pdname>
• Note transition target is applied to the effective power domain
Concepts – Instance naming [18.1]
• ≤17.1 Problems
  – Temporary ‘uid’ names appeared in the log but renamed before CTS completes
  – Clock name depends on SDC ordering if more than one parent
  – Level numbers often incorrect, e.g. post-route repair
  – Example name: CTS_ccl_BUF_TCLK_G2_L12_2
• 18.1 Solution
  – Avoids all the problems with the old scheme
  – Easy to write down with pencil & paper
  – The number is unique over all CTS inserted cell instances
  – Example name: CTS_ccl_buf_1234

Concepts – Worst chain
[Figure: looping chain through a clock gate (G) between clock and sinks]
• Worst chain reports in the log for
  – Extreme effort CCOpt
  – Early clock step
  – Look for worst chain report(s) after skewing steps and not after reclustering steps
  – Log shows ASCII art representation
• Example shows looping chain with ICG
  – Adjusting clock delays to the ICG can impact sinks under the ICG
• Reporting at other flow steps is not recommended
• Other possibilities include
  – Loop without ICG
  – Macro
  – Chain starting and/or ending at an IO
  – Loop between an ICG driving a single sink

Concepts – Clock groups
• Avoid balancing clocks which are asynchronous according to set_clock_groups or clock-to-clock false paths
• Common case is generated clock asynchronous to master clock
    create_clock [get_ports {ck}]
    create_generated_clock -name gck -source [get_pins {d1/CK}] -master_clock [get_clocks {ck}]
    set_clock_groups -group {ck} -group {gck}
  or
    set_false_path -from [get_clocks ck] -to [get_clocks gck]
    set_false_path -from [get_clocks gck] -to [get_clocks ck]
• User control:
    set_db timing_connectivity_based_skew_groups off* | clock_false_path
  – off setting: single skew group covering master clock flops and generated clock flops
  – clock_false_path: separate skew groups for master clock flops and generated clock flops
[Figure: clock ck driving divider d1 and generated clock gck, shown once per setting]

Concepts – Skew vs insertion delay conflict
[Figure: two versions of a clock network with skew groups sg1 and sg2, a mux, logic, a create_generated_clock point with master clock sg2, and sink groups of 2000, 50 and 100,000 sinks. Left: significant cluster buffering delay due to the large sink count; the delay to balance the 50 sinks with the 100,000 must go before the mux, so the delay to match it ends up on the other branch. Right: clone the mux to resolve the problem; the delay to balance the 50 sinks with the 100,000 can go after the cloned mux]
• Complaint – Insertion delay of sg1 is increased significantly between cluster and balance steps – this is necessary to balance the skew within sg1 and skew within sg2
• General case – a net with different sets of skew groups at its immediate fanout
  – In the example above the mux has one fanout with {sg1,sg2} and one fanout with {sg2}

Clock Power

Clock Power – Placement impact
• Clock power is significantly influenced by flop placement
  – Leaf level clock gating divides flops into groups
  – Tight non-overlapping placement of each group key to clock power reduction
• Design mode power effort or placement controls
    set_db design_power_effort none|low|high
  OR
    set_db place_global_clock_power_driven true
    set_db place_global_clock_power_driven_effort standard|high
    (set_db place_global_clock_gate_aware false)
[Figure: flop (F) and ICG placement examples, one bad for clock power and one good for clock power]
[Figure panel: placement of flops (F) and clock gates (ICG), labeled "Good for clock power"]

Clock Power – Buffering under leaf ICGs

• High up ICGs with buffering sub-tree – good for power
• Many ‘leaf’ ICGs having 13 buffers/inverters under them – something is probably wrong
  – Check placement, transition target, leaf NDR/vias, max fanout constraint

Flexible H-tree & multi-tap

Flexible H-tree – Flow

• Build H-tree before CTS, or before placement if using ECF
• Further details in App Note – “Flexible H-tree and Multi-Tap Clock Flow in Innovus”, support.cadence.com

H-tree with early clock flow:
    create_clock_tree_spec
    … early clock flow & CTS configuration here ...
  Top routing rule NDR (check vias or use stacked vias)
    create_route_type … -shield_net …
    set_db cts_route_type_top ...
  H-tree transition constraint
    set_db cts_target_max_transition_time_top …
  Define H-tree(s)
    create_flexible_htree …
    [create_flexible_htree …]
  Build/Debug H-tree(s)
    synthesize_flexible_htrees …
  Multi-tap in ECF
    place_opt_design
  Multi-tap CTS
    ccopt_design

Flexible H-tree – New algorithm [18.1]

• Redesigned algorithm
  – Makes trade-offs between turns, source-to-sink and net length balancing
  – Works on EGR routing grid instead of coarse grid, for improved routing, especially around power grid
• Reduced wire length and power
• Gives a more ‘clean looking’ H-tree structure

         Skew (ps)       Max latency (ps)     Wire length (um)         Runtime (s)
Design   old     new     old   new   change   old     new     change   old    new    change
1        2.46    0.71    100   100   -0%      3529    2765    -22%     1067   316    -70%
2        3.50    0.10    78    88    +13%     5171    5430    +5%      672    678    +1%
3        0.18    0.25    151   150   -1%      2699    2814    +4%      204    241    +18%
4        2.37    0.61    155   145   -6%      2447    2614    +7%      39     77     +97%
5        4.21    1.50    79    68    -14%     1258    1089    -13%     167    171    +2%
6        2.30    1.40    63    44    -31%     1091    1093    +0%      167    171    +2%
7        28.71   7.72    204   186   -9%      37068   33792   -9%      3051   2852   -7%
8        1.05    0.11    102   54    -47%     3281    3086    -6%      383    216    -44%
9        9.24    5.75    231   206   -11%     17877   15624   -13%     7941   3533   -56%
10       32      36      721   664   -8%      66699   53420   -20%     3895   3870   -1%

[Figure: H-tree layout, annotated "Less of this"]

Flexible H-tree – New algorithm – Stress test

[Figure: stress test with 60 sinks, panels labeled NEW and OLD. Values as extracted: wire length 15141um and 14598um; skew 12ps and 4ps]

Flexible H-tree – Distance based mode & debug [18.1]

• Distance mode DRV checks
    -mode {drv | distance}
        Specify distance to enable the distance based mode. The default is drv.
    -max_driver_distance <value in um>
        Specify the maximum permitted total length of any net driven by an H-tree driver.
        Both -mode distance and -max_driver_distance must be specified together.
    -max_root_distance <value in um>
        Specify the maximum permitted total length of the net driven by the H-tree root pin.
        Defaults to the -max_driver_distance setting if not specified.
• Debug
  – Build the H-tree based on expected max distances, then debug issues such as large transition times due to incorrect via setup
• Dry run and image debug output
    mkdir debug
    create_flexible_htree -image_directory debug ...
    synthesize_flexible_htrees -dry_run

Flexible H-tree – Clock spec updates

• Clock tree source group and generated clock trees added – needed for multi-tap CTS
• Reporting only skew group added
    create_skew_group -name flexible_htree_myhtree/reporting_only -constrains none -source ck -sinks {...}

[Figure: clock_tree ck and skew_group ck; clock_tree_source_group myhtree; generated clock_tree flexible_htree_myhtree_0...3]
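The rules stated above for the distance based mode options can be sketched as a small plain Tcl model. This is only an illustration of how the three options interact as described on the slide, not Innovus code: the proc name and the returned dict layout are hypothetical, and the real create_flexible_htree command performs its own checking.

```tcl
# Toy model (not Innovus code) of the distance mode option rules.
# resolve_htree_distance_options is a hypothetical helper name.
proc resolve_htree_distance_options {args} {
    # The slide states that -mode defaults to drv.
    set opts [dict create -mode drv]
    foreach {name value} $args {
        dict set opts $name $value
    }
    set mode [dict get $opts -mode]
    if {$mode ni {drv distance}} {
        error "-mode must be drv or distance"
    }
    set has_driver [dict exists $opts -max_driver_distance]

    # -mode distance and -max_driver_distance must be specified together.
    if {$mode eq "distance" && !$has_driver} {
        error "-mode distance requires -max_driver_distance"
    }
    if {$mode ne "distance" && $has_driver} {
        error "-max_driver_distance requires -mode distance"
    }
    # -max_root_distance falls back to -max_driver_distance when omitted.
    if {$mode eq "distance" && ![dict exists $opts -max_root_distance]} {
        dict set opts -max_root_distance [dict get $opts -max_driver_distance]
    }
    return $opts
}

# Example values are arbitrary.
set resolved [resolve_htree_distance_options -mode distance -max_driver_distance 400]
puts [dict get $resolved -max_root_distance]
```

Running this in tclsh prints the root distance inherited from the driver distance; omitting -max_driver_distance in distance mode raises an error in the model.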
Multi-tap – Clustering based tap assignment [18.1]

• Cluster driven tap assignment is default (16.21), targets limited increase in ICG count
• This setting is not required:
    cts_spec_config_create_clock_tree_source_groups
    extract_balance_multi_source_clocks
  – Continue to use only if requiring clock spec creation to create clock tree source groups for multi-source SDC clocks
• Improved support for fine tap grids
  – Auto maximum radius per tap avoids ‘reaching too far over’ other taps (18.11)
  – Less need to increase the permitted amount of clock gate cloning
    set_db cts_clustering_source_group_max_cloned_fraction 0.2
    set_ccopt_property clustering_source_group_max_cloned_fraction 0.2

[Figure: DFFs and ICGs within a few um of a tap boundary]

Multi-tap – Debug tips – Uncloneable nodes

[Figure: cloning of clock gates (G) between taps A and B]

• Log output – all clear
    There are no uncloneable nodes in this source group
• Log output – check if expected or not
  – check_design also checks this
• Uncloneable nodes are the most common cause of bad multi-tap QOR

Uncloneable nodes:
-------------------------------------------------------------------------------------
Node                                Level  Downstream Sinks  Reasons
-------------------------------------------------------------------------------------
block_icg_0                         5      3081              dont_touch.user
block_icg_1                         5      2527              dont_touch.user
block_icg_2                         5      1684              dont_touch.user
block_icg_3                         5      1459              dont_touch.user
some/dont_touch_buffer/here/i0      2      5                 dont_touch.user
some_other/dont_touch_buffer/i1     1      4                 dont_touch.user
a/leaf/level/mux/driving/one/sink   0      1                 dont_touch.user
-------------------------------------------------------------------------------------
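When the uncloneable nodes table is long, it can help to pull out the rows with the most downstream sinks first. The following is a minimal plain Tcl sketch that assumes the table has been captured from the log as text in the column layout shown above; the proc name and the 1000 sink threshold are illustrative assumptions, not tool defaults.

```tcl
# Sample rows copied from the log excerpt above.
set log_rows {
Node                                Level  Downstream Sinks  Reasons
block_icg_0                         5      3081              dont_touch.user
block_icg_1                         5      2527              dont_touch.user
some/dont_touch_buffer/here/i0      2      5                 dont_touch.user
a/leaf/level/mux/driving/one/sink   0      1                 dont_touch.user
}

# Return {node sinks reason} triples for rows with at least min_sinks
# downstream sinks, largest first. large_uncloneable_nodes is a hypothetical name.
proc large_uncloneable_nodes {rows min_sinks} {
    set hits {}
    foreach line [split $rows "\n"] {
        # Data rows have four fields: node, level, sinks, reason.
        # Header and separator lines do not match and are skipped.
        if {[regexp {^\s*(\S+)\s+([0-9]+)\s+([0-9]+)\s+(\S+)\s*$} $line -> node level sinks reason]} {
            if {$sinks >= $min_sinks} {
                lappend hits [list $node $sinks $reason]
            }
        }
    }
    return [lsort -integer -decreasing -index 1 $hits]
}

foreach hit [large_uncloneable_nodes $log_rows 1000] {
    lassign $hit node sinks reason
    puts [format "%-12s %6d sinks  %s" $node $sinks $reason]
}
```

Whether a reported node is expected still needs a design-specific judgment; the filter only orders the candidates.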
Multi-tap – Debug tips – Tap allocation

• Log output – tap allocation stats
  – Check for unused taps
  – Check for taps with disproportionate number of sinks or very large radius

[CLU] Tap allocation statistics:
[CLU] tap    anchor                 size  radius  hpwl    mst       L1ICGs
[CLU] ============================================================================
[CLU] clk_0  ( 138.560,  63.220)    1419  160.78  322.97  8161.65   49
[CLU] clk_7  ( 297.792,  63.220)    1672  167.35  311.98  9721.78   82
[CLU] clk_1  ( 138.560, 168.820)    1694  176.90  325.37  8822.09   59
[CLU] clk_8  ( 297.792, 168.820)    1706  136.89  347.87  9208.19   71
[CLU] clk_2  ( 138.560, 274.420)    2004  151.03  378.45  11453.22  69
[CLU] clk_9  ( 297.792, 274.420)    5317  155.82  344.99  18088.76  593

Multi-tap – Debug tips – Cluster CTS

• Check cluster only CTS
  – Insertion delay under each tap buffer should be roughly the same if the tap locations and tap assignment are good
  – Taps with less delay will be padded out by CTS to match the slowest sub-tree

Multi-tap – Merging between taps [18.1]

• CTS merges (de-clones) clock gates & logic and then performs tap assignment
• ≤17.1 Merge did not merge between taps
  – Existing cloned clock gates under different taps would remain as clones and be allocated to new taps, perhaps even the same tap, or perhaps be cloned further
• 18.1 enables merging between taps by default
• Important for flows which perform tap assignment more than once
  – Early clock flow
  – User assign_clock_tree_source_groups before CTS
  – Additionally tap assignment approximately places clones

[Figure: initial section of CTS flow: Remove existing drivers, Merge, Tap assignment, Clustering]

Note: Known corner case: Tap assignment can clone clock gates leading to macros/memories but merging does not merge such clock gates by default.
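The check for taps with a disproportionate number of sinks, listed under the tap allocation debug tips, can be sketched in plain Tcl. This assumes the "[CLU]" lines have been captured from the log as text in the layout shown in that slide; the proc names and the "more than twice the median size" rule are illustrative assumptions rather than a documented threshold.

```tcl
# Sample lines copied from the tap allocation statistics excerpt.
set tap_log {
[CLU] Tap allocation statistics:
[CLU] tap    anchor                 size  radius  hpwl    mst       L1ICGs
[CLU] clk_0  ( 138.560,  63.220)    1419  160.78  322.97  8161.65   49
[CLU] clk_7  ( 297.792,  63.220)    1672  167.35  311.98  9721.78   82
[CLU] clk_1  ( 138.560, 168.820)    1694  176.90  325.37  8822.09   59
[CLU] clk_8  ( 297.792, 168.820)    1706  136.89  347.87  9208.19   71
[CLU] clk_2  ( 138.560, 274.420)    2004  151.03  378.45  11453.22  69
[CLU] clk_9  ( 297.792, 274.420)    5317  155.82  344.99  18088.76  593
}

# Build a dict of tap name -> size (the column after the anchor coordinates).
proc parse_tap_sizes {log} {
    set sizes [dict create]
    foreach line [split $log "\n"] {
        if {[regexp {^\s*\[CLU\]\s+(\S+)\s+\(\s*[0-9.]+,\s*[0-9.]+\)\s+([0-9]+)\s} $line -> tap size]} {
            dict set sizes $tap $size
        }
    }
    return $sizes
}

# Median of a non-empty list of numbers.
proc median {values} {
    set sorted [lsort -real $values]
    set n [llength $sorted]
    set mid [expr {$n / 2}]
    if {$n % 2} {
        return [lindex $sorted $mid]
    }
    return [expr {([lindex $sorted [expr {$mid - 1}]] + [lindex $sorted $mid]) / 2.0}]
}

# Flag taps whose size exceeds factor times the median size (assumed heuristic).
proc oversized_taps {sizes factor} {
    if {[dict size $sizes] == 0} {
        return {}
    }
    set limit [expr {$factor * [median [dict values $sizes]]}]
    set flagged {}
    dict for {tap size} $sizes {
        if {$size > $limit} {
            lappend flagged $tap
        }
    }
    return $flagged
}

puts [oversized_taps [parse_tap_sizes $tap_log] 2.0]
```

For the sample data this flags clk_9, the tap whose size and L1ICGs count stand out in the excerpt. The same parsing approach could be extended to the radius column.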
Multi-tap – Tap buffer sizing

• Clock tree sources resizable by CTS
  – Aids balancing
  – Downsizing saves power, e.g. on lightly loaded tap buffer
  – Only clock sources which are buffers, inverters, clock gates, or single output logic can be sized
    set_db cts_size_clock_sources true
    set_db cts_clock_source_cells <list> <base cell names>
    set_ccopt_property size_clock_source true
    set_ccopt_property clock_source_cells <base cell names>

[Figure: tap buffers marked "Resizable"]

Common User Interface (CUI)

CUI – Legacy property behavior refresher

• Generic form
  – set_ccopt_property <property name> <value> [-<object_type> <object name>]
  – get_ccopt_property <property name> [-<object_type> <object name>]
• What happens when object/index is not specified?
  – set_ccopt_property target_skew 0.123
      Sets skew target on all existing skew groups as well as the skew target used for any future created skew groups
  – get_ccopt_property target_skew
      Returns a single value only if all skew groups and the ‘unkeyed’ value used for new skew groups have the same setting
      Otherwise returns an unfriendly list of skew group names and targets, which is not accepted by set_ccopt_property
• Known as “Last one wins”
  – No rule which says per skew group value overrides global value
  – Easy to overwrite existing settings
• Used for all index types except for delay_corner
  – delay_corner defaults to the primary half corner, and ‘late’ is assumed by default
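The legacy "last one wins" behavior described above can be illustrated with a toy model in plain Tcl. This is not Innovus code and does not call set_ccopt_property or get_ccopt_property; all proc and variable names are hypothetical, and the model only mirrors the behavior stated on the slide for target_skew.

```tcl
# Toy model (not Innovus code) of legacy "last one wins" for target_skew.
set unkeyed_target ""            ;# value used for skew groups created later
set group_targets [dict create]  ;# per skew group values

proc model_create_skew_group {name} {
    global unkeyed_target group_targets
    dict set group_targets $name $unkeyed_target
}

proc model_set_target_skew {value {group ""}} {
    global unkeyed_target group_targets
    if {$group ne ""} {
        # Object given: only that skew group changes.
        dict set group_targets $group $value
        return
    }
    # No object given: overwrite every existing group and the unkeyed value.
    set unkeyed_target $value
    foreach name [dict keys $group_targets] {
        dict set group_targets $name $value
    }
}

proc model_get_target_skew {} {
    global unkeyed_target group_targets
    set all_values [concat [list $unkeyed_target] [dict values $group_targets]]
    set distinct [lsort -unique $all_values]
    if {[llength $distinct] == 1} {
        return [lindex $distinct 0]
    }
    # Mixed settings: a list of skew group names and targets comes back instead.
    return $group_targets
}

model_create_skew_group sg1
model_create_skew_group sg2
model_set_target_skew 0.123
puts [model_get_target_skew]     ;# all settings agree, single value
model_set_target_skew 0.2 sg1
puts [model_get_target_skew]     ;# settings differ, names and targets
model_set_target_skew 0.3
puts [model_get_target_skew]     ;# the sg1 specific value has been overwritten
```

The last call shows why existing settings are easy to overwrite: the later unkeyed set replaces the earlier per skew group value.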
CUI – 18.1 Cleanup [18.1]

• Major review of public CUI attributes and commands
  – CUI production status
• Some changes are not 100% backward compatible – this was unavoidable
  – Most users will NOT have a problem
  – Scripts configuring standard settings should continue to work, perhaps with warnings
• Attribute naming and cleanup
  – Many attributes renamed; old names continue as aliases and issue a warning
  – Many attributes which should have been private are now private
  – ‘Exploded’ attribute names with _early/late, _min/max, _rise/fall suffixes hidden; cts_<attribute> becomes equivalent to cts_<attribute>_late_max_rise
  – Attributes which exist on root and other object types: get_db returns only the root setting, not an unfriendly list
  – Attributes which only make sense on individual objects removed from root
  – Newer attributes accept and return DPOs
  – Some less used attributes to be cleaned up in 18.2
• No plan to apply cleanup to legacy UI

Note: “root” means “CUI root object”, not the root of a clock tree

CUI – Commands [18.1]

• Legacy (LUI) CCOpt commands mapped to CUI commands
  – get_common_ui_map <legacy command name>
• DPO = Tcl Dual Ported Object
  – To the user looks like object_type:name, for example clock_tree:clk1
  – Compare to plain text name clk1
• Commands now accept DPOs
  – Example: report_skew_groups -skew_groups skew_group:clk1
  – Commands continue to accept plain text names for compatibility
  – Note: Some internal debugging and private commands may not accept DPOs
• Use get_db instead of get_ccopt_* & get_clock_tree_*
  – Previous CUI get_clock_tree_* commands to get cells, instances, sinks are deprecated
  – Example: get_common_ui_map get_ccopt_clock_tree_cells suggests using get_db clock_trees .insts
CUI – Attributes – Overview [18.1]

• Legacy CCOpt properties mapped to CUI CTS attributes
  – get_ccopt_property_common_ui_map <legacy property name>
  – Note: All properties are mapped, not just public ones; private attributes are subject to future review
• Types of indexing
  – By attributes on different object types: skew_group, clock_tree
  – By net type: top/trunk/leaf
  – By object type: power_domain, delay_corner, ...

CUI – Attributes – Common Get Examples [18.1]

• Getting clock trees
    get_db clock_trees
• Getting skew groups
    get_db skew_groups
• Getting clock tree instances
    get_db clock_trees .insts
    get_db clock_tree:clk1 .insts
• Getting clock tree nets
    get_db clock_trees .nets
    get_db clock_tree:clk1 .nets
• Getting clock tree sinks and source
    get_db clock_trees .sinks
    get_db clock_tree:clk1 .source
• Getting skew group sinks, active sinks, ignore pins and sources
    get_db skew_group:ck .sinks
    get_db skew_group:ck .sinks_active
    get_db skew_group:ck .ignore_pins
    get_db skew_group:ck .sources

CUI – Attributes – Common Set Examples [18.1]

• Setting skew target for current and future defined skew groups
    set_db cts_target_skew 200ps
• Setting max transition target
    set_db cts_target_max_transition_time 150ps
• Setting route type assignments
    create_route_type -name trunkrt ...
    set_db cts_route_type_trunk trunkrt
• Setting a pin insertion delay
    set_db pin:mem0/ck .cts_pin_insertion_delay 1ns

CUI – Attributes – Naming convention [18.1]

• Attributes on root always have cts_ prefix
  – Corresponding attribute on CTS specific object, e.g. skew_group, has the same name
  – Exceptions are ccopt_ and ctd_ prefix for useful skew controls and clock tree debugger
  – Example: cts_target_skew
• Attributes only on CTS objects have cts_<object_type> prefix for consistency and clarity
  – Example: skew_group:sg1 .cts_skew_group_constrains
• Attributes on other DB objects, e.g. pin/port/inst, have cts_ prefix
  – Example: pin:f0/ck .cts_sink_type
  – Note: cts_pin_insertion_delay is an exception
• Attributes controlling create_clock_tree_spec have prefix cts_spec_config_
  – Example: cts_spec_config_create_clock_tree_source_groups
• DB related attributes, i.e. pins/nets/insts on CTS objects, are unprefixed
  – Example: get_db skew_group:ck .sinks_active

CUI – Attributes – Indexing – Root vs specific objects [18.1]

• Some attributes exist on both the root object and on specific CTS objects
  – Technically these are different attributes, but they have the same name
• Set/reset of the attribute on root also sets/resets the attribute on all CTS objects
• Getting the attribute on root returns the root attribute value only
• Example (the root, sg1 and sg2 columns show the stored values after each command)

    Command                                      Effect             root  sg1   sg2
    set_db cts_target_skew 0.1                   set all to 0.1     0.1   0.1
    get_db cts_target_skew                       root returns 0.1   0.1   0.1
    set_db skew_group:sg1 .cts_target_skew 0.2   set sg1 to 0.2     0.1   0.2
    get_db cts_target_skew                       root returns 0.1   0.1   0.2
    create_skew_group -name sg2 ...              create sg2         0.1   0.2   0.1
    get_db skew_group:sg1 .cts_target_skew       sg1 returns 0.2    0.1   0.2   0.1
    get_db skew_group:sg2 .cts_target_skew       sg2 returns 0.1    0.1   0.2   0.1
    set_db cts_target_skew 0.3                   set all to 0.3     0.3   0.3   0.3
    get_db skew_group:sg1 .cts_target_skew       sg1 returns 0.3    0.3   0.3   0.3
    get_db skew_group:sg2 .cts_target_skew       sg2 returns 0.3    0.3   0.3   0.3

CUI – Attributes – Indexing – Net type [18.1]

• Attribute name with net type suffix
  – <base name>_top, <base name>_trunk, <base name>_leaf
  – Set the attribute for a particular net type only
  – Example:
      create_route_type -name trunkrt ...
      set_db cts_route_type_trunk trunkrt
• cts_route_type_<nettype> is a special case, as the plain cts_route_type attribute is not defined
• Attribute name without net type suffix
  – <base name>
  – Get always returns a Tcl dict as a list: top <value> trunk <value> leaf <value>
  – Set using a single value to change top, trunk and leaf setting
  – Set using a Tcl dict to set one or more of top, trunk and leaf
  – See man dict for more information on Tcl dict
  – Only exists for some attributes, notably plain cts_route_type is not available
  – Example:
      > get_db cts_target_max_transition_time
      top 0.15 trunk 0.15 leaf 0.15

CUI – Attributes – Indexing – -Index [18.1]

• Indexing by other objects
  – get_db/set_db -index
  – Specify index type and index value
  – Example: set_db cts_inverter_cells <list> -index power_domain:pd_lowvolt
• If index is not specified
  – set_db : Setting applies to all possible values of the index, e.g. all power domains
  – get_db : Value returned applies to ‘unindexed’ use, e.g. the value which is used for power domains which do not have a power domain specific value set
  – Special case: If -index delay_corner:name is not specified then the get/set operation applies only to the currently set primary half corner.
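For the net type indexing described above, the unsuffixed attribute value is an ordinary Tcl dict, so standard dict commands apply. The sketch below runs in plain tclsh and uses the value from the slide example as a literal; in an Innovus session the value would instead come from get_db, which is not called here. The 0.10 and 0.12 numbers are arbitrary illustration values.

```tcl
# Value in the form shown for: get_db cts_target_max_transition_time
set transition {top 0.15 trunk 0.15 leaf 0.15}

# Read the setting for one net type.
puts "leaf target: [dict get $transition leaf]"

# A dict naming only some net types, the form described for setting
# one or more of top, trunk and leaf.
set update [dict create top 0.10 trunk 0.12]
puts "partial update: $update"

# Preview the combined result; in dict merge, later dicts win per key.
set preview [dict merge $transition $update]
dict for {net_type value} $preview {
    puts [format "%-5s %s" $net_type $value]
}
```

Building the partial dict separately keeps the leaf setting untouched, in contrast to setting a single value, which the slide says changes top, trunk and leaf together.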
CUI – Examples [18.1]

    > highlight [get_db clock_trees .nets -if {.cts_net_type == leaf}] -color green
    > highlight [get_db clock_trees .nets -if {.cts_net_type != leaf}] -color red
    > set clock_wire_length [tcl::mathop::+ {*}[get_db -unique clock_trees .nets.wires.length]]
    804.69
    > set icg_power [tcl::mathop::+ {*}[get_db [get_db -unique clock_trees .insts -if {.cts_node_type == clock_gate}] .power_total]]
    0.1322357

Support & Feedback

• Support
  – Cadence Support Portal provides access to support resources, including an extensive knowledge base, access to software updates for Cadence products, and the ability to interact with Cadence Customer Support. Visit https://support.cadence.com.
• Feedback
  – Email comments, questions, and suggestions to content_feedback@cadence.com.

© 2018 Cadence Design Systems, Inc. All rights reserved worldwide. Cadence, the Cadence logo, and the other Cadence marks found at www.cadence.com/go/trademarks are trademarks or registered trademarks of Cadence Design Systems, Inc. All other trademarks are the property of their respective owners.