Quick Reference Guide to IC Compiler II Clock Tree Synthesis
Version 2.0

CONFIDENTIAL INFORMATION
The following material is confidential information of Synopsys and is being disclosed to you pursuant to a non-disclosure agreement between you or your employer and Synopsys. The material being disclosed may only be used as permitted under such non-disclosure agreement.

IMPORTANT NOTICE
In the event information in this presentation reflects Synopsys' future plans, such plans are as of the date of this presentation and are subject to change. Synopsys is not obligated to develop the software with the features and functionality discussed in these materials. In any event, Synopsys' products may be offered and purchased only pursuant to an authorized quote and purchase order or a mutually agreed upon written contract.

© 2016 Synopsys, Inc.

Contents
• Introduction
• Ensuring the Design is Ready for Clock Tree Synthesis
• Setting Up for Clock Tree Synthesis
• Performing Clock Tree Synthesis and Optimization
• Analyzing the Results

IC Compiler II Clock Tree Synthesis (CTS) Overview
• IC Compiler II supports several types of clock tree building technologies to meet different design requirements
  – Standard clock tree synthesis
    – Fully automated
    – Low power consumption
  – Structural multisource clock tree synthesis (MSCTS)
    – Low skew and very high OCV tolerance
  – Regular multisource clock tree synthesis (MSCTS)
    – Lower power than structural MSCTS
    – Better OCV tolerance than standard CTS
This application note covers standard clock tree synthesis in IC Compiler II.

IC Compiler II Clock Tree Synthesis Flow Overview
1. Read in placed block
2. Perform CTS setup
3. clock_opt -from build_clock -to route_clock
4. Analyze results
     report_clock_qor
     report_qor
5. Perform post-CTS optimization
     clock_opt -from final_opto

Design Ready for CTS?
• Nondefault routing rules and layer list applied?
• Balance point definitions applied?
• DRC constraints set?
• References set for CTS?
• Pre-CTS sanity check completed?

Setting Up the Design for CTS
Setting Up the Scenario for CTS
• IC Compiler II clock tree synthesis works on all active scenarios that are enabled for setup or hold, and uses them for skew balancing and latency optimization
• Clock tree synthesis also performs logical DRC fixing on these scenarios if they are enabled for maximum transition or maximum capacitance with the set_scenario_status command
• The tool does not use CTS-specific scenario settings such as the cts_mode or cts_corner settings in IC Compiler

                                               Leakage  Dynamic
    Name  Mode   Corner  Active  Setup  Hold   Power    Power    Max_tran  Max_cap  Min_cap
    -----------------------------------------------------------------------------------------
    s1    func1  BEST    true    false  true   true     true     true      true     true
    s2    func1  WORST   true    true   false  true     true     true      true     true
    s3    test1  BEST    true    false  true   true     true     true      true     true

Setting Up Design for CTS
Constraints
• Use the set_max_transition and set_max_capacitance commands to set the DRC constraints for the clock tree
• Define the DRC constraints for each scenario. During CTS, the scenario-specific constraints are used for DRC fixing in each scenario

    # DRC constraints
    current_scenario func_worst
    set_max_transition 0.10 -clock_path [get_clocks]
    set_max_capacitance 0.300 -clock_path [get_clocks]
    current_scenario func_best
    set_max_transition 0.200 -clock_path [get_clocks]
    set_max_capacitance 0.400 -clock_path [get_clocks]

Setting Up Design for CTS
Constraints
• By default, CTS uses a target skew and a target latency of 0 when building the clock tree
• To define relaxed targets for low-frequency clocks, use the set_clock_tree_options command
• Define the constraint for each scenario. If the constraint is not defined for one of the scenarios, CTS uses the default target setting for that scenario

    # Target skew and latency
    current_scenario func_worst
    set_clock_tree_options -target_skew 0.1 \
        -target_latency 0.30 \
        -corner worst
    current_scenario func_best
    set_clock_tree_options -target_skew 0.1 \
        -target_latency 0.150 \
        -corner best

Understand the mode-specific and corner-specific constraints and apply them accordingly.

Setting Up Design for CTS
Constraints
Optional settings for clock tree synthesis

    # Fanout control
    set_app_options -name cts.common.max_fanout -value 2000
    # Clock cell spacing
    set_clock_cell_spacing \
        -x_spacing 0.9 -y_spacing 0.4 -lib_cells lib/BUF
    # Maximum net length
    set_app_options -name cts.common.max_net_length -value 200

Setting Up Design for CTS
Setting Up the Nondefault Rule (NDR) and Layer Constraints
• IC Compiler II CTS supports several types of NDR and layer specifications when building the clock tree
  1. Net-specific NDR from set_routing_rule
     • The NDR and layer list are not propagated to newly created nets
  2. Net-specific NDR from set_clock_routing_rules
     • The NDR and layer list are propagated to newly created nets
  3. Clock-specific NDR from set_clock_routing_rules
     • Supports separate NDRs and layer lists for root, leaf, and internal nets, which are clock specific
  4. Global NDR from set_clock_routing_rules
     • Supports separate NDRs and layer lists for root, leaf, and internal nets

Setting Up Design for CTS
Setting Up the Nondefault Rule (NDR) and Layer Constraints
• IC Compiler II CTS supports three levels of NDRs and layer lists:
  – Root NDR: Applied on the root clock net up to the first point of branching or the first preexisting cell on the clock tree, whichever comes first
  – Sink NDR: Applied on nets connected to the clock pins of flops
  – Internal NDR: Applied on nets other than root and sink nets

Setting Up Design for CTS
NDR and Layer List Examples
(Figures: pre-CTS and post-CTS views of a clock tree with the root, internal, and leaf nets marked)
• Case 1: Root NDR applies up to the first preexisting cell on the clock tree
• Case 2: Root NDR applies up to the first point of branching
• Case 3: Root NDR applies up to the first point of branching; the post-CTS view includes a CTS-inserted buffer

Setting Up Design for CTS
Setting the Reference for Clock Tree Building and Optimization
• CTS commands use library cells with the valid_purposes=cts and dont_touch=false attribute settings

    set_lib_cell_purpose -include cts ${lib_cells}

• Ensure that all cell types on the clock tree (such as ICGs and muxes) have logically equivalent (LEQ) library cells enabled for CTS, so that CTS can size them during optimization
  – If logically equivalent library cells with valid_purposes=cts are not present, CTS does not size the cell instances and QoR might be affected
  – To find the LEQ cells of preexisting cells on the clock tree and add them to the reference list, use the derive_clock_cell_references command
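
The reference setup described above can be sketched as a short script. This is a minimal sketch, not a definitive setup: the library cell names are hypothetical placeholders, and derive_clock_cell_references is called without options because its output options are not shown in this guide.

```tcl
# Hypothetical reference list: replace with the clock buffers, inverters,
# ICGs, and muxes that exist in your own libraries.
set cts_ref [get_lib_cells {stdcell_lib/BUFX10 stdcell_lib/INVX8 stdcell_lib/CLKMUX2X2}]

# CTS only uses library cells with dont_touch=false, so clear it first.
set_dont_touch $cts_ref false

# Mark the cells as valid for CTS (valid_purposes=cts).
set_lib_cell_purpose -include cts $cts_ref

# Find LEQ cells for preexisting clock cells (ICGs, muxes) so that they
# can be added to the reference list. Options are release dependent and
# are intentionally left out of this sketch.
derive_clock_cell_references
```

Including non-repeater cells such as the mux in the list is deliberate: without a CTS-enabled equivalent, CTS leaves those instances unsized.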

Setting Up Design for CTS
Defining Clock Tree Exceptions: Balance Point
• A balance point is a stop or float exception that defines an endpoint to be treated as a valid balance endpoint for clock tree synthesis
• You can define the pin either as a sink pin with 0 phase delay or as a float pin with a required insertion delay
• By default, a balance point applies to all corners of the current mode
• When a corner is specified with the -corners option, the constraint applies to the given corner in the current mode

    Setting:
        set_clock_balance_points \
            -balance_points reg1/CK \
            -delay 0.1 -corners corner1
    Details:
        Defines a float delay exception of 0.1 on the register clock input pin in corner "corner1" of the current mode

    Setting:
        set_clock_balance_points \
            -balance_points buf1/A
    Details:
        The input of the buffer is treated as a stop point for clock tree synthesis in the current mode

Apply the balance point delay in the primary corner. During CTO, the tool scales the delay and uses the scaled value in the other corners.

Setting Up Design for CTS
Primary Corner for CTS
• In MCMM designs, the tool chooses a primary corner for clock tree building during CTS
  – The primary corner is selected as the corner with the worst wire delay, gate delay, and balance point delay
  – Because primary corner selection also depends on the balance point delay, always define the balance point delay for each mode on its associated worst corner with the -corners option
  – Optionally, you can control the primary corner selection by using the cts.compile.primary_corner application option

    icc2_shell> set_app_options -name cts.compile.primary_corner \
                    -value corner_worst
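
The recommendation to define the balance point delay on the worst corner of each mode can be scripted as a loop over modes. The following is a minimal sketch under stated assumptions: the mode names, the corner name, and the mode-to-corner map are hypothetical, and reg1/CK is the example pin used above.

```tcl
# Hypothetical map from mode name to its worst corner.
array set worst_corner {
    func1 corner_worst
    test1 corner_worst
}

foreach_in_collection m [get_modes] {
    set mode_name [get_object_name $m]
    # Skip modes that have no entry in the (hypothetical) map.
    if {![info exists worst_corner($mode_name)]} {
        continue
    }
    current_mode $m
    # Float exception of 0.1 on reg1/CK, defined on the worst corner so
    # that primary corner selection takes it into account.
    set_clock_balance_points \
        -balance_points reg1/CK \
        -delay 0.1 \
        -corners $worst_corner($mode_name)
}

# Optional: fix the primary corner explicitly instead of relying on
# automatic selection.
set_app_options -name cts.compile.primary_corner -value corner_worst
```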

Setting Up Design for CTS
Automatic Scaling of Balance Point During Clock Tree Optimization (CTO)
• During clock tree optimization, the tool automatically scales the balance point delay from the primary corner to all other corners of the same mode

    icc2_shell> set_clock_balance_points \
                    -balance_points reg1/CK \
                    -delay 0.1 -corners C1

    Total number of global routed clock nets: 28
    Information: The run time for clock net global routing is 0 hr : 0 min : 0.86 sec, cpu time is 0 hr : 0 min : 0.86 sec. (CTS-104)
    …
    Information: Balance point auto scaling set on term 'reg1/CK' clock 'clk' corner 'func_C2' early rise 0.104088 early fall 0.104088 late rise 0.104088 late fall 0.104088. (CTS-040)

    Scenario  Mode   Corner  Balance point delay considered by tool for reg1/CK
    S1        func1  C1      0.1
    S2        func1  C2      CTO automatically scales 0.1 and uses the scaled value for optimization
    S3        func2  C1      No balance point, because the mode is different

However, if the mode is similar to func1, you must use the copy feature to copy the exceptions.

Setting Up Design for CTS
Balance Point Delay Copying and Scaling
• To copy the exceptions across similar modes, set the clock tree options as follows:

    set_clock_balance_points \
        -balance_points reg1/CK \
        -delay 0.1 -corners C1
    set_clock_tree_options -copy_exceptions_across_modes \
        -from_mode func1 -to_mode {func2 func3…}

  – User-defined balance points that already exist in the modes specified by the -to_mode option are not overwritten
  – Ignore exceptions are not copied across modes

    Scenario  Mode   Corner  Balance point delay considered by tool for reg1/CK
    S1        func1  C1      0.1
    S2        func1  C2      CTO automatically scales 0.1 and uses the scaled value for optimization
    S3        func2  C2      The tool copies and scales the exception to the C2 corner and uses it in the S3 scenario for optimization

Setting Up Design for CTS
Defining Clock Tree Exceptions: Exclude Pins
• Ignore or exclude exception
  – These exceptions exclude clock endpoints from skew and latency optimization
  – An exclude exception is a mode-specific constraint and applies to the current mode

    Setting:
        set_clock_balance_points \
            -balance_points reg1/CK \
            -consider_for_balancing false
    Details:
        The clock pin of the register is treated as an ignore pin for clock tree synthesis and optimization. Only DRC fixing is done for such pins

Setting Up Design for CTS
Defining Clock Tree Exceptions: Don't Touch and Size Only
• Setting don't touch on a subtree
    icc2_shell> set_dont_touch_network -clock_only [get_pins pin_name]
• Setting don't touch on a net and an instance
    icc2_shell> set_dont_touch [get_cells cell_name] true
    icc2_shell> set_dont_touch [get_nets -segments net_name] true
• Preventing the router from touching a net
    icc2_shell> set_attribute [get_nets $clk_net] physical_status locked
• Setting size only
    icc2_shell> set_size_only [get_cells cell_name] true
• Setting physical status
    icc2_shell> set_placement_status fixed [get_cells cell_name]

Setting Up Design for CTS
Defining the Clock Balance Group for Inter Clock Delay Balancing (ICDB)
• Inter-clock balance groups specify the groups of clocks that must be balanced together for timing
• You can automatically generate the balance constraints from the timing relationships by using the derive_clock_balance_constraints command
  – Use the create_clock_balance_group command for customized balance requirements
• By default, the clock_opt -from build_clock command runs ICDB
• To run ICDB standalone, use the balance_clock_groups command
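
The two ways of running ICDB described above can be sketched as follows. This is only an illustration: the group name and clock names in the commented-out create_clock_balance_group line are hypothetical, and the -name and -objects options used there are an assumption that should be checked against the command's man page.

```tcl
# Option 1: derive balance groups automatically from timing relationships.
derive_clock_balance_constraints

# Option 2 (customized requirement): group clocks explicitly.
# Hypothetical names; option spelling is an assumption.
# create_clock_balance_group -name grp1 -objects [get_clocks {clk1 clk2}]

# ICDB runs by default as part of the build_clock stage.
clock_opt -from build_clock -to route_clock

# Standalone ICDB on an existing clock tree:
# balance_clock_groups
```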

Setting Up Design for CTS
Handling of Clock Tree Exceptions During Clock Tree Synthesis: Cell Specific

    Exception                        Allow    Allow             Allow buffer  Allow      Allow merging with
                                     sizing   movement          removal       splitting  merge_clock_gates or MSCTS
    ----------------------------------------------------------------------------------------------------------------
    physical_status fixed or locked  No       No                No            No         No
    physical_status legalize_only    Yes      Only to legalize  No            No         No
    dont_touch                       No       Yes               No            No         No
    size_only                        Yes      Yes               No            (*)        (*)

    (*) No, if user_size_only; Yes, if derived_size_only

Setting Up Design for CTS: Clock Skew Groups
• A design might require a few clock pins to be balanced separately from the rest of the clock pins
• Clock skew groups provide a way to meet this requirement
(Figure: clock CLK, an ICG with an enable path, gated clocks gclk1, gclk2, and gclk3, and the register groups Group1 and Group2)
• No timing paths between Group1 and Group2
• Need minimum latency for Group1 registers
Skew groups are mode-specific constraints. Ensure that the group is created in all modes.

Setting Up Design for CTS
Clock Skew Group Example
• Create the clock skew groups as follows, and then run CTS
• Skew groups are mode specific. Create them for each mode

    # Create clock skew groups
    foreach_in_collection m [get_modes] {
        current_mode $m
        create_clock_skew_group -name sg1 -objects {sink1/CP sink2/CP}
        create_clock_skew_group -name sg2 -objects {sink3/CP sink4/CP}
        report_clock_skew_groups
    }
    clock_opt -from build_clock -to route_clock

Setting Up Design for CTS
Summary of Constraints

    CTS Configuration:  Maximum transition
    Command/app option: set_max_transition -clock_path
    Recommendation:     Define for each scenario

    CTS Configuration:  Maximum capacitance
    Command/app option: set_max_capacitance -clock_path
    Recommendation:     Define for each scenario

    CTS Configuration:  Target skew
    Command/app option: set_clock_tree_options -target_skew
    Recommendation:     Define for each corner

    CTS Configuration:  Target latency
    Command/app option: set_clock_tree_options -target_latency
    Recommendation:     Define for each corner

    CTS Configuration:  Routing rule and layer
    Command/app option: set_clock_routing_rules
    Recommendation:     Define layer and NDR together

    CTS Configuration:  Balance point (float and exclude)
    Command/app option: set_clock_balance_points
    Recommendation:     Define the balance point delay for each mode on its associated worst corner with the -corners option

    CTS Configuration:  Exceptions (sizing and don't touch)
    Command/app option: set_dont_touch, set_size_only
    Recommendation:     Define the exception only on required cells and nets

    CTS Configuration:  References
    Command/app option: set_dont_touch $cts_ref false
                        set_lib_cell_purpose -include cts $cts_ref
    Recommendation:     Ensure that non-repeaters such as ICGs, muxes, and so on are also included in the reference list

    CTS Configuration:  Clock balance group for ICDB
    Command/app option: derive_clock_balance_constraints / create_clock_balance_group

Setting Up Design for CTS
Example Setup Script
• Clear any unintended NDR on the clock network before CTS and apply the correct rule
• Clear don't touch settings on the clock network before CTS
• Ensure that there is no dont_touch attribute on the references provided for CTS
• Define the DRC constraints for all scenarios
• Define the target skew and latency for all scenarios
• Define the balance point delay for each mode on its associated worst corner with the -corners option

Setting Up Design for CTS: Pre-CTS Sanity Check: report_clock_settings
• Review the settings by using the report_clock_settings command

Setting Up Design for CTS: Pre-CTS Sanity Check: report_clock_routing_rules
• Ensure that each routing rule is always defined with a layer list. If layers are not reported, set the rule again with the correct layer list

Setting Up Design for CTS
Pre-CTS Sanity Check: check_clock_trees
• Summary report provides improved clarity
• Checks are classified into multiple categories to provide a quick overview
• Detailed information about each warning enables faster debugging
• Tabular form of reporting for improved clarity

Setting Up Design for CTS: Pre-CTS Sanity Check: check_legality
• Illegally placed cells in the input design can lead to long runtime and poor QoR

Setting Up for CTS
• Scenario setup
• NDR and layer list
• DRC constraints
• Reference setting
• Balance point definition
• ICDB constraints
• Skew group
• Pre-CTS sanity check

Quiz 1
• Which layers are used by CTS for root, internal, and sink nets with the following NDR specification?

    icc2_shell> set_clock_routing_rules -net_type all \
                    -rules 2w2s -max_routing_layer M6 -min_routing_layer M4
    icc2_shell> set_clock_routing_rules -net_type internal -rule 2w2s
    icc2_shell> set_clock_routing_rules -net_type root -rule 2w2s
    icc2_shell> set_clock_routing_rules -net_type sink -rule 1w2s

Quiz 1: Answer
• Because layers are not specified for the root, internal, and sink net types, the tool uses all the available layers for routing. Always specify the layer list and NDR together for each net type

Correct usage:

    icc2_shell> set_clock_routing_rules -net_type internal \
                    -rule 2w2s -max_routing_layer M6 \
                    -min_routing_layer M5
    icc2_shell> set_clock_routing_rules -net_type root -rule 2w2s \
                    -max_routing_layer M6 -min_routing_layer M5
    icc2_shell> set_clock_routing_rules -net_type sink -rule 1w2s \
                    -max_routing_layer M6 -min_routing_layer M5
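
The pre-CTS sanity checks from the setup checklist can be collected into report files before CTS is run, so that the settings can be reviewed in one place. A minimal sketch, assuming a hypothetical report directory name; the four commands are the ones named in this guide, run with their default options.

```tcl
# Hypothetical output directory for the pre-CTS reports.
set rpt_dir ./reports/pre_cts
file mkdir $rpt_dir

# Clock tree settings: targets, references, exceptions.
redirect -file $rpt_dir/clock_settings.rpt      {report_clock_settings}

# Routing rules: confirm that every rule is reported with a layer list.
redirect -file $rpt_dir/clock_routing_rules.rpt {report_clock_routing_rules}

# Categorized clock tree checks.
redirect -file $rpt_dir/check_clock_trees.rpt   {check_clock_trees}

# Placement legality: illegal cells cost runtime and QoR during CTS.
redirect -file $rpt_dir/check_legality.rpt      {check_legality}
```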

Clock Tree Synthesis and Optimization
CTS Flow Overview
1. Read in placed block
2. Perform CTS setup
3. Perform CTS by using the clock_opt -to build_clock command, during which the tool performs:
   • Removal of clock tree
   • Auto exception generation
   • Gate-by-gate clock tree building
   • DRC fixing beyond exception
   • Global routing of clock nets
   • Pre-optimization DRC fixing
   • Skew and latency optimization
   • Post-optimization DRC fixing
   • Clock tree summary
   • ICDB
4. Analyze the CTS results
5. Route clock nets
6. Perform post-CTS and postroute optimization

Clock Tree Synthesis and Optimization
Tool Derived Automatic Exceptions for Ease of Use

    Case 1
      CTS balancing conflict:           Conflict between two clocks due to a missing generated clock
      Tool-derived automatic exception: Derives an exclude exception for the clock with the missing generated clock
    Case 2
      CTS balancing conflict:           Internal sink pins of a cell with clock output pins
      Tool-derived automatic exception: Derives an exclude exception on the internal sink pins
    Case 3
      CTS balancing conflict:           Large ETM internal skew coming from the max/min clock tree path
      Tool-derived automatic exception: Derives a balance point with maximum delay at the ETM input pin
    Case 4
      CTS balancing conflict:           Corners with missing balance points
      Tool-derived automatic exception: Scales the user-defined delays from other corners

(Figures for Cases 1 to 3, with the labels: unfixable skew, exclude, balance point, ETM clk)

Clock Tree Synthesis and Optimization
Clock Tree Synthesis: Analyzing Log File
• The log file is structured; each step is separated by a header
• Under each step, the log provides more information about the settings used, details of the operation performed, a summary, and more

    ************************************************************
    * CTS STEP: Design Initialization
    ************************************************************
    Information: All clock objects will be converted from ideal to propagated clock during CTS. (CTS-105)
    Information: CTS will work on the following scenarios. (CTS-101)
       s1 (Mode: func1; Corner: func_BEST)
       s2 (Mode: func1; Corner: func_WORST)
    Information: CTS will work on all clocks in active scenarios, including 1 master clocks and 0 generated clocks. (CTS-107)
    CTS related app options set by user:
       cts.common.verbose = 1
    Buffer/Inverter reference list for clock tree synthesis:
       stdcell_lib/BUFX10
    CTS NDR rule list:
       Design Base; Net Type: root; Rule: 2w2s; Min Layer: M6; Max Layer: M7
       Design Base; Net Type: internal; Rule: 2w2s; Min Layer: M6; Max Layer: M7
       Design Base; Net Type: sink; Rule: 1w2s; Min Layer: M4; Max Layer: M7
    ************************************************************
    * CTS STEP: Existing Clock Tree Removal
    ************************************************************
    A total of 1 buffer(s) and 0 inverter(s) have been removed.
    ************************************************************
    * CTS STEP: Clock Tree Initialization
    ************************************************************
    Start Auto-Exception Derivations...
    No internal pin was found.
    No conflict pin was found.
    No macro pin was found for clock balance point settings.

Clock Tree Synthesis and Optimization
Clock Tree Synthesis: Analyzing Log File
• At each gate level, the log provides details about the driver name, the clocks considered for synthesis, the number of loads, the DRC constraints, and more

    ************************************************************
    * CTS STEP: Clock Cell Relocation
    ************************************************************
    Inst 'mux1' is not movable
    Information: Relocated the clock cell 'icg3' from (791.37, 352.30) to (188.19, 87.10). (CTS-106)
    Information: Relocated the clock cell 'icg2' from (206.15, 373.10) to (358.97, 864.50). (CTS-106)
    Inst 'b2' is not movable
    Information: Relocated the clock cell 'icg1' from (795.42, 744.90) to (989.96, 370.50). (CTS-106)
    A total of 3 clock cells have been relocated
    ************************************************************
    * CTS STEP: Gate-By-Gate Clock Tree Synthesis
    ************************************************************
    -------------------------------------------------------------
    Gate level 2 clock tree synthesis
    driving pin = mux1/Z
    Clocks:
       clk (func1)
    Design rule constraints:
       max transition = 0.100000
       max capacitance = 0.600000
    Number of load sinks = 1
    Number of ignore points = 0
    Warning: Gate 'mux1' is not sizable because of lib_cell dont_touch. (CTS-041)
    .....
    Added 5 Repeaters. Built 5 Repeater Levels
    Phase delay: mux1/D0 : (lp max: 0.194 sp min: 0.194) : skew = 0.000
    -------------------------------------------------------------
    …
    Information: The run time for gate-by-gate clock tree synthesis is 0 hr : 0 min : 1.35 sec, cpu time is 0 hr : 0 min : 1.35 sec. (CTS-104)

Clock Tree Synthesis and Optimization
Clock Tree Synthesis: Analyzing Log File
• The log prints the QoR information for each clock, corner, and mode at the start and end of each optimization step. See the CTS-037 message for details
• Grep for the CTS-104 message to get the runtime information for each step

    ************************************************************
    * CTS STEP: DRC Fixing Beyond Exceptions
    ************************************************************
    …
    ************************************************************
    * CTS STEP: Clock Net Global Routing
    ************************************************************
    Total number of global routed clock nets: 28
    Information: The run time for clock net global routing is 0 hr : 0 min : 0.86 sec, cpu time is 0 hr : 0 min : 0.86 sec. (CTS-104)
    …
    Information: Balance point auto scaling set on term 'sink3/CP' clock 'clk' corner 'func_WORST' early rise 0.104088 early fall 0.104088 late rise 0.104088 late fall 0.104088. (CTS-040)
    …
    ************************************************************
    * CTS STEP: Pre-Optimization DRC Fixing
    ************************************************************
    Information: CTS QoR Pre Initial DRC Fixing: GlobalSkew = 0.0068; ID = 0.4446; NetsWithDRC = 0; Worst Tran/Cap cost = 0.0000/0.0000; ClockBufArea = 34.7490; ClockCellArea = 51.7725; Clock = clk; Mode = func1; Corner = func_BEST; ClockRoot = clk1. (CTS-037)
    Information: CTS QoR Pre Initial DRC Fixing: GlobalSkew = 0.0096; ID = 0.4640; NetsWithDRC = 0; Worst Tran/Cap cost = 0.0000/0.0000; ClockBufArea = 34.7490; ClockCellArea = 51.7725; Clock = clk; Mode = func1; Corner = func_WORST; ClockRoot = clk1. (CTS-037)
    Resized 0 cell(s), relocated 0 cell(s), cloned 0 cell(s) and inserted 0 buffer(s)/inverter(s)
    Information: CTS QoR Post Initial DRC Fixing: GlobalSkew = 0.0068; ID = 0.4446; NetsWithDRC = 0; Worst Tran/Cap cost = 0.0000/0.0000; ClockBufArea = 34.7490; ClockCellArea = 51.7725; Clock = clk; Mode = func1; Corner = func_BEST; ClockRoot = clk1. (CTS-037)
    Information: CTS QoR Post Initial DRC Fixing: GlobalSkew = 0.0096; ID = 0.4640; NetsWithDRC = 0; Worst Tran/Cap cost = 0.0000/0.0000; ClockBufArea = 34.7490; ClockCellArea = 51.7725; Clock = clk; Mode = func1; Corner = func_WORST; ClockRoot = clk1. (CTS-037)
    …
    Information: The run time for pre-optimization DRC fixing is 0 hr : 0 min : 0.00 sec, cpu time is 0 hr : 0 min : 0.00 sec. (CTS-104)

Clock Tree Synthesis and Optimization
Clock Tree Synthesis: Analyzing Log File

    ************************************************************
    * CTS STEP: Skew Latency Optimization and Area Recovery
    ************************************************************
    Information: CTS QoR Pre Optimization: GlobalSkew = 0.0068; ID = 0.4446; NetsWithDRC = 0; Worst Tran/Cap cost = 0.0000/0.0000; ClockBufArea = 34.7490; ClockCellArea = 51.7725; Clock = clk; Mode = func1; Corner = func_BEST; ClockRoot = clk1. (CTS-037)
    Information: CTS QoR Pre Optimization: GlobalSkew = 0.0096; ID = 0.4640; NetsWithDRC = 0; Worst Tran/Cap cost = 0.0000/0.0000; ClockBufArea = 34.7490; ClockCellArea = 51.7725; Clock = clk; Mode = func1; Corner = func_WORST; ClockRoot = clk1. (CTS-037)
    Begin Network Flow Based Optimization:
    Default network flow optimizer made 27 successful improvements out of 58 iterations
    Resized 4, relocated 12, deleted 4, inserted 0, sizeUp Relocated 0 cells
    Information: CTS QoR Post Optimization: GlobalSkew = 0.0064; ID = 0.3539; NetsWithDRC = 0; Worst Tran/Cap cost = 0.0000/0.0000; ClockBufArea = 30.8880; ClockCellArea = 47.9115; Clock = clk; Mode = func1; Corner = func_BEST; ClockRoot = clk1. (CTS-037)
    Information: CTS QoR Post Optimization: GlobalSkew = 0.0024; ID = 0.3650; NetsWithDRC = 0; Worst Tran/Cap cost = 0.0000/0.0000; ClockBufArea = 30.8880; ClockCellArea = 47.9115; Clock = clk; Mode = func1; Corner = func_WORST; ClockRoot = clk1. (CTS-037)
    Begin Area Recovery Buffer Removal:
    AR: deleted 0 cell(s)
    Information: CTS QoR Post Area Recovery: GlobalSkew = 0.0064; ID = 0.3539; NetsWithDRC = 0; Worst Tran/Cap cost = 0.0000/0.0000; ClockBufArea = 30.8880; ClockCellArea = 47.9115; Clock = clk; Mode = func1; Corner = func_BEST; ClockRoot = clk1. (CTS-037)
    Information: CTS QoR Post Area Recovery: GlobalSkew = 0.0024; ID = 0.3650; NetsWithDRC = 0; Worst Tran/Cap cost = 0.0000/0.0000; ClockBufArea = 30.8880; ClockCellArea = 47.9115; Clock = clk; Mode = func1; Corner = func_WORST; ClockRoot = clk1. (CTS-037)
    Begin Area Recovery Resizing:
    AR: resized 0 out of 24 cell(s)
    Information: CTS QoR Post Area Recovery: GlobalSkew = 0.0064; ID = 0.3539; NetsWithDRC = 0; Worst Tran/Cap cost = 0.0000/0.0000; ClockBufArea = 30.8880; ClockCellArea = 47.9115; Clock = clk; Mode = func1; Corner = func_BEST; ClockRoot = clk1. (CTS-037)
    Information: CTS QoR Post Area Recovery: GlobalSkew = 0.0024; ID = 0.3650; NetsWithDRC = 0; Worst Tran/Cap cost = 0.0000/0.0000; ClockBufArea = 30.8880; ClockCellArea = 47.9115; Clock = clk; Mode = func1; Corner = func_WORST; ClockRoot = clk1. (CTS-037)

Clock Tree Synthesis and Optimization
Clock Tree Synthesis: Analyzing Log File
• At the end of CTS, the tool prints summary information, which includes information about the clock tree, a skew bottleneck analysis report, and a summary of the errors and warnings issued during CTS
• Check this report as a first step when analyzing and debugging the CTS results

    ************************************************************
    * CTS STEP: Post-Optimization DRC Fixing
    ************************************************************
    Resized 0 cell(s), relocated 0 cell(s), cloned 0 cell(s) and inserted 0 buffer(s)/inverter(s)
    Information: CTS QoR Post Final DRC Fixing: GlobalSkew = 0.0064; ID = 0.3539; NetsWithDRC = 0; Worst Tran/Cap cost = 0.0000/0.0000; ClockBufArea = 30.8880; ClockCellArea = 47.9115; Clock = clk; Mode = func1; Corner = func_BEST; ClockRoot = clk1. (CTS-037)
    Information: CTS QoR Post Final DRC Fixing: GlobalSkew = 0.0024; ID = 0.3650; NetsWithDRC = 0; Worst Tran/Cap cost = 0.0000/0.0000; ClockBufArea = 30.8880; ClockCellArea = 47.9115; Clock = clk; Mode = func1; Corner = func_WORST; ClockRoot = clk1. (CTS-037)
    …
    ************************************************************
    * CTS STEP: Postlude
    ************************************************************
    Marking clock synthesized attributes
    Successfully legalize placement.
    ************************************************************
    * CTS STEP: Summary report
    ************************************************************
    There are 25 flat clock tree nets.
    There are 26 non-sink instances (total area 51.07) on clock trees including 4 instances dont_touch.
    Clock tree synthesize and optimization added 2 buffers and 18 inverters (total area 29.48).
    3 buffers/inverters were inserted below 1 leaf level Gates.
    Skew Bottleneck Analysis:
    Largest skew jumps (> 0.1 or 50 percent of the global skew) of the terms for clock clk:
    Skewgroup: default_clk, Corner: func_BEST
    Skew jumped by 0.010 at term icg2/GCP
    Summary of messages during CTS:
    ===============================================================
    Tag      Count  Type     Description
    ---------------------------------------------------------------
    CTS-041  4      Warning  Gate '%s' is not sizable because of %s dont_touch.
    CTS-043  1      Warning  Gate '%s' is not sizable because of fixed placement

Clock Structure: Pre-CTS
(Figure: pre-CTS clock structure with clocks clk1 and clk2, a mux with inputs I0 and I1 and output Z, clock gating cells (CG), the gated clock gclk, and the output ports clk_out1 and clk_out2)
• NDR setting: Root: 2W3S, Internal: 2W2S, Sink: 1W2S
• set_output_delay 2 clk_out2 -clock clk1 is applied; the pin feeding the data path has is_clock_used_as_data=true and is_clock_used_as_clock=false
• set_case_analysis 0 SEL disables the I1 -> Z arc; pins on the disabled path have is_clock_used_as_clock=false and is_clock_used_as_data=false
• Legend: cip = cts_size_in_place, B = balance point, I = implicit ignore, E = explicit ignore

Clock Structure: Pre-CTS
Attribute Settings
• All the pins on the clock network have the is_clock_used_as_clock attribute set to true, except for the pins that go to data paths and disabled arcs
• CTS works only on paths whose pins have the is_clock_used_as_clock attribute set to true
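
The CTS-037 (QoR) and CTS-104 (runtime) messages from the log file walkthrough earlier in this section can be collected with plain Tcl instead of a manual grep. The following is a sketch under stated assumptions: the log file name is hypothetical, message IDs use ASCII hyphens, and the field order matches the log excerpts shown in this guide. The procedure name and the returned dictionary layout are illustrative, not part of the tool.

```tcl
# Sketch: extract CTS-037 QoR records and CTS-104 step runtimes from a log.
proc parse_cts_log {log_file} {
    set fh [open $log_file r]
    set text [read $fh]
    close $fh
    # Long messages may wrap across lines; collapse all whitespace runs.
    regsub -all {\s+} $text { } text

    # Build the CTS-037 pattern piece by piece to keep it readable.
    set num {([0-9.]+)}
    set re037 "CTS QoR (\[^:\]+): GlobalSkew = $num; ID = $num;"
    append re037 " NetsWithDRC = (\[0-9\]+); Worst Tran/Cap cost = $num/$num;"
    append re037 " ClockBufArea = $num; ClockCellArea = $num;"
    append re037 { Clock = ([^;]+); Mode = ([^;]+); Corner = ([^;]+);}
    append re037 { ClockRoot = ([^ ;]+)\. \(CTS-037\)}

    set qor {}
    foreach {all step skew id drc tran cap bufArea cellArea clock mode corner root} \
            [regexp -all -inline -- $re037 $text] {
        lappend qor [dict create step $step clock $clock mode $mode \
                         corner $corner skew $skew id $id nets_with_drc $drc]
    }

    # CTS-104: "The run time for <step> is H hr : M min : S sec, cpu time ..."
    set re104 {The run time for ([A-Za-z -]+) is ([0-9]+) hr : ([0-9]+) min : ([0-9.]+) sec, cpu time}
    set runtimes {}
    foreach {all step hr min sec} [regexp -all -inline -- $re104 $text] {
        scan $hr %d h
        scan $min %d m
        lappend runtimes [list $step [expr {$h * 3600 + $m * 60 + $sec}]]
    }
    return [dict create qor $qor runtimes $runtimes]
}

# Example use (hypothetical log file name):
#   set result [parse_cts_log cts.log]
#   foreach row [dict get $result qor] {
#       puts "[dict get $row step] | [dict get $row corner] | skew=[dict get $row skew]"
#   }
#   foreach rt [dict get $result runtimes] {
#       lassign $rt step seconds
#       puts "$step: $seconds s"
#   }
```

Every quantifier in the patterns is greedy and bounded by a character class, which avoids the mixed greedy and non-greedy matching behavior of Tcl regular expressions.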

Clock Structure: Post-CTS
(Figure: post-CTS view of the same clock structure, showing CTS-inserted buffers, the guide buffers *_bdp, *_bip, and *_btd1, and the cells marked cip)
• NDR setting: root NDR on root nets, internal NDR on internal nets, sink NDR on sink nets
• set_output_delay 2 clk_out2 -clock clk1 is applied; the pin feeding the data path has is_clock_used_as_data=true and is_clock_used_as_clock=false
• set_case_analysis 0 SEL disables the I1 -> Z arc
• Legend: cip = cts_size_in_place, B = balance point, I = implicit ignore, E = explicit ignore

Clock Structure: Post-CTS
Attribute Settings
• To separate the data path, the tool inserts a guide buffer with *dp* in its name and sets the is_clock_used_as_data attribute on its input to true
• To separate the disabled path input, the tool inserts a guide buffer with *td* in its name and sets the is_clock_used_as_clock and is_clock_used_as_data attributes on its input to false
• All cells inserted by CTS, and all preexisting cells other than those identified as cip in the pre-CTS figure, have their physical status set to application_fixed
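
The attribute settings described above can be inspected after CTS with collection queries. This is a minimal sketch, assuming that the attributes named in this guide can be used in -filter expressions and read with get_attribute on your release; the name patterns follow the guide buffer naming described above.

```tcl
# Pins that CTS treats as part of the clock network.
set clk_pins [get_pins -quiet -hierarchical \
                  -filter "is_clock_used_as_clock==true"]
puts "Pins with is_clock_used_as_clock=true: [sizeof_collection $clk_pins]"

# Guide buffers that separate data paths (*dp*) and disabled paths (*td*).
set guide_cells [get_cells -quiet -hierarchical \
                     {*_bdp* *_vdp* *_btd* *_vtd*}]

# Cells inserted by CTS are expected to report application_fixed.
foreach_in_collection c $guide_cells {
    puts "[get_object_name $c]: physical_status=[get_attribute $c physical_status]"
}
```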
Clock Tree Synthesis and Optimization
Cell and Net Naming: CTS, CTO and ICDB

Name                                        Purpose                                             Example
------------------------------------------------------------------------------------------------------------------------
cts_inv_<digit> or cts_buf_<digit>          Inserted by CTS clustering                          cts_buf_4189301507, cts_inv_7655304984
cts_dlydt_<digit>                           Inserted by CTS for delay detour                    cts_dlydt_547
_*bip* or _*vip* (v for inverter)           Inserted by CTS to separate an ignore pin           clk1_bip4
_*bdp* or _vdp*                             Inserted by CTS to separate the clock-to-data path  sink3_Q_bdp3
*_vtd* or *_btd*                            Inserted by CTS to separate a disabled path         clk2_btd1
cto_buf_drc_<digit>                         Inserted by CTO to fix DRC violations               cto_buf_drc_1181
cto_buf_<digit>                             Inserted by CTO for optimization                    cto_buf_1416
cto_dtrdly_<digit>                          Inserted by CTO for detour                          cto_dtrdly_307365
buf_drc_cln<digit> or inv_drc_cln<digit>    Buffer cloned during DRC fixing                     buf_drc_cln306737
cto_buf_cln_<digit> or cto_inv_cln_<digit>  Buffer cloned during CTO                            cto_buf_cln_306880
ICDB_<digit>                                Cell added when balancing clock groups              ICDB_1457
CTS_MCSB_<digit>                            Cell added by CTS for multi-fanout skew balancing   CTS_MCSB_8830

To add a prefix to the cells, use the cts.common.user_instance_name_prefix application option

Clock Tree Synthesis and Optimization
Cell and Net Naming: Postroute CTO and CCD

Name                Purpose                                         Example
------------------------------------------------------------------------------------------
ctosc_inst_*        Inserted by postroute CTO for optimization      ctosc_inst_519862
ctosc_drc_inst_*    Inserted by postroute CTO for DRC fixing        ctosc_drc_inst_518456
ccd_setup_*         Inserted by CCD for setup optimization          ccd_setup_4189301507
ccd_hold_*          Inserted by CCD for hold optimization           ccd_hold_283737

Clock Tree Synthesis and Optimization
Cell and Net Naming: MSCTS flow

Name                                Purpose                                                                 Example
----------------------------------------------------------------------------------------------------------------------------
<old_cell_name>_split_<digit>       Cell created by the split_clock_cells command                           icg_split_12321
*clk_drv_r<row_num>c<column_num>    Cell created by the create_clock_drivers command                        clk_drv_r1c1
*msgts_l<num>_*                     Cell created by the synthesize_multisource_global_clock_trees command   msgts_l3_d1s0_1
                                    (l<num> represents the level)

The naming convention for cells introduced by the synthesize_multisource_clock_taps and synthesize_multisource_clock_subtrees commands can be controlled with the following application options:
   cts.multisource.subtree_merge_concatenate_length_threshold
   cts.multisource.subtree_merge_*name*
   cts.multisource.subtree_split_*name*

Analyzing the CTS Results
Reports to Analyze the CTS QoR: report_clock_qor
• To report the clock tree QoR, use the report_clock_qor command
• It generates two types of reports
  – Tabular reports
  – Histogram reports

Options available with tabular report

Report Type                 Metrics
----------------------------------------------------------------------
Tabular Reports (-type)     summary (latency, skew, area, and DRC)
                            latency
                            drc_violators
                            area
                            robustness
                            structure
                            balance_group
                            local_skew
                            power

Analyzing the CTS Results
Reports to Analyze the CTS QoR: report_clock_qor

Options available with histogram report

Report Type                             Metrics
----------------------------------------------------------------------
Histogram Reports (-histogram_type)     latency
                                        level
                                        robustness
                                        wire_delay_fraction
                                        fanout
                                        capacitance
                                        transition
                                        wire_length
                                        cell_delay
                                        local_skew
                                        wire_arc_length
                                        wirelength

report_clock_qor
More Organized and Structured Reporting
• The clocks are organized by modes and reported across corners
• Indentation and the Attrs column show master clock, generated clock, and skew group relationships within a scenario
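The report options above can be exercised with a short script. This is a minimal sketch, assuming it runs inside icc2_shell on a block with synthesized clock trees; the report type names are copied from the slides and should be checked against the report_clock_qor man page for the tool version in use.

```tcl
# Sketch only: run inside icc2_shell after clock tree synthesis.
# Type names come from the option tables on the slides; confirm them
# against the report_clock_qor man page before relying on them.

# Summary report (latency, skew, area, and DRC).
report_clock_qor -type summary

# A few of the tabular report types.
foreach type {latency drc_violators area structure} {
    report_clock_qor -type $type
}

# A few of the histogram report types.
foreach hist {latency transition} {
    report_clock_qor -histogram_type $hist
}
```

Looping over a short list keeps the set of reports in one place, so types can be added or dropped without duplicating the command.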
report_clock_qor -show_verbose_paths
Added Verbosity in Path Report Improves the Ability to Debug

Point                         Fanout  Cap     Trans   Incr    Path      Location          NDR   Layers              CTS Tags
---------------------------------------------------------------------------------------------------------------------------------
source latency                                        0.0000  0.0000
clk1 (in)                     1       0.0258  0.0000  0.0000  0.0000 r  (999.92,737.10)   2w2s  {M7 113} {M6 174}   clk
cts_inv_34763893/A (INVX8)                    0.0327  0.0166  0.0166 r  (825.66,625.11)         {M7 113} {M6 174}
cts_inv_34763893/Z (INVX8)    2       0.0223  0.0210  0.0138  0.0304 f  (825.72,624.72)   2w2s
…
cts_inv_34683885/A (INVX8)                    0.0549  0.0226  0.1336 f  (310.64,1.11)           {M6 171} {M7 287}
cts_inv_34683885/Z (INVX8)    1       0.0157  0.0242  0.0177  0.1512 r  (310.70,0.71)     2w2s  {M6 300} {M7 10}
mux1/D1 (CLKMUX2X2)                           0.0485  0.0196  0.1708 r  (10.80,10.98)           {M6 300} {M7 10}
mux1/Z (CLKMUX2X2)            2       0.0215  0.0337  0.0491  0.2199 r  (11.88,10.85)     2w2s  {M6 202}
cts_inv_33553772/A (INVX4)                    0.0413  0.0111  0.2309 r  (214.38,16.24)          {M6 202}            dt_lib
cts_inv_33553772/Z (INVX4)    1       0.0149  0.0268  0.0180  0.2489 f  (214.48,16.26)    2w2s
…
cts_inv_33523769/A (INVX8)                    0.0305  0.0093  0.3169 f  (557.68,381.09)         {M6 46} {M7 230}
cts_inv_33523769/Z (INVX8)    1       0.0161  0.0165  0.0131  0.3300 r  (557.75,381.48)   1w2s  {M5 229} {M4 47}
sink4/CP (DFF1)                               0.0299  0.0117  0.3418 r  (605.61,610.54)         {M5 229} {M4 47}    sink
---------------------------------------------------------------------------------------------------------------------------------
total clock latency                                           0.3418

The extra columns in the clock path report (NDRs, routing layers, location, and so on) provide detailed information that is helpful for debugging
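When reading the verbose path report, the Path column is the running sum of the Incr column. The following is a minimal sketch in plain Tcl (runnable in tclsh, no tool commands) that reproduces the first four Path values of the report above; the procedure name accumulate_path is hypothetical and used only for illustration.

```tcl
# Accumulate a list of Incr values into running Path values,
# formatted to four decimals like the report.
# accumulate_path is a hypothetical helper name, not a tool command.
proc accumulate_path {incr_list} {
    set path 0.0
    set result {}
    foreach step $incr_list {
        set path [expr {$path + $step}]
        lappend result [format "%.4f" $path]
    }
    return $result
}

# Incr values for: source latency, clk1 (in),
# cts_inv_34763893/A, cts_inv_34763893/Z
puts [accumulate_path {0.0000 0.0000 0.0166 0.0138}]
# prints: 0.0000 0.0000 0.0166 0.0304
```

The report prints rounded values, so a hand-computed sum can differ in the last digit: for the final row, 0.3300 + 0.0117 gives 0.3417, while the report shows 0.3418.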
Querying Clock Tree with get_clock_tree_pins
• get_clock_tree_pins returns a collection of clock tree pins
  – Many clock-tree-related attributes are available for querying and filtering

get_clock_tree_pins                            # Get clock tree pins by various criteria
   [-clocks object_list]                       (List of clocks)
   [-from list_of_pins]                        (Include pins on paths starting from these pins)
   [-sort_by extended_attribute_expression]    (Sort by this attribute)
   [-index_range start_index_end_index]        (Return a subset of the sorted pins)
   [-groups_from list_of_pins]                 (Return a list of collections, one for each pin listed here)
   [-metrics expression_list]                  (One or more expressions to evaluate for each pin selected)
   [-assign_to_variable variable_name]         (Put results in associative variable)
   [-total variable_name]                      (Add up the known results and assign the result to this variable)

Querying Clock Tree with get_clock_tree_pins
• A few of the attributes available in the get_clock_tree_pins command

Clock QoR:
   skew_max, downstream_delay_max, downstream_delay_min, latency_max, latency_min

Pin type and property:
   is_generated_clock_source, is_clock_source, is_sink, is_leaf, is_float_pin,
   is_explicit_ignore, is_ignored, is_cts_added, is_on_repeater, is_on_buffer,
   is_on_inverter, is_on_icg, is_inside_etm

Area and Power related:
   subtree_num_repeaters_max, subtree_total_area_max, subtree_total_leakage_power_max,
   subtree_total_internal_power_max, subtree_total_switching_power_max,
   cell_internal_power_max, cell_leakage_power_max, cell_switching_power_max,
   cell_total_power_max, net_switching_power_max, toggle_rate_max

Topological:
   height_max, height_min, depth_max, depth_min, is_reconvergent

Refer to the get_clock_tree_pins command man page for a full list of attributes
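The -metrics and -assign_to_variable options can be combined to pull attribute values into a Tcl array for further processing. This is a minimal sketch, assuming it runs inside icc2_shell; CLK is a placeholder clock name and lat is an arbitrary array name, and the -filter option is assumed to behave as it does in the usage examples elsewhere in this guide.

```tcl
# Sketch only: run inside icc2_shell. CLK is a placeholder clock name.
# Store latency_max for every sink of CLK in the Tcl array "lat".
get_clock_tree_pins -clocks CLK -filter is_sink \
    -metrics latency_max -assign_to_variable lat

# Walk the array with standard Tcl. The key format is chosen by the
# tool, so iterate over the keys instead of assuming what they look like.
foreach key [lsort [array names lat]] {
    puts "$key: $lat($key)"
}
```

Iterating with array names avoids hard-coding pin names, which makes the same loop reusable for any attribute passed to -metrics.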
Analyzing the CTS Results
Clock Tree Attributes in get_clock_tree_pins

[Figure: example clock tree for clk1 and clk2 annotated with attribute values. Along the clk1 path the annotations run from height_max=4, depth_max=0 through height_max=3, depth_max=1; height_max=2, depth_max=2; and height_max=1, depth_max=3 to height_max=0, depth_max=4 at a sink (is_sink). Other annotations: is_clock_source; !is_cts_added with is_on_buffer; is_cts_added with is_on_buffer; is_on_icg; height_max=2 with height_min=1.]

Analyzing the CTS Results
Example Usage of the get_clock_tree_pins Command
• Get the top 10 sinks with the largest latency:
   icc2_shell> get_clock_tree_pins -filter is_sink -sort_by latency_max -index_range 10
• Get all clock tree pins that are ignored for skew balancing:
   icc2_shell> get_clock_tree_pins -filter is_ignored
• Get sinks inside an ETM:
   icc2_shell> get_clock_tree_pins -fil "is_inside_etm && is_sink"
• Collect and store all latency values in a Tcl array:
   icc2_shell> get_clock_tree_pins -clocks CLK -sort_by latency_max \
                  -assign_to_variable my_array
• Get all downstream sinks beyond a pin (traverses through dividers and ICGs):
   icc2_shell> get_clock_tree_pins -filter is_sink -groups_from mux/Z

Example-1
[Figure: clock tree with root clk and cells icg1, mux, icg2, and icg3]
• Get the total switching power of all leaf-level clock nets
   get_clock_tree_pins -sort_by net_switching_power_max \
      -filter "height_max == 0 && is_net_driver" \
      -total var
   echo $var ==> prints "26", which is the net switching power of nets driven by icg2, icg3, and 2 leaf inverters

Example-2
[Figure: clock tree with root clk and cells icg1, mux, icg2, and icg3]
• Find the gate levels with the highest switching power:
  – Get subtrees with toggle_rate_max > 0.1
  – Get the top 2 gate levels
  – Print the number of repeaters inserted and the switching power
   get_clock_tree_pins -sort_by subtree_total_switching_power_max \
      -filter "toggle_rate_max > 0.1 && is_net_driver && !is_on_repeater" \
      -metrics subtree_num_repeaters_max,subtree_total_switching_power_max \
      -assign var -index_range -2
   returns {clk icg1/Z}
   echo $var(clk)  ==> prints {5 114.88579}
   echo $var(icg1) ==> prints {8 80.59376}

get_clock_tree_pins Case Study: QoR Debugging Examples

Skew debug: identify clock tree pins with
• Skew greater than 0.1
• Level less than 2 starting from the clock sinks
• Is an output pin
   icc2_shell> get_clock_tree_pins -filter "skew_max>0.1&&height_max<2&&is_net_driver"
   {icg1/ECK}

Transition violators: identify clock tree pins with
• Transition greater than constraints
   icc2_shell> get_nets -of_objects [get_clock_tree_pins -filter "transition_max > 0.05"]
   {cts0 tmp tmp2}

Case Study 1
• Symptom: insufficient sharing in the clock path and a high buffer count
• Observation:
  – Reviewing the clock settings shows that routing layers were not defined when the clock routing rule was set
   Original setting:
   icc2_shell> set_clock_routing_rules -clocks [get_clocks] -rule RULE1
   Correct setting:
   icc2_shell> set_clock_routing_rules -clocks [get_clocks] -rule RULE1 \
                  -min_routing_layer M7 -max_routing_layer M8
• With the correct setting, path sharing and buffer count improve

Quiz 2
• In what order does CTO optimize overlapping clock trees?

Quiz 2: Answer
• CTO considers all clocks with overlaps together. Therefore, clock QoR does not depend on the order.

Quiz 3
• I have provided CTS buffers and inverters in the CTS reference list, but the synthesize_clock_trees command exits, reporting that no usable buffers or inverters are available. What are the possible reasons for this?

Quiz 3: Answer
• The library cells had the dont_touch attribute.
Clear this setting by using the set_dont_touch [get_lib_cells] false command

Thank You