2007 IC/CAD Contest Problem A1: Leakage Power Reduction by Multiple-VT Cells Swapping
Source: Global Unichip Corp.

1. Introduction
The multiple-VT approach is widely used in chip design to meet both timing and power requirements. High-VT cells have low leakage power but slow timing, so they suit non-critical paths. Low-VT cells have fast timing but higher leakage power, so they suit critical paths. Standard-VT cells have normal leakage and timing. Several approaches to using multiple-VT cells to meet both the leakage and the timing requirements of a design have been proposed. One approach performs timing optimization and leakage power optimization at the same time, which requires a complex timing/power optimization engine. Another approach first performs timing optimization to achieve timing closure and then swaps cells on non-critical paths to reduce leakage power, which requires a good algorithm to preserve timing closure while cells are swapped. This contest targets the second approach.

2. Problem Description
Given
(1) a design (design.netlist) in gate-level netlist form, implemented with standard-VT cells, that has achieved timing closure (there are no timing violations in the design),
(2) a timing constraint file (timing.con),
(3) a net load file (net.cap) that contains the capacitive load of each net, and
(4) the timing models of the standard cells (svt.lib, hvt.lib, lvt.lib),
the contestant's program must swap the original cells with multiple-VT cells to reduce the design's leakage power. At the same time, the program must analyze the timing to ensure that no timing violation occurs after the cells are swapped. Each cell type is available in three VT variants, all with the same cell area. The following rules must be followed:
a) Cells with the same area and the same function have the same number of pins and the same pin positions. Assume the design's physical net routing has been frozen, so pin locations cannot change after cell swapping. Therefore, contestants cannot replace an original cell with a smaller or a bigger cell; a cell can only be swapped with another cell that has the same function and the same cell area.
b) Cells cannot be added to or removed from the design.
c) Net names and instance names in the design cannot be changed.
d) After swapping, no setup or hold timing violation may occur.
e) The program cannot call commercial EDA tools to complete the job.

3. Input
The default time, capacitance, and leakage units are nanoseconds (ns), picofarads (pf), and nanowatts (nW), respectively.

(1). Design file (design.netlist)
The original design is in gate-level Verilog format. To simplify file parsing, the gate-level Verilog file has been translated into a simple format that the contestants' programs can read. The original Verilog files are also provided for reference; contestants can use them to correlate the delays calculated by their programs with those reported by PrimeTime. The format of the design file is as follows:

    PINS
    Pin_name1 Direction
    Pin_name2 Direction
    …
    END PINS
    COMPONENTS
    Instance_Name1 Cell_Name1
    Instance_Name2 Cell_Name1
    …
    END COMPONENTS
    NET
    Net_Name1 Instance_Name1.pin1 Instance_Name2.pin1 ….
    Net_Name2 Instance_Name4.pin2 Instance_Name5.pin2 ….
    Net_Name3 Instance_Name6.pin2 Port_Name3
    Net_Name4 Port_Name4 Instance_Name7.pin2 ….
    END NET

Pin_name: the name of an input, output, or inout pin.
Direction: the direction of the corresponding pin: IN, OUT, or INOUT.
Instance_Name: the instance name of a placed cell.
Cell_Name: the cell name of an instance in the netlist.
Net_Name: the name of an interconnection.
Instance_Name.pin: a cell pin connected to the net. For example, u1/u10/F1.Y means that pin Y of instance u1/u10/F1 is connected to the net. The slash "/" is the hierarchy divider.
On each NET line, the driver must be listed first, followed by the driven cells.
Port_Name: the name of a primary input or primary output port. If a net connects a cell to a primary output port, its format is:

    Net_Name3 Instance_Name6.pin2 Port_Name3

If a net connects a primary input port to a cell, its format is:

    Net_Name4 Port_Name4 Instance_Name7.pin2

(2). Timing constraint file (timing.con)
The timing constraint file specifies the clock cycle time and the input and output delays of the primary IOs. Its format is as follows:

    Clock_cycle    clock_name         clock_cycle_time
    Input_delay    input_pin_name     input_delay_time
    Output_delay   output_pin_name    output_delay_time

Clock_cycle, Input_delay, and Output_delay are reserved keywords. Input_delay and Output_delay are the external delays of an input pin and an output pin, respectively; their definitions are identical to those in Synopsys PrimeTime. The clock name is an input pin name. An example of a timing constraint file is as follows:

    Clock_cycle    CLK          10
    Input_delay    data_in[0]   4.8
    Output_delay   add_out[5]   4.7

(3). Net load file (net.cap)
The net load file specifies the capacitance of each net in the design. The net load is used to calculate the delay of the driving cell. The format is as follows:

    #NET_NAME    CAPACITANCE
    Net_name1    capacitance_value1
    Net_name2    capacitance_value2
    ......

A line beginning with '#' is a comment.
Net_name: the name of a net in the design.
capacitance_value: the load of the net, a floating-point number.
An example of a net load file is as follows:

    #NET_NAME     CAPACITANCE
    Ustr_sum_0    0.0238
    Carry_20      0.0529
    ......

(4). Timing model of standard cells (svt.lib, hvt.lib, lvt.lib)
Three timing models are given: svt.lib models the standard-VT cells, hvt.lib the high-VT cells, and lvt.lib the low-VT cells.
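Before turning to the library format: the NET lines of design.netlist, net.cap, and timing.con are all whitespace-separated, so standard C++ streams are enough to read them. The sketch below is one way to do it, under two assumptions worth checking against the real test files: every timing.con entry is a single "keyword pin value" line, and a net.cap line that does not parse as a name followed by a number (such as a "......" placeholder) can be skipped. NetLine, parseNetLine, readNetCap, TimingConstraints, and readTimingCon are hypothetical names, not part of the contest kit.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One line of the NET section of design.netlist (hypothetical type):
// the net name, its driver (Instance.pin or a primary input port),
// and the driven pins or primary output ports.
struct NetLine {
    std::string name;
    std::string driver;
    std::vector<std::string> loads;
};

NetLine parseNetLine(const std::string& line) {
    std::istringstream fields(line);
    NetLine net;
    fields >> net.name >> net.driver;  // the driver is always listed first
    for (std::string tok; fields >> tok;) net.loads.push_back(tok);
    return net;
}

// net.cap: "Net_name capacitance" pairs; lines starting with '#' are comments.
std::map<std::string, double> readNetCap(std::istream& in) {
    std::map<std::string, double> cap;  // net name -> load in pf
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        std::istringstream fields(line);
        std::string net;
        double value = 0.0;
        if (fields >> net >> value) cap[net] = value;  // skip unparsable lines
    }
    return cap;
}

// timing.con, assuming one "keyword name value" entry per line.
struct TimingConstraints {
    std::string clockPin;
    double clockPeriod = 0.0;                   // ns
    std::map<std::string, double> inputDelay;   // input pin -> ns
    std::map<std::string, double> outputDelay;  // output pin -> ns
};

TimingConstraints readTimingCon(std::istream& in) {
    TimingConstraints tc;
    std::string keyword, name;
    double value = 0.0;
    while (in >> keyword >> name >> value) {
        if (keyword == "Clock_cycle") {
            tc.clockPin = name;
            tc.clockPeriod = value;
        } else if (keyword == "Input_delay") {
            tc.inputDelay[name] = value;
        } else if (keyword == "Output_delay") {
            tc.outputDelay[name] = value;
        }  // any other keyword is ignored in this sketch
    }
    return tc;
}
```

Taking std::istream rather than a file name lets the same functions read the real files through std::ifstream and small test inputs through std::istringstream.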
The given timing models are in Synopsys .lib format and contain the timing information, input pin capacitances, and leakage power of the standard cells. They use a two-dimensional table lookup model: delay timing depends on the input transition time and the output load, and setup/hold timing depends on the input transition times of the two related input pins. Below is an example of the timing model:

    .......
    lu_table_template(delay_template_7x7) {
      variable_1 : input_net_transition;
      variable_2 : total_output_net_capacitance;
      index_1 ("1000, 1001, 1002, 1003, 1004, 1005, 1006");
      index_2 ("1000, 1001, 1002, 1003, 1004, 1005, 1006");
    }
    lu_table_template(setup_template_3x3) {
      variable_1 : related_pin_transition;
      variable_2 : constrained_pin_transition;
      index_1 ("1000, 1001, 1002");
      index_2 ("1000, 1001, 1002");
    }
    ..........
    cell (BUFX1) {
      cell_leakage_power : 5.187 ;     /* cell leakage power */
      ..........
      pin(A) {
        direction : input;
        capacitance : 0.003477;        /* input capacitance */
      }
      pin(Y) {
        direction : output;
        capacitance : 0.0;
        function : "A";
        timing() {
          related_pin : "A";
          timing_sense : positive_unate;
          cell_rise(delay_template_7x7) {
            index_1 ("0.03, 0.1, 0.4, 0.9, 1.5, 2.2, 3");
            index_2 ("0.00035, 0.021, 0.0385, 0.084, 0.147, 0.231, 0.3115");
            values ( \
              "0.059814, 0.148227, 0.221109, 0.410291, 0.672101, 1.021130, 1.355603", \
              "0.071968, 0.160093, 0.233008, 0.422239, 0.684072, 1.033117, 1.367596", \
              "0.092836, 0.183203, 0.256230, 0.445346, 0.707170, 1.056216, 1.390702", \
              "0.102056, 0.197283, 0.269983, 0.459138, 0.721033, 1.069998, 1.404450", \
              "0.099924, 0.201503, 0.275289, 0.464924, 0.726738, 1.075862, 1.410247", \
              "0.089415, 0.197640, 0.272994, 0.465209, 0.727462, 1.076497, 1.411074", \
              "0.071630, 0.186748, 0.263815, 0.459390, 0.724021, 1.073303, 1.407765");
          }
          rise_transition(delay_template_7x7) {
            index_1 ("0.03, 0.1, 0.4, 0.9, 1.5, 2.2, 3");
            index_2 ("0.00035, 0.021, 0.0385, 0.084, 0.147, 0.231, 0.3115");
            values ( \
              "0.030627, 0.181793, 0.313933, 0.657598, 1.133489, 1.768029, 2.376136", \
              "0.033365, 0.181763, 0.313929, 0.657589, 1.133488, 1.768029, 2.376134", \
              "0.039292, 0.182942, 0.314810, 0.657583, 1.133478, 1.768019, 2.376134", \
              "0.047036, 0.185613, 0.315813, 0.658974, 1.134019, 1.768031, 2.376134", \
              "0.056039, 0.192559, 0.321015, 0.660309, 1.135028, 1.768828, 2.376364", \
              "0.064971, 0.201005, 0.328875, 0.666395, 1.136649, 1.769622, 2.377172", \
              "0.073844, 0.211253, 0.337763, 0.676731, 1.142890, 1.771268, 2.377928");
          }
          cell_fall(delay_template_7x7) {
            index_1 ("0.03, 0.1, 0.4, 0.9, 1.5, 2.2, 3");
            index_2 ("0.00035, 0.021, 0.0385, 0.084, 0.147, 0.231, 0.3115");
            values ( \
              "0.062397, 0.153531, 0.226701, 0.416510, 0.679066, 1.029075, 1.364481", \
              "0.079646, 0.170871, 0.244063, 0.433911, 0.696486, 1.046504, 1.381914", \
              "0.128636, 0.223286, 0.296756, 0.486452, 0.749012, 1.099029, 1.434439", \
              "0.186025, 0.285599, 0.359099, 0.549339, 0.811847, 1.161779, 1.497154", \
              "0.241396, 0.347741, 0.422915, 0.613536, 0.876430, 1.226407, 1.561703", \
              "0.297722, 0.411006, 0.488564, 0.682375, 0.945420, 1.295674, 1.631027", \
              "0.356177, 0.476429, 0.556420, 0.754852, 1.020361, 1.370541, 1.706061");
          }
          fall_transition(delay_template_7x7) {
            index_1 ("0.03, 0.1, 0.4, 0.9, 1.5, 2.2, 3");
            index_2 ("0.00035, 0.021, 0.0385, 0.084, 0.147, 0.231, 0.3115");
            values ( \
              "0.028478, 0.168009, 0.290407, 0.609373, 1.051023, 1.639923, 2.204286", \
              "0.029608, 0.168002, 0.290438, 0.609376, 1.051029, 1.639929, 2.204290", \
              "0.036506, 0.170717, 0.291427, 0.609361, 1.051037, 1.639929, 2.204290", \
              "0.045257, 0.174697, 0.293546, 0.611218, 1.051264, 1.639929, 2.204290", \
              "0.054797, 0.184138, 0.300151, 0.612955, 1.053136, 1.640645, 2.204290", \
              "0.063613, 0.195118, 0.311175, 0.619687, 1.054980, 1.642055, 2.205452", \
              "0.073599, 0.207052, 0.323270, 0.633024, 1.061348, 1.643938, 2.206660");
          }
        }
        ........
      }
      ...........
    }
    cell (DFFX1) {
      cell_leakage_power : 6.410 ;
      ………………..
      pin(D) {
        direction : input;
        capacitance : 0.0031;
        timing() {
          related_pin : "CP";
          timing_type : setup_rising;
          rise_constraint(setup_template_3x3) {
            index_1 ("0.0192, 0.1096, 0.4184");
            index_2 ("0.0192, 0.1096, 0.8304");
            values("0.0527, 0.0704, 0.1161", \
                   "0.0331, 0.0508, 0.0964", \
                   "0.0172, 0.0329, 0.0747");
          }
          fall_constraint(setup_template_3x3) {
            index_1 ("0.0192, 0.1096, 0.4184");
            index_2 ("0.0192, 0.1096, 0.8304");
            values("0.0391, 0.0606, 0.1961", \
                   "0.0194, 0.0410, 0.1745", \
                   "0.0101, 0.0115, 0.1450");
          }
        }

The lu_table_template group defines the template of a lookup table, which is a two-dimensional table. In the example above, the first index of the delay_template_7x7 table is the input net transition time and the second index is the total output net capacitance. The total output net capacitance includes the net load and the input pin loads of the driven cells. For example, if a BUFX2 buffer drives three BUFX1 cells and its net load is 10pf, the total output net capacitance is 10.010431pf (10pf + 3 x 0.003477pf). In the setup_template_3x3 table, the first index is the input net transition time of the clock pin and the second index is the input net transition time of the data pin.

The cell_leakage_power attribute defines a cell's leakage power. We assume that a cell's leakage power is independent of the circuit state. In the example, the leakage power of the BUFX1 cell is 5.187nW.

The cell_rise and cell_fall groups define the rising and falling delays of an output pin with respect to a related input pin. In the example, when the input net transition is 0.1ns and the output net capacitance is 0.0385pf, the cell_rise delay is 0.233008ns.

The rise_transition and fall_transition groups define the transition time of the cell's output pin. In the example, when the input net transition is 0.1ns and the output net capacitance is 0.0385pf, the rise_transition time is 0.313929ns. The output pin transition is used as the input net transition when calculating the delay of the next stage.
The rise_constraint and fall_constraint groups define the setup/hold timing constraints of a cell's input pin. In the example, when the clock input net transition is 0.0192ns and the data input net transition is 0.1096ns, the setup time requirement for a rising data pin is 0.0704ns.

If the input net transition and/or the output net capacitance does not exactly match an index value, the program should use interpolation to calculate the timing. For details of the Synopsys timing model format, please refer to the Synopsys Library Compiler Manual.

4. Output
(1). Design file with optimized leakage power (design_opt.netlist)
The format of the output design file must be identical to that of the input design file. Net names and instance names must not be changed, and the function of the design must not be changed. Adding new instances to the design or removing instances from it is not allowed. The timing may change, but it must meet the requirements of the given timing constraints. The leakage power of the optimized design should be reduced.

(2). Leakage power report (leakage.rpt)
After optimization, the program should generate a leakage power report for the design in the following format:

    INITIAL_LEAKAGE      5734.12nW
    OPTIMIZED_LEAKAGE    3246.91nW
    REDUCED_LEAKAGE      2487.21nW

INITIAL_LEAKAGE: the leakage power of the original design.
OPTIMIZED_LEAKAGE: the leakage power of the optimized design.
REDUCED_LEAKAGE = INITIAL_LEAKAGE - OPTIMIZED_LEAKAGE.

5. Delay Calculation
In this problem, the interconnect resistance is assumed to be 0 ohm, so no net has interconnect delay. The net load and the pin capacitances of the driven cells must be taken into account when calculating the delay of the driving cell. The input capacitances of the cells and the capacitances of the nets are specified in the cell timing models and the net load file, respectively.
Assume that the input transition times of all primary inputs are 0ns and that the capacitances of all primary outputs are 0pf.

[Fig. 1. A timing path example: a data path from launch flip-flop u1/u10/F2 (Q) to capture flip-flop u2/F3 (D), both clocked from CLK. clock_period = 10ns; s_clk = 1.9ns; e_clk = 2.1ns; delay from CK to D = t1 + t2 = 10.1ns; e_setup = 0.3ns; e_hold = 0.1ns.]

Fig 1 shows an example of a timing path from a start point to an end point, where t1 is the delay from the clock pin of u1/u10/F2 to its output, and t2 is the delay of the combinational circuit between u1/u10/F2 and u2/F3. The terms of the timing path are explained below.
start_point: a primary input port or a flip-flop clock pin. In the example, it is the CK pin of u1/u10/F2.
end_point: a flip-flop data pin or a primary output port. In the example, it is the D pin of u2/F3.
delay: the timing path delay from start_point to end_point. There may be multiple paths between the start_point and the end_point; the setup check uses the maximum-delay path and the hold check uses the minimum-delay path. If the start_point is a primary input port, the external input delay is included in the delay value. If the end_point is a primary output port, the external output delay is included in the delay value. In the example, the delay is t1 + t2.
Launch clock latency: the clock latency from the clock root to the clock pin of the start_point. If the start_point is a primary port, it is zero. In the example, it is s_clk.
Capture clock latency: the clock latency from the clock root to the clock pin of the end_point. If the end_point is a primary port, it is zero. In the example, it is e_clk.
Setup time: the setup time of the end_point. If the end_point is a primary output port, the setup time is zero. In the example, it is e_setup.
Hold time: the hold time of the end_point. If the end_point is a primary output port, the hold time is zero. In the example, it is e_hold.
slack: the timing slack of the path; a negative slack indicates a timing violation.
In the example:
setup time slack = clock_period + e_clk - s_clk - delay - e_setup = 10 + 2.1 - 1.9 - 10.1 - 0.3 = -0.2ns (a 0.2ns timing violation)
hold time slack = s_clk - e_clk + delay - e_hold = 1.9 - 2.1 + 10.1 - 0.1 = 9.8ns (no timing violation)

6. Leakage Power Calculation
The design's leakage power is the sum of the leakage power of every cell instance:

$$\mathrm{Leakage\_of\_Design} = \sum_{\text{all\_cells}} \mathrm{Leakage\_of\_cell}$$

7. Language/Platform
Language: C or C++
Platform: Solaris or Linux

8. Evaluation
The evaluation considers the total negative slack of setup/hold time violations, the reduced leakage power, the run time, and the memory usage. Memory usage cannot exceed 1 GB. The total negative slack is the sum of all negative setup and hold time slacks. The design's leakage power should be as low as possible. Timing is verified with Synopsys PrimeTime: the output design (design_opt.netlist), the net loads (net.cap), and the timing constraints (timing.con) are read into PrimeTime, which checks whether the timing meets the requirements. The result is evaluated with the following quantities:
P = leakage_power_of_original_design - leakage_power_of_optimized_design
S = 0 if there are no timing violations; otherwise S = total_negative_slack
R = CPU run time
For each test case, P, S, and R are normalized to P' (0 <= P' <= 100), S' (0 <= S' <= 100), and R' (0 <= R' <= 30), respectively, and the score is computed from them. In any case, a test is terminated if the program runs for more than 12 hours on a machine equivalent to an AMD Opteron 2.2 GHz.
Score = P' - S' - R'
Total score = sum of the scores of all test cases.

The normalization is explained using P' as an example:

$$P' = \frac{P - P_{min}}{P_{max} - P_{min}} \times 100$$

Here Pmax is the largest leakage saving among all teams and Pmin is the smallest. Assume there are six teams, A, B, C, D, E, and F, whose saved leakage is A: 1000, B: 800, C: 600, D: 400, E: 200, and F: 0. Then P' of team B is

$$P'_B = \frac{800 - 0}{1000 - 0} \times 100 = 80$$

Below are some examples of the score calculation, with Score = P' - S' - R', for a given test case:
If the result saves the most leakage, has no timing violation, and has the fastest run time, then Score = 100 - 0 - 0 = 100.
If the result saves the most leakage and has no timing violation but has the slowest run time, then Score = 100 - 0 - 30 = 70.
If the result saves the least leakage, has no timing violation, and has the fastest run time, then Score = 0 - 0 - 0 = 0.
If the result saves the least leakage, has the largest timing violation, and has the slowest run time, then Score = 0 - 100 - 30 = -130.