2007 IC/CAD Contest
Problem A1:
Leakage Power Reduction by Multiple-VT Cells Swapping
Source: Global Unichip Corp.
1. Introduction
The multiple-VT approach is widely used in chip design to meet both timing and power
requirements. High-VT cells have low leakage power but slow timing, which makes them suitable
for non-critical paths; low-VT cells have fast timing but higher leakage power, which makes
them suitable for critical paths; standard-VT cells have moderate leakage and timing.
Several approaches have been proposed for using multiple-VT cells to meet both the leakage
and timing requirements of a chip design. One approach performs timing optimization and
leakage power optimization at the same time; it needs a complex timing/power optimization
engine. Another approach performs timing optimization first to achieve timing closure, and
then swaps cells on non-critical paths to reduce the leakage power; it needs a good algorithm
to preserve timing closure while swapping cells.
This contest targets the second approach.
2. Problem Description
Given (1) a design (design.netlist) in gate-level netlist form, which has been implemented
using standard-VT cells and has achieved timing closure (there are no timing violations in
the design), (2) a timing constraint file (timing.con), (3) a net load file (net.cap) that
contains each net’s capacitance load, and (4) timing models of the standard cells (svt.lib,
hvt.lib, lvt.lib), the contestant must swap the original cells with multiple-VT cells in order
to reduce the design’s leakage power. At the same time, the program should analyze the timing
to ensure that no timing violation occurs after the cells are swapped.
For each cell type there are three VT variants, and all three have the same cell area.
The following rules must be followed:
a) Cells with the same area and the same function have the same number of pins and the same
pin positions. Assume the design's physical net routing has been frozen, so pin locations
cannot change after cell swapping. Contestants therefore cannot swap in a smaller or
bigger cell. A cell can only be replaced by another cell that has the same function and
the same cell area.
b) Cells cannot be added to or removed from the design.
c) Net names and instance names in the design cannot be changed.
d) After swapping, there must be no setup/hold timing violations.
e) The program cannot call commercial EDA tools to complete the job.
3. Input
The default time, capacitance, and leakage units are nanoseconds (ns), picofarads (pF), and
nanowatts (nW), respectively.
(1). Design file (design.netlist)
The original design is in gate-level Verilog format. To simplify parsing, the gate-level
Verilog file has been translated into a simple format for the contestant’s program to read.
The original Verilog files are also provided for reference; contestants can use them to
correlate the delay calculation between PrimeTime and their own program.
The format of the design file is as follows.
PINS
Pin_name1 Direction
Pin_name2 Direction
…
END PINS
COMPONENTS
Instance_Name1 Cell_Name1
Instance_Name2 Cell_Name1
…
END COMPONENTS
NET
Net_Name1 Instance_Name1.pin1 Instance_Name2.pin1 ….
Net_Name2 Instance_Name4.pin2 Instance_Name5.pin2 ….
Net_Name3 Instance_Name6.pin2 Port_Name3
Net_Name4 Port_Name4 Instance_Name7.pin2
….
END NET
Pin_name: the pin name of input, output, and in/out pins
Direction: the direction of the corresponding pin. It can be IN, OUT, or INOUT.
Instance_Name: the instance name of a placed cell.
Cell_Name: the cell name of a cell in the netlist.
Net_Name: the name of an interconnection.
Instance_Name.pin: the pin of a cell instance connected to the net. For example, u1/u10/F1.Y
means that pin Y of instance u1/u10/F1 is connected to the net. The slash “/” is the
hierarchy divider. The driving cell must be listed first, followed by the driven cells.
Port_Name: the port name of a primary input or a primary output. If a net is connected from a
cell to a primary output pin, its format will be:
Net_Name3 Instance_Name6.pin2 Port_Name3
If a net is connected from a primary input pin to a cell, its format will be:
Net_Name4 Port_Name4 Instance_Name7.pin2
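The design file format above could be read with a small line-oriented parser. The following is only a minimal sketch: the type names (Pin, Component, Net, Netlist) and the function parseNetlist are illustrative and not part of the contest materials, and it assumes one record per line with whitespace-separated fields.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of design.netlist (names are illustrative).
struct Pin       { std::string name, direction; };   // direction: IN, OUT, or INOUT
struct Component { std::string instance, cell; };
struct Net       { std::string name; std::vector<std::string> terminals; };  // driver first

struct Netlist {
    std::vector<Pin> pins;
    std::vector<Component> components;
    std::vector<Net> nets;
};

// Reads the PINS / COMPONENTS / NET sections, assuming one record per line.
Netlist parseNetlist(std::istream& in) {
    enum class Section { None, Pins, Components, Net };
    Section section = Section::None;
    Netlist netlist;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::vector<std::string> tok;
        for (std::string t; fields >> t;) tok.push_back(t);
        if (tok.empty()) continue;
        // "END PINS", "END COMPONENTS", "END NET" close the current section.
        if (tok.size() == 2 && tok[0] == "END") { section = Section::None; continue; }
        if (section == Section::None) {
            if (tok[0] == "PINS") section = Section::Pins;
            else if (tok[0] == "COMPONENTS") section = Section::Components;
            else if (tok[0] == "NET") section = Section::Net;
            continue;
        }
        if (section == Section::Pins && tok.size() == 2) {
            netlist.pins.push_back({tok[0], tok[1]});
        } else if (section == Section::Components && tok.size() == 2) {
            netlist.components.push_back({tok[0], tok[1]});
        } else if (section == Section::Net && tok.size() >= 2) {
            Net net;
            net.name = tok[0];
            net.terminals.assign(tok.begin() + 1, tok.end());  // driver, then loads
            netlist.nets.push_back(net);
        }
    }
    return netlist;
}
```

Because the driver is always listed first, terminals[0] of each net is either the driving instance pin or a primary input port, which is convenient when building the timing graph.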
(2). Timing constraint file (timing.con)
The timing constraint file specifies the clock cycle time and the input and output delays
of the primary I/Os. The format is defined as follows:
Clock_cycle   clock_name       clock_cycle_time
Input_delay   input_pin_name   input_delay_time
Output_delay  output_pin_name  output_delay_time
Clock_cycle, Input_delay, and Output_delay are reserved keywords. Input_delay and
Output_delay are the external delays of an input pin and an output pin, respectively; they
are defined as in Synopsys PrimeTime. The clock name is the name of an input pin.
An example of the timing constraint file is as follows.
Clock_cycle   CLK         10
Input_delay   data_in[0]  4.8
Output_delay  add_out[5]  4.7
(3). Net load file (net.cap)
The Net load file specifies the capacitance of each net in the design. The net load is used
to calculate the driving cell’s delay. The format is defined as follows:
#NET_NAME   CAPACITANCE
Net_name1   capacitance_value1
Net_name2   capacitance_value2
......
A line that begins with ‘#’ is a comment.
Net_name: the name of a net in the design.
capacitance_value: the load of the net, a floating-point number.
An example of the net load file is as follows:
#NET_NAME    CAPACITANCE
Ustr_sum_0   0.0238
Carry_20     0.0529
......
(4). Timing model of standard cells (svt.lib, hvt.lib, lvt.lib)
Three timing models are given: svt.lib is the model of standard-VT cells, hvt.lib is the
model of high-VT cells, and lvt.lib is the model of low-VT cells.
The timing models are in Synopsys .lib format and contain the timing information, input pin
capacitance, and leakage power of the standard cells. Each is a two-dimensional table
look-up model. The cell delay depends on the input transition time and the output load. The
setup/hold timing depends on the input transition times of two input pins. Below is an
example of the timing model:
.......
lu_table_template(delay_template_7x7) {
variable_1 : input_net_transition;
variable_2 : total_output_net_capacitance;
index_1 ("1000, 1001, 1002, 1003, 1004, 1005, 1006");
index_2 ("1000, 1001, 1002, 1003, 1004, 1005, 1006");
}
lu_table_template(setup_template_3x3) {
variable_1 : related_pin_transition;
variable_2 : constrained_pin_transition;
index_1 ("1000, 1001, 1002");
index_2 ("1000, 1001, 1002");
}
..........
cell (BUFX1) {
cell_leakage_power : 5.187 ; /* cell leakage power */
..........
pin(A) {
direction : input;
capacitance : 0.003477; /* input capacitance */
}
pin(Y) {
direction : output;
capacitance : 0.0;
function : "A";
timing() {
related_pin : "A";
timing_sense : positive_unate;
cell_rise(delay_template_7x7) {
index_1 ("0.03, 0.1, 0.4, 0.9, 1.5, 2.2, 3");
index_2 ("0.00035, 0.021, 0.0385, 0.084, 0.147, 0.231, 0.3115");
values ( \
"0.059814, 0.148227, 0.221109, 0.410291, 0.672101, 1.021130, 1.355603", \
"0.071968, 0.160093, 0.233008, 0.422239, 0.684072, 1.033117, 1.367596", \
"0.092836, 0.183203, 0.256230, 0.445346, 0.707170, 1.056216, 1.390702", \
"0.102056, 0.197283, 0.269983, 0.459138, 0.721033, 1.069998, 1.404450", \
"0.099924, 0.201503, 0.275289, 0.464924, 0.726738, 1.075862, 1.410247", \
"0.089415, 0.197640, 0.272994, 0.465209, 0.727462, 1.076497, 1.411074", \
"0.071630, 0.186748, 0.263815, 0.459390, 0.724021, 1.073303, 1.407765");
}
rise_transition(delay_template_7x7) {
index_1 ("0.03, 0.1, 0.4, 0.9, 1.5, 2.2, 3");
index_2 ("0.00035, 0.021, 0.0385, 0.084, 0.147, 0.231, 0.3115");
values ( \
"0.030627, 0.181793, 0.313933, 0.657598, 1.133489, 1.768029, 2.376136", \
"0.033365, 0.181763, 0.313929, 0.657589, 1.133488, 1.768029, 2.376134", \
"0.039292, 0.182942, 0.314810, 0.657583, 1.133478, 1.768019, 2.376134", \
"0.047036, 0.185613, 0.315813, 0.658974, 1.134019, 1.768031, 2.376134", \
"0.056039, 0.192559, 0.321015, 0.660309, 1.135028, 1.768828, 2.376364", \
"0.064971, 0.201005, 0.328875, 0.666395, 1.136649, 1.769622, 2.377172", \
"0.073844, 0.211253, 0.337763, 0.676731, 1.142890, 1.771268, 2.377928");
}
cell_fall(delay_template_7x7) {
index_1 ("0.03, 0.1, 0.4, 0.9, 1.5, 2.2, 3");
index_2 ("0.00035, 0.021, 0.0385, 0.084, 0.147, 0.231, 0.3115");
values ( \
"0.062397, 0.153531, 0.226701, 0.416510, 0.679066, 1.029075, 1.364481", \
"0.079646, 0.170871, 0.244063, 0.433911, 0.696486, 1.046504, 1.381914", \
"0.128636, 0.223286, 0.296756, 0.486452, 0.749012, 1.099029, 1.434439", \
"0.186025, 0.285599, 0.359099, 0.549339, 0.811847, 1.161779, 1.497154", \
"0.241396, 0.347741, 0.422915, 0.613536, 0.876430, 1.226407, 1.561703", \
"0.297722, 0.411006, 0.488564, 0.682375, 0.945420, 1.295674, 1.631027", \
"0.356177, 0.476429, 0.556420, 0.754852, 1.020361, 1.370541, 1.706061");
}
fall_transition(delay_template_7x7) {
index_1 ("0.03, 0.1, 0.4, 0.9, 1.5, 2.2, 3");
index_2 ("0.00035, 0.021, 0.0385, 0.084, 0.147, 0.231, 0.3115");
values ( \
"0.028478, 0.168009, 0.290407, 0.609373, 1.051023, 1.639923, 2.204286", \
"0.029608, 0.168002, 0.290438, 0.609376, 1.051029, 1.639929, 2.204290", \
"0.036506, 0.170717, 0.291427, 0.609361, 1.051037, 1.639929, 2.204290", \
"0.045257, 0.174697, 0.293546, 0.611218, 1.051264, 1.639929, 2.204290", \
"0.054797, 0.184138, 0.300151, 0.612955, 1.053136, 1.640645, 2.204290", \
"0.063613, 0.195118, 0.311175, 0.619687, 1.054980, 1.642055, 2.205452", \
"0.073599, 0.207052, 0.323270, 0.633024, 1.061348, 1.643938, 2.206660");
}
}
........
}
...........
}
cell (DFFX1) {
cell_leakage_power : 6.410 ;
………………..
pin(D) {
direction : input;
capacitance : 0.0031;
timing() {
related_pin : "CP";
timing_type : setup_rising;
rise_constraint(setup_template_3x3) {
index_1 ("0.0192, 0.1096, 0.4184");
index_2 ("0.0192, 0.1096, 0.8304");
values("0.0527, 0.0704, 0.1161", \
"0.0331, 0.0508, 0.0964", \
"0.0172, 0.0329, 0.0747");
}
fall_constraint(setup_template_3x3) {
index_1 ("0.0192, 0.1096, 0.4184");
index_2 ("0.0192, 0.1096, 0.8304");
values("0.0391, 0.0606, 0.1961", \
"0.0194, 0.0410, 0.1745", \
"0.0101, 0.0115, 0.1450");
}
}
The lu_table_template group defines the template of a two-dimensional look-up table. In the
above example, the first index of delay_template_7x7 is the input net transition time and the
second index is the total output net capacitance. The total output net capacitance includes
the net load and the input pin capacitance of the driven cells. For example, assume that a
BUFX2 buffer drives three BUFX1 cells and its net load is 10pF. Then the total output net
capacitance is 10.010431pF (10pF + 3 x 0.003477pF). In the setup_template_3x3 table, the first
index is the input net transition time of the clock pin and the second index is the input net
transition time of the data pin.
The cell_leakage_power attribute defines the cell’s leakage power. We assume the cell’s
leakage power is independent of the circuit state. In the example, the leakage power of the
BUFX1 cell is 5.187nW.
The cell_rise and cell_fall groups define the rising and falling delay of the output pin
with respect to an input pin. In the example, when the input net transition is 0.1ns and the
output net capacitance is 0.0385pF, the cell_rise delay is 0.233008ns.
The rise_transition and fall_transition groups define the transition time of the cell’s
output pin. In the example, when the input net transition is 0.1ns and the output net
capacitance is 0.0385pF, the rise_transition time is 0.313929ns. The output pin transition is
used as the input net transition for the delay calculation of the next stage.
The rise_constraint and fall_constraint groups define the setup/hold timing constraints of
a cell’s input pin. In the example, when the clock input net transition is 0.0192ns and the
data input net transition is 0.1096ns, the setup time requirement for a rising data pin is
0.0704ns.
If the input net transition and/or the output net capacitance does not match an index value
exactly, the software should use interpolation to calculate the timing.
For the details of Synopsys timing model format, please refer to Synopsys Library
Compiler Manual.
4. Output
(1). Design file with optimized leakage power (design_opt.netlist)
The format of the output design file must be identical to that of the input design file.
Net names and instance names cannot be changed, and the function of the design cannot be
changed. Instances cannot be added to or removed from the design. The timing may change, but
it must meet the timing requirement based on the given timing constraint. The leakage power
should be reduced in the optimized design.
(2). Leakage power report (leakage.rpt)
After optimization, the program should generate a leakage power report for the design. The
format of the leakage power report is as follows.
INITIAL_LEAKAGE    5734.12nW
OPTIMIZED_LEAKAGE  3246.91nW
REDUCED_LEAKAGE    2487.21nW
INITIAL_LEAKAGE: The leakage power of the original design.
OPTIMIZED_LEAKAGE: The leakage power of the optimized design.
REDUCED_LEAKAGE = INITIAL_LEAKAGE - OPTIMIZED_LEAKAGE.
5. Delay Calculation
In this problem, we assume the resistance of the interconnect is 0 ohm, so there is no
interconnect delay on any net. The net load and the input pin capacitance of the driven
cells must be taken into account when calculating the delay of the driving cell. The input
capacitances of cells and the capacitances of nets are specified in the cell timing models
and the net load file, respectively.
Assume that the input transition time of every primary input is 0ns and the capacitance of
every primary output is 0pF.
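[Figure: flip-flop u1/u10/F2 (launch, clock pin CK, output Q) drives combinational logic
that ends at the D pin of flip-flop u2/F3 (capture, clock pin CK); both flip-flops are
clocked from CLK. Values shown: cycle delay from CK to CK, clock_period = 10ns; data path
delay = t1 + t2 = 10.1ns; s_clk = 1.9ns; e_clk = 2.1ns; e_setup = 0.3ns; e_hold = 0.1ns.]
Fig 1. A timing path example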
Fig 1 shows an example timing path from the start point to the end point, where t1 is the
delay from the clock pin of u1/u10/F2 to its output, and t2 is the delay of the
combinational circuit between u1/u10/F2 and u2/F3. The terms used for the timing path are
explained below.
start_point: a primary input port or a flip-flop clock pin. In the example, it is CK pin of
u1/u10/F2.
end_point: a flip-flop data pin or a primary output port. In the example, it is D pin of u2/F3.
delay: the timing path delay from start_point to end_point. Between the start_point and the
end_point, there may be multiple paths. For setup time check, the maximum delay path is
used, while for hold time check, the minimum delay path is used. If the start_point is a
primary input port, the external input delay is included in the delay value. If the end_point is a
primary output port, the external output delay is included in the delay value. In the example,
delay is t1 + t2.
Launch clock latency: the clock latency from clock root to the start_point clock pin. If the
start_point is a primary port, it is zero. In the example, it is s_clk.
Capture clock latency: the clock latency from clock root to the end_point clock pin. If the
end_point is a primary port, it is zero. In the example, it is e_clk.
Setup time: the setup time of the end_point. If the end_point is a primary output port, the
setup time is zero. In the example, it is e_setup.
Hold time: the hold time of the end_point. If the end_point is a primary output port, the hold
time is zero. In the example, it is e_hold.
slack: the timing slack. In the example,
setup time slack = clock_period + e_clk - s_clk - delay - e_setup = -0.2ns (a timing
violation of 0.2ns)
hold time slack = s_clk - e_clk + delay - e_hold = 9.8ns (no timing violation)
6. Leakage Power Calculation
The design’s leakage power is the sum of the leakage power of all cell instances.
\[ \mathrm{Leakage\_of\_Design} = \sum_{\mathrm{all\_cells}} \mathrm{Leakage\_of\_cell} \]
7. Language/Platform
- Language: C or C++
- Platform: Solaris or Linux
8. Evaluation
- The total negative slack of setup/hold time violations
- The reduced leakage power
- Run time
- Memory usage
The memory usage cannot exceed 1 gigabyte.
The total negative slack is the sum of all negative setup and hold time slacks. The
design’s leakage power should be as low as possible. The timing will be compared with
the report from Synopsys PrimeTime: the output design (design_opt.netlist), the net load
(net.cap), and the timing constraint (timing.con) will be read into PrimeTime to check
whether the timing meets the requirement.
The formula to evaluate the result is as follows:
P = (leakage_power_of_original_design - leakage_power_of_optimized_design)
S = 0, if there are no timing violations.
S = total_negative_slack, if there are timing violations.
R = CPU run time.
For a test case, P, S, and R will be normalized to calculate the score.
P will be normalized to get P’, 0<=P’<=100.
S will be normalized to get S’, 0<=S’<=100.
R will be normalized to get R’, 0<=R’<=30. In addition, the test will be terminated if the
program runs for more than 12 hours on an AMD Opteron 2.2 GHz equivalent machine.
Score = P’ – S’ – R’
Total score = sum of the scores of all test cases.
We use P’ to explain the normalization.
\[ P' = \frac{P - P_{min}}{P_{max} - P_{min}} \times 100 \]
Here Pmax is the largest leakage saving among all teams, and Pmin is the smallest leakage
saving among all teams.
Assume there are six teams A, B, C, D, E, and F. Their saved leakage is
A:1000, B:800, C:600, D:400, E:200, F:0.
Then P’ of team B is
\[ P' = \frac{800 - 0}{1000 - 0} \times 100 = 80 \]
Below are some examples of the score calculation, Score = P’ – S’ – R’.
For a given test case, if the result saves the most leakage, has no timing violation, and
has the fastest run time, then
Score = 100 – 0 – 0 = 100.
If the result saves the most leakage and has no timing violation, but has the slowest run
time, then
Score = 100 – 0 – 30 = 70.
If the result saves the least leakage, has no timing violation, and has the fastest run
time, then
Score = 0 – 0 – 0 = 0.
If the result saves the least leakage, has the largest timing violation, and has the slowest
run time, then
Score = 0 – 100 – 30 = -130.
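The per-test-case score and the total score can be sketched as follows. The struct NormalizedResult and both function names are illustrative; the inputs are assumed to be already normalized to the ranges given above (P’ and S’ in 0..100, R’ in 0..30).

```cpp
#include <vector>

// Normalized metrics of one test case: P' and S' in [0, 100], R' in [0, 30].
struct NormalizedResult {
    double p;  // normalized leakage saving
    double s;  // normalized total negative slack
    double r;  // normalized run time
};

// Score = P' - S' - R'
double testCaseScore(const NormalizedResult& result) {
    return result.p - result.s - result.r;
}

// Total score is the sum of the scores of all test cases.
double totalScore(const std::vector<NormalizedResult>& results) {
    double total = 0.0;
    for (const NormalizedResult& result : results) total += testCaseScore(result);
    return total;
}
```

Applied to the four examples above, testCaseScore returns 100, 70, 0, and -130, respectively.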