
Gated Clocks in RT-Synthesis and Simulation
HDL&VUFE: High Level and Logic Synthesis – Developing of Standards
Wolfgang Ecker – Andre Windisch – Jochen Mades – Thomas Schneider – Ke Yang
Contact: Wolfgang.Ecker@infineon.com, Phone +49 89 234 45334, Fax +49 89 234 44950
Mail: Infineon Technologies – CPD DAT ADM, Otto-Hahn-Ring 6, 81730 München
Introduction
Due to the low-power requirements of an
increasing number of products, such as mobile
phones, laptops, or personal organizers, to name
only a few, more and more designs use clock
gating to turn off complete parts of a chip and
thereby reduce power consumption. Gated
clocks are also used in designs that are not
low-power designs, to stay within the maximum
allowed power dissipation of a chip.
Alternative design methods for low power
concentrate on asynchronous design styles and
the synthesis of asynchronous circuits.
Currently, clock gating in synthesis is done by
explicitly modeling the gates that control the
clock. This causes semantic differences between
the VHDL RT-model and the synthesized netlist,
as shown below after a short introduction to
synchronous design in VHDL. The following
sections briefly present work-arounds based on
clock balancing or on delayed signal
assignments. Afterwards, alternatives based on
extended wait statements and guard signals are
shown. Finally, a proposal for a VHDL
extension is discussed: allowing general
expressions in port maps, which we call port
expressions. The semantics of these port
expressions shall be based on implicit signals.
Synchronous RT-Designs in VHDL
From the modeling point of view, RT-VHDL
relies on the idea that the timing semantics are
described only in terms of the clock. Many
additional restrictions have to be considered,
but they go far beyond the intention of this
section. Details can be found in the manuals of
VHDL RT-synthesis tools or in documents
relating to the VHDL synthesis standard subset.
The following example shows VHDL code that
synthesis tools interpret as a single-bit
flipflop without reset.
entity dff is
port( clk : in bit; d : in bit; q : out bit );
end dff;
architecture simple of dff is
begin
process
begin
wait until clk = '1';
q <= d;
end process;
end simple;
Combining three of these flipflops into a shift register, as shown in the VHDL code
below, gives a good example to illustrate how
the VHDL standard simulation cycle relates to
hardware.
entity shift_register is
port( clk : in bit; d : in bit; q : out bit );
end shift_register;
architecture structural of shift_register is
signal t1, t2: bit;
begin
ff1: entity WORK.dff(simple) port map(clk=>clk,d=>d,q=>t1);
ff2: entity WORK.dff(simple) port map(clk=>clk,d=>t1,q=>t2);
ff3: entity WORK.dff(simple) port map(clk=>clk,d=>t2,q=>q);
end structural;
If VHDL did not separate signal computation
from signal update, i.e. if assignments took
effect immediately, the result of the code above
would depend on the order in which the
processes are computed. If the computation
sequence were ff3, ff2 and then ff1, the
simulation result would match the corresponding
hardware consisting of three serially connected
flipflops. In contrast, if the computation
sequence were reversed, i.e. ff1, ff2 and then ff3,
the simulation semantics would be those of
one single flipflop, i.e. the simulation semantics
would not match the semantics of the
synthesized hardware.
Fortunately, VHDL has its sometimes disliked and misunderstood simulation semantics,
which ensure a short delay of each signal
assignment, often called the “delta-delay”.
In the example, these semantics cause all new
flipflop output values to be computed first,
depending on the input values, and to be
assigned one simulation cycle later. This
corresponds loosely to the real
propagation delay through a flipflop. It is the
basis for synchronous, technology-independent
modeling.
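The separation of computation and update can be illustrated with a minimal sketch in which two clocked processes exchange the values of two signals. The entity name delta_swap and all signal names are assumptions made for illustration; they do not appear in the original text.

```vhdl
-- Illustrative sketch (not from the original text): two clocked processes
-- swap the values of a and b. Because signal updates are deferred by one
-- delta cycle, the result does not depend on process evaluation order.
entity delta_swap is
end delta_swap;

architecture demo of delta_swap is
  signal clk : bit := '0';
  signal a   : bit := '0';
  signal b   : bit := '1';
begin
  -- Both processes read the old values at the rising edge.
  p_a: process
  begin
    wait until clk = '1';
    a <= b;
  end process;

  p_b: process
  begin
    wait until clk = '1';
    b <= a;
  end process;

  stimulus: process
  begin
    wait for 5 ns;
    clk <= '1';            -- single rising edge
    wait for 5 ns;
    assert a = '1' and b = '0'
      report "values were not swapped" severity error;
    wait;                  -- suspend forever
  end process;
end demo;
```

With immediate assignment, one of the two signals would overwrite the other before it was read; with deferred updates, both processes see the values from before the clock edge.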
Simulation and Synthesis Mismatch for Gated Clocks
The advantage of the VHDL signal semantics for
clock-related descriptions, especially RT-descriptions, holds only if the
clock changes in the first simulation cycle (also
called the first delta cycle) of a simulation time step
and all other signals (especially the inputs of the
flipflops) change either at a different simulation
time (which may occur if they are driven by
flipflops of another clock domain) or no earlier
than the following delta cycle of the same
simulation time. It also holds if all clocks change in
the same delta cycle and the combinational
signals change afterwards.
Gating a clock structurally and modeling
the gates with zero delay, as is usual in RT-descriptions, causes the clock signals to change in
different delta cycles, i.e. one clock edge occurs
one or more delta cycles after another clock
edge. This implies that the data input of a
flipflop might change before, or at the same
time as, its clock event. The flipflop then
simulates like a feed-through wire,
which contradicts the synthesis semantics
of a flipflop.
Gating only one flipflop of the shift register, as
shown below, is a good example to illustrate
this scenario:
entity shift_register_with_gate is
port( clk : in bit;
d : in bit;
ff2_en : in bit;
q : out bit );
end shift_register_with_gate;
architecture structural of shift_register_with_gate is
signal t1, t2: bit;
signal ff2_en_clk : bit;
begin
clock_gate: entity WORK.and2 port map( clk, ff2_en, ff2_en_clk );
ff1: entity WORK.dff(simple) port map(clk=>clk,d=>d,q=>t1);
ff2: entity WORK.dff(simple) port map(clk=>ff2_en_clk,d=>t1,q=>t2);
ff3: entity WORK.dff(simple) port map(clk=>clk,d=>t2,q=>q);
end structural;
Here, the gated clock for ff2 is derived from clk
using an and-gate, which is assumed to be a
zero-delay gate, i.e. its output changes one
delta after the input change. This means
that the clock signal at flipflop ff2 changes one
delta after the clock signal at flipflops ff1 and ff3. This implies that t2, the output of ff2,
changes two deltas after the clock clk, which is
sufficiently late to guarantee a correct simulation
of ff3. However, because a potential edge at
ff2_en_clk occurs one delta after an edge at clk,
t1, the output of ff1 and the input of ff2,
changes in the same delta cycle as ff2_en_clk.
ff2 therefore takes the new value of t1
immediately and not one clock cycle
later. In simulation, ff1 is thus
superfluous. A simulation/synthesis mismatch
occurs.
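The delta race described above can be reproduced with a small self-checking testbench. This is a sketch under stated assumptions: the testbench name is hypothetical, the three flipflops are written as inline processes instead of dff instances, and a concurrent assignment stands in for the zero-delay and2 gate.

```vhdl
-- Illustrative testbench (names are assumptions, not from the original text).
-- The concurrent assignment stands in for the zero-delay and2 gate.
entity gated_clock_mismatch_tb is
end gated_clock_mismatch_tb;

architecture demo of gated_clock_mismatch_tb is
  signal clk        : bit := '0';
  signal d          : bit := '1';   -- constant data input
  signal ff2_en     : bit := '1';   -- gate permanently open
  signal ff2_en_clk : bit := '0';
  signal t1, t2, q  : bit := '0';
begin
  -- Gated clock: changes one delta after clk.
  ff2_en_clk <= clk and ff2_en;

  ff1: process begin wait until clk = '1';        t1 <= d;  end process;
  ff2: process begin wait until ff2_en_clk = '1'; t2 <= t1; end process;
  ff3: process begin wait until clk = '1';        q  <= t2; end process;

  stimulus: process
  begin
    wait for 5 ns;
    clk <= '1';                     -- exactly one rising edge
    wait for 5 ns;
    assert t1 = '1' report "ff1 did not capture d" severity error;
    -- A three-stage hardware shift register would still hold t2 = '0'
    -- here; in simulation, ff2 has already taken the new value of t1.
    assert t2 = '1' report "delta race did not occur" severity error;
    assert q = '0' report "ff3 captured too early" severity error;
    wait;
  end process;
end demo;
```

After a single clock edge the input value has already passed through two stages, which is the behavior of a two-stage rather than a three-stage shift register.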
Balanced Clock for Mismatch Resolution
One simple solution for this simulation/synthesis
mismatch is the insertion of delay
elements into all clock signals such that all
clocks have their edges in the same delta cycle.
The modified example is shown
below.
architecture balanced_clock of shift_register_with_gate is
signal t1, t2: bit;
signal ff2_en_clk : bit;
signal ff1_3_del_clk : bit;
begin
clock_gate: entity WORK.and2 port map( clk, ff2_en, ff2_en_clk );
clock_delay: entity WORK.buf port map( clk, ff1_3_del_clk );
ff1:entity WORK.dff(simple) port map(clk=>ff1_3_del_clk,d=>d,q=>t1);
ff2:entity WORK.dff(simple) port map(clk=>ff2_en_clk,d=>t1,q=>t2);
ff3:entity WORK.dff(simple) port map(clk=>ff1_3_del_clk,d=>t2,q=>q);
end balanced_clock;
Obviously, a huge effort has to be spent
to balance all clocks and to ensure that all
clock edges occur in the same delta cycle. It also has
to be ensured that the instance of the buffer buf
does not force the synthesis tool to infer a buffer
in the clock tree.
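The primitives and2 and buf are instantiated in the listings but not defined in the text. A minimal sketch of what they might look like follows, assuming zero-delay behavior and positional port order (inputs first, output last); the port names a, b and y and the architecture name zero_delay are assumptions.

```vhdl
-- Hypothetical zero-delay primitives; port and architecture names are
-- assumptions, since the original text does not show these entities.
entity and2 is
  port( a : in bit; b : in bit; y : out bit );
end and2;

architecture zero_delay of and2 is
begin
  y <= a and b;   -- output changes one delta after an input change
end zero_delay;

entity buf is
  port( a : in bit; y : out bit );
end buf;

architecture zero_delay of buf is
begin
  y <= a;         -- same one-delta delay as and2, which balances the clocks
end zero_delay;
```

Because both primitives consist of a single concurrent signal assignment, each adds exactly one delta cycle, so the gated and the buffered clock change in the same delta cycle.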
Delayed Assignments for Mismatch Resolution
Another solution is to delay the outputs of the
flipflops by a specific amount of simulation
time.
The following code shows the modified flipflop
primitive, which has a default offset but
also allows a specific one to be set.
entity dff_with_offset is
generic( t_offset : time := 1 ns);
port( clk : in bit; d : in bit; q : out bit );
end dff_with_offset;
architecture simple of dff_with_offset is
begin
process
begin
wait until clk = '1';
q <= d after t_offset;
end process;
end simple;
Using these flipflops in conjunction with gated
clocks ensures that all flipflop outputs
change at a different simulation time than the clock
does. No delta race can occur in this case.
This solution, however, has several drawbacks
which make it hard to use. One is that the
propagation delay is not supported by synthesis,
although it is mostly just ignored. More important,
it must be ensured that the derived clock
edge is not itself delayed by t_offset. This can
partially be achieved by using flipflops without
offset to drive the clock gate logic; however,
this does not work if a flipflop
controlling a derived clock is itself driven by a derived
clock.
Using wait-statements for Clock Gating
To achieve a clean solution for clock control,
clock gating must be possible without the loss
of one delta cycle, i.e. it must be possible to
apply a logic function to the clock signal without delay.
One possibility is to include the enable
mechanism in the wait statement, as shown in
the flipflop model below.
entity dff_with_enable is
port( clk : in bit; d : in bit; en : in bit; q : out bit );
end dff_with_enable;
architecture simple of dff_with_enable is
begin
process
begin
wait on clk until clk = '1' and en = '1';
q <= d;
end process;
end simple;
The shift register can then be modified to:
architecture with_enable_flipflop of shift_register_with_gate is
signal t1, t2: bit;
begin
ff1:entity WORK.dff(simple) port map(clk=>clk,d=>d,q=>t1);
ff2:entity WORK.dff_with_enable(simple)
port map(clk=>clk,d=>t1,en=>ff2_en,q=>t2);
ff3:entity WORK.dff(simple) port map(clk=>clk,d=>t2,q=>q);
end with_enable_flipflop;
This model simulates clock gating correctly.
From a modeling standpoint, however, extra effort must be
spent to “gate” each wait statement. Gating a
complete block requires modifying
each wait statement and the port lists. This is
not acceptable, especially for reusable blocks (or IP).
It must also be noted that this
form of the wait statement is currently not
synthesizable.
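The role of the explicit sensitivity clause "on clk" can be checked with a short testbench. This is a sketch with hypothetical names; the flipflop is written as an inline process that uses the same wait statement as dff_with_enable.

```vhdl
-- Illustrative testbench (names are assumptions, not from the original text).
entity dff_enable_tb is
end dff_enable_tb;

architecture demo of dff_enable_tb is
  signal clk : bit := '0';
  signal en  : bit := '0';
  signal d   : bit := '1';
  signal q   : bit := '0';
begin
  -- Same wait statement as in dff_with_enable: sensitive to clk only.
  ff: process
  begin
    wait on clk until clk = '1' and en = '1';
    q <= d;
  end process;

  stimulus: process
  begin
    wait for 5 ns;  clk <= '1';   -- rising edge while en = '0': ignored
    wait for 2 ns;  en  <= '1';   -- en rises while clk is high: no clk event
    wait for 3 ns;
    assert q = '0' report "flipflop triggered without clock edge" severity error;
    clk <= '0';
    wait for 5 ns;  clk <= '1';   -- rising edge while en = '1': captured
    wait for 5 ns;
    assert q = '1' report "enabled edge was not captured" severity error;
    wait;
  end process;
end demo;
```

Without "on clk", the wait statement would be sensitive to both clk and en, so a rising en while clk is high would also trigger the assignment.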
Using Guarded Blocks for Clock Gating
Another possibility for clock gating is to use
the guard expression of a block to perform
the required logical operation. To be able to use
the flipflop code without modification, the guard must
be converted (in our case to bit) in order to connect it to
a clock port.
Because the guard signal is an
implicit signal, this form of clock gating
also does not need an additional delta cycle.
The solution works fine and allows clock
gating of complete, possibly reused blocks.
Unfortunately, this solution is currently not
synthesizable.
architecture with_guard of shift_register_with_gate is
signal t1, t2: bit;
begin
ff1:entity WORK.dff(simple) port map(clk=>clk,d=>d,q=>t1);
-- the guard expression must be of type boolean
clock_gate: block( (clk and ff2_en) = '1' ) begin
-- to_bit is assumed to be a user-defined boolean-to-bit conversion function
ff2:entity WORK.dff(simple)
port map(clk=>to_bit(guard), d=>t1, q=>t2 );
end block clock_gate;
ff3:entity WORK.dff(simple) port map(clk=>clk,d=>t2,q=>q);
end with_guard;
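The conversion function to_bit used in the listing above is not defined in the text. A minimal sketch follows, assuming it maps the boolean guard to bit; the package name conversions is hypothetical.

```vhdl
-- Hypothetical package; the original text does not show how to_bit is
-- declared. It is assumed to map true to '1' and false to '0'.
package conversions is
  function to_bit(b : boolean) return bit;
end conversions;

package body conversions is
  function to_bit(b : boolean) return bit is
  begin
    if b then
      return '1';
    else
      return '0';
    end if;
  end to_bit;
end conversions;
```

An architecture using the function would need a matching use clause, for example "use WORK.conversions.all;", before its declaration.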
Extending Type Conversion Functions in Port Maps
The example above still contains some overhead
for clock gating, namely the block statement
and the type conversion function. For that
reason, we propose to allow more flexible
expressions in port maps, as shown in the final
example. The expression is then responsible for
creating an implicit signal in VHDL, which is
evaluated and executed before all explicit
signals are updated.
architecture with_port_expression of shift_register_with_gate is
signal t1, t2: bit;
begin
ff1:entity WORK.dff(simple) port map(clk=>clk,d=>d,q=>t1);
-- proposed syntax: a general expression as the actual of a port
ff2:entity WORK.dff(simple)
port map(clk=> (clk AND ff2_en), d=>t1, q=>t2 );
ff3:entity WORK.dff(simple) port map(clk=>clk,d=>t2,q=>q);
end with_port_expression;
Summary
We showed discrepancies between synthesis
and simulation when gated clocks need to be
modeled. We proposed and discussed several
alternatives for solving this problem, which
require some extension of the currently supported
synthesizable VHDL subset. Finally, we
proposed port expressions, a VHDL extension
that helps to model clock gating of
complete blocks efficiently and consistently between
simulation and synthesis.
Thoughts, Comments, and Suggestions
A similar problem arises when derived clocks
with a multiple or a fraction of the frequency, or with a phase shift,
need to be generated.
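The text does not elaborate on this case. One way it might show up is sketched below, assuming a divide-by-two clock generated by a toggling flipflop; all names are hypothetical. As with the gated clock, the derived clock changes one delta after clk, in the same delta cycle as the data it samples.

```vhdl
-- Illustrative testbench (names are assumptions, not from the original text).
entity divided_clock_tb is
end divided_clock_tb;

architecture demo of divided_clock_tb is
  signal clk     : bit := '0';
  signal clk_div : bit := '0';   -- derived clock, half the frequency of clk
  signal d       : bit := '1';
  signal t1, t2  : bit := '0';
begin
  -- Toggle flipflop: clk_div changes one delta after the rising edge of clk.
  divider: process begin wait until clk = '1';     clk_div <= not clk_div; end process;
  ff_fast: process begin wait until clk = '1';     t1 <= d;  end process;
  ff_slow: process begin wait until clk_div = '1'; t2 <= t1; end process;

  stimulus: process
  begin
    wait for 5 ns;
    clk <= '1';                  -- first rising edge of clk and of clk_div
    wait for 5 ns;
    assert t1 = '1' report "ff_fast did not capture d" severity error;
    -- t1 and clk_div change in the same delta cycle, so ff_slow already
    -- sees the new value of t1, as in the gated clock example.
    assert t2 = '1' report "delta race did not occur" severity error;
    wait;
  end process;
end demo;
```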