DFT_clk_mux and DFT_clk_chain
Data Sheet
Revision 1.3
July 25, 2016
ABSTRACT
DFT Compiler adds DFT_clk_mux and DFT_clk_chain components to the netlist when
insert_dft is run with the set_dft_configuration -clock_controller enable
setting. These components are not documented in the DFT Compiler User Guide. This data
sheet is intended to document the architecture and operation of these components, and to
provide a checklist for users concerned about the components’ impact on their design.
This document describes the implementation of the asynchronous OCC controller
instantiated by H-2013.03 and later releases. The differences in the synchronous OCC
controller, which became available in the I-2013.12 release, are described in an added
section.
Note: The PLL controller that is included with DFT Compiler is an example that is not
guaranteed to be appropriate for use in your design. If you decide to use this design, you
are responsible for validating that this functionality works in the context of your design.
SYNOPSYS CONFIDENTIAL
DFT_clk_mux
1 System Overview
The DFT_clk_mux and DFT_clk_chain are inserted as two separate modules in the top level of the
design, but they always function together as a unit. The DFT_clk_mux is inserted between the
on-chip clocking (OCC) clock generator, usually a phase-locked loop (PLL), and its clock tree to
provide control over the clock for scan shifting and capture. The DFT_clk_chain contains data to
control the capture operation of the DFT_clk_mux. These blocks are kept separate because the
flip-flops inside DFT_clk_mux must be nonscan to allow them to switch clock sources correctly, but the
flip-flops inside DFT_clk_chain must be on the scan chains so that the capture pulses can be
controlled by ATPG.
The purpose of these blocks is to allow ATPG to specify capture sequences consisting of a fixed
number of pulses from a PLL that may be running asynchronously to the primary inputs
controlled by the ATE. The scan shift operation takes place under direct ATE control, and switching
between the different clock sources is done glitchlessly. The fast sequential ATPG engine in
TetraMAX specifies capture sequences with a maximum of 10 cycles, so there is no benefit in
creating DFT_clk_mux blocks capable of emitting more pulses, although doing so is legal and the
IP block still works in this case.
1.1 Schematics
These schematics correspond to the connections made automatically by the insert_dft command
for a specification with two PLL clocks and a maximum of two clock pulses per capture cycle. If
more clock pulses are selected, the DFT_clk_chain becomes longer, and the counter and decoder
become larger. Note that the logic is shown generically, and might appear different after synthesis.
In Figure 1, the DFT_clk_mux is shown as it would be instantiated in the design. Before the
insert_dft command is run, the PLL is connected to the clock drivers, and the clock trees and scan
flip-flops must already exist. These are not changed by insert_dft (besides adding the scan-enable
and serial scan connections) but the DFT_clk_mux is inserted at the output of the PLL, with
DFT_clk_chain controlling it.
The circuitry inside DFT_clk_mux is shown in different figures for clarity. The hierarchy inside it is
preserved during insert_dft, but in a different grouping. The hierarchy inside DFT_clk_chain is
flattened during insert_dft.
[Figure 1 schematic: the PLL outputs CLKA and CLKB pass through DFT_clk_mux to the clock drivers,
clock trees, and scan flip-flops. Inside DFT_clk_mux, each PLL clock has its own Fast Pulse
Controller and Clock Selection Circuit (dashed boxes). The DFT_clk_chain scan cells [0] through [3],
connected between test_siN and test_soN, supply bits [3:2] for CLKA and bits [1:0] for CLKB.
The remaining inputs are ATECLK, pll_reset, pll_bypass, test_mode, and test_se.]
Figure 1. DFT_clk_mux & DFT_clk_chain in the design.
The contents of the dashed boxes are shown in the following figures.
[Figure 2 schematic: slow_clk_enable (from the Clock Selection Circuit) passes through three
synchronizer flip-flops clocked by fast_clk (U_clk_control_i_0/load_n_meta_0_l_reg,
U_clk_control_i_0/load_n_meta_1_l_reg, and U_clk_control_i_0/load_n_meta_2_l_reg) to produce
load_n (load 0) for a counter that counts from 0 to 3 and then holds. The counter output Q[1:0]
drives a 2-to-4 decoder whose outputs [3] through [0] are combined with clk_enable[1] and
clk_enable[0] to form pipeline_or_tree (to the Clock Selection Circuit). pll_reset drives rst_n.]
Figure 2. Contents of the “Fast Pulse Controller” block from Figure 1.
The instance names of the clock domain crossing synchronization flip-flops are for the first DFT_clk_mux
to be inserted. For subsequent ones, increment the first 0.
[Figure 3 schematic: inputs test_se, slow_clk, fast_clk, pipeline_or_tree, pll_reset, pll_bypass,
and test_mode; outputs slow_clk_enable and clk. Two flip-flops and a constant 1'b0 input are shown.]
Figure 3. Contents of the “Clock Selection Circuit” block from Figure 1, using the default (false) of
test_occ_insert_clock_gating_cells. Clock paths are shown in red.
[Figure 4 schematic: the same inputs and outputs as Figure 3 (test_se, slow_clk, fast_clk,
pipeline_or_tree, pll_reset, pll_bypass, test_mode, slow_clk_enable, clk), with a latch (D, Q, GN)
gating each of the slow_clk and fast_clk paths.]
Figure 4. Contents of the “Clock Selection Circuit” block from Figure 1, when
test_occ_insert_clock_gating_cells is set to true.
The inner dashed boxes show logic that can be replaced by integrated clock gating cells by using the
test_icg_p_ref_for_dft variable. Clock paths are shown in red.
1.2 Schematics for Synchronized OCC Controller
The synchronized OCC controller uses the same components as the OCC controller with
test_occ_insert_clock_gating_cells set to true, with the addition of a synchronized
triggering block that ensures that the counter for each PLL’s clock is triggered at the right time to
make the cycle 0 pulse rising edges simultaneous. The single synchronized triggering block replaces
all of the synchronization flip-flops that are used in the fast pulse controller block of the
asynchronous OCC controller.
[Figure 5 schematic: the synchronized triggering block. scan_en enters the metastability registers
(U_SYNC_occ/meta_4x_0_0_reg, U_SYNC_occ/meta_4x_1_0_reg, and U_SYNC_occ/meta_4x_neg_pre_0_reg),
which are clocked by clk4x. Shift registers clocked by clk4x, clk2x, clk1x, and clk1x_number2 then
produce clk4x_trigger, clk2x_trigger, clk1x_trigger, and clk1x_number2_trigger. reset goes to all
asynchronous resets.]
Figure 5. Synchronization block for OCC controller with one -4x_clocks argument,
one -2x_clocks argument, and two -1x_clocks arguments.
The figure shows a representative configuration of the synchronized triggering block. Falling edges
are used between the synchronized clock domains, because the OCC controller is placed on the
input side of the clock distribution tree. At that point, the clocks may not be skew-balanced well
enough for shift registers to work reliably when shifting from one clock domain to another.
The fastest clock is used for the metastability registers. If multiple 4X or 2X clocks are used, each
one gets its own shift register for delay balancing. Multiple 1X clocks are handled by using the
falling edge to reduce the number of flip-flops required.
2 DFT_clk_mux
2.1 Naming Convention
The module is instantiated under this name:
<string>_DFT_clk_mux_<number>
where
<string> is the current_design during the insert_dft run
<number> is the uniquification number of the controller, starting from 0
2.2 Ports
Port Name        Direction  Function
reset            Input      1 to reset controller, 0 to allow controller to operate
test_mode        Input      1 to control clock, 0 to select fast_clk unconditionally
pll_bypass       Input      1 to select slow_clk, 0 to allow clock switch-over operations
scan_en          Input      Mediates clock switch-over operation
clk_enable[m:0]  Input      Capture pulse control from clock chain
fast_clk[n:0]    Input      Fast clock from PLL
slow_clk         Input      ATE clock
clk[n:0]         Output     Output clock to scan flip-flops
Table 1. DFT_clk_mux I/O ports
The widths of the buses are determined by options of the set_dft_clk_controller command:
• clk and fast_clk are as wide as the number of elements in the -pllclocks list
• clk_enable is as wide as the number of elements in the -pllclocks list times the argument
  of the -cycles_per_clock option
When the bus width would be 1, a scalar port of the same name is used instead.
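The width rules above can be sketched as a small calculation. This is only an illustration of the arithmetic; the function names and the example hookup pin names (pll/CLKA, pll/CLKB) are hypothetical and are not part of any Synopsys tool.

```python
def dft_clk_mux_bus_widths(pllclocks, cycles_per_clock):
    """Return the width of each DFT_clk_mux bus for a given configuration."""
    n_clocks = len(pllclocks)  # number of elements in the -pllclocks list
    return {
        "clk": n_clocks,
        "fast_clk": n_clocks,
        "clk_enable": n_clocks * cycles_per_clock,
    }


def port_declaration(name, width):
    """Format a port as a scalar when its width is 1, otherwise as a bus."""
    return name if width == 1 else f"{name}[{width - 1}:0]"


# Hypothetical configuration: two PLL clocks, two pulses per capture cycle.
widths = dft_clk_mux_bus_widths(["pll/CLKA", "pll/CLKB"], cycles_per_clock=2)
ports = [port_declaration(name, width) for name, width in widths.items()]
print(ports)
```

With two PLL clocks and -cycles_per_clock 2, this gives a 4-bit clk_enable bus, which matches the [3:2] and [1:0] slices shown in Figure 1.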
2.3 Connections
As instantiated by insert_dft, the DFT_clk_mux ports are connected as follows:
Port Name   Type           Default Name                     CTL DataType
reset       Primary Input  pll_reset                        snps_pll_reset
test_mode   Primary Input  test_mode                        TestMode
pll_bypass  Primary Input  pll_bypass                       snps_pll_bypass
scan_en     Primary Input  test_se                          ScanEnable
clk_enable  Internal       DFT_clk_chain(clk_ctrl_data)
fast_clk    Internal       -pllclocks hookup pin
                           (last element in list is bit 0)
slow_clk    Primary Input  -ateclocks argument              MasterClock ScanMasterClock
clk         Internal       -pllclocks destination           -
                           (last element in list is bit 0)
Table 2. DFT_clk_mux default connections
2.4 Functional Operation
The functional operation of DFT_clk_mux is to select either the fast_clk input or the slow_clk input
to pass to the clk output.
Three of the inputs are static controls to the output multiplexer. Switching any of these inputs takes
effect immediately and can result in glitches on the clk output. These signals are listed in Table 3.
test_mode  pll_bypass  Source of clk output
0          (any)       fast_clk
1          1           slow_clk
1          0           dynamic selection
Table 3. Static control states in DFT_clk_mux
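The static selection in Table 3 can be written as a short truth-table function. This is a behavioral sketch for reference only; the function name is hypothetical and the code does not model the glitching that can occur when these inputs switch.

```python
def clk_source(test_mode, pll_bypass):
    """Return the source of the clk output for the static control inputs."""
    if test_mode == 0:
        # Functional mode: fast_clk is selected regardless of pll_bypass.
        return "fast_clk"
    if pll_bypass == 1:
        return "slow_clk"
    # test_mode = 1 and pll_bypass = 0: scan_en and clk_enable decide.
    return "dynamic selection"


for test_mode in (0, 1):
    for pll_bypass in (0, 1):
        print(test_mode, pll_bypass, clk_source(test_mode, pll_bypass))
```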
The remaining inputs control the dynamic selection of the two clocks. When used properly, they
ensure that switching between the clocks is done glitchlessly. A clock is deselected on its own
falling edge, then the clk output is held low until the new clock selection is made on its own falling
edge to ensure glitchless operation and full pulse widths.
reset is only used for initialization. In the test protocol, it pulses to 1 and then stays at 0 for the
remainder of the test. When reset goes back to 0, the sequence of operations is:
• If scan_en is 1, one slow_clk pulse is required and then slow_clk is selected.
• If scan_en is 0, the next fast_clk pulse starts a capture pulse sequence.
Pulsing reset to 1 after initialization is improper use, and will result in the clk output immediately
going to 0.
DFT_clk_mux can reset itself even without the reset pulse. Set scan_en to 1 and wait for
one fast_clk pulse followed by one slow_clk pulse (which selects the slow_clk input). After five
more fast_clk pulses, it is ready to go through a capture sequence.
clk_enable is a bus connected to the clk_ctrl_data output of a DFT_clk_chain block. This bus is
loaded during the scan shift operation. Changing this input while scan_en is low is improper use and
can result in unpredictable glitching on the clk output. Each bit selects a pulse on an output clk
signal at a particular clock cycle count of its corresponding fast_clk input. A value of 1 represents a
pulse and a value of 0 represents no pulse. The grouping is first by output clock and second by
count.
For example, if set_dft_clk_controller has three elements in its -pllclocks list and a
-cycles_per_clock argument of 2:
clk_enable[0] selects count 1 on clk[0]
clk_enable[1] selects count 2 on clk[0]
clk_enable[2] selects count 1 on clk[1]
clk_enable[3] selects count 2 on clk[1]
clk_enable[4] selects count 1 on clk[2]
clk_enable[5] selects count 2 on clk[2]
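The grouping rule (first by output clock, second by count) amounts to simple index arithmetic, sketched below. The function names are hypothetical; the mapping itself follows the example above.

```python
def clk_enable_bit(clock_index, count, cycles_per_clock):
    """Return the clk_enable bit that selects pulse `count` on clk[clock_index].

    `count` is 1-based, as in the text (count 1 .. cycles_per_clock).
    """
    if not 1 <= count <= cycles_per_clock:
        raise ValueError("count must be between 1 and cycles_per_clock")
    return clock_index * cycles_per_clock + (count - 1)


def decode_clk_enable_bit(bit, cycles_per_clock):
    """Return (clock_index, count) for a clk_enable bit position."""
    clock_index, offset = divmod(bit, cycles_per_clock)
    return clock_index, offset + 1


# Three PLL clocks with -cycles_per_clock 2, as in the example above.
for bit in range(6):
    clock_index, count = decode_clk_enable_bit(bit, 2)
    print(f"clk_enable[{bit}] selects count {count} on clk[{clock_index}]")
```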
scan_en is connected to the scan enable signal used by the internal scan chains. It works as follows:
• When scan_en goes high, slow_clk is selected following its first falling edge. Every
  transition on slow_clk is passed through to the clk output.
• When scan_en goes low, the signal is resynchronized from the slow clock domain (captured
  by a single flip-flop in the clock selection block) to the fast clock domain (resynchronized by
  three successive synchronizer flip-flops in the fast pulse controller block). Once the low scan
  enable signal has been resynchronized, a counting sequence from 0 to N+1 is initiated by the
  fast pulse controller, according to the -cycles_per_clock N argument. Cycles 0 and N+1
  are quiet, while cycles 1 through N selectively issue fast clock pulses depending on the
  values loaded into the clock chain.
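The counting sequence for a single output clock can be sketched as follows. This is a simplified model under stated assumptions: it starts after the three synchronization cycles have already elapsed, it ignores the delay between a count value and the emitted pulse, and the function name is hypothetical.

```python
def capture_sequence(enable_bits):
    """Model the 0 .. N+1 counting sequence for one output clock.

    enable_bits[i] is the clk_enable value for count i + 1, so
    N = len(enable_bits) corresponds to the -cycles_per_clock argument.
    Returns a list of (count, pulse_issued) tuples.
    """
    n = len(enable_bits)
    sequence = []
    for count in range(n + 2):
        # Counts 0 and N+1 are always quiet; counts 1..N follow the enables.
        pulse = 1 <= count <= n and bool(enable_bits[count - 1])
        sequence.append((count, pulse))
    return sequence


# -cycles_per_clock 2 with only the first capture pulse enabled.
for count, pulse in capture_sequence([1, 0]):
    print(count, "pulse" if pulse else "quiet")
```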
If the OCC controller is used with a pipelined scan-enable signal, additional steps are needed to
ensure correct operation. For more information, see “On-Chip Clocking Support” in the DFT
Compiler User Guide.
Figures 6 and 7 show the behaviors in a case with set_dft_clk_controller
-cycles_per_clock 2:
[Figure 6 waveforms: fast_clk, slow_clk, scan_en, and clk. After 3 synchronization cycles, the
counter steps through Count = 0 (no pulse), Count = 1 (pulse next cycle if enabled), Count = 2
(pulse next cycle if enabled), and Count = 3 (terminal). scan_en falling deselects slow_clk
asynchronously. scan_en rising takes effect on the next falling clock edges.]
Figure 6. Capture cycle example using the default (false) of test_occ_insert_clock_gating_cells.
[Figure 7 waveforms: fast_clk, slow_clk, scan_en, and clk. After 3 synchronization cycles, the
counter steps through Count = 0 (no pulse), Count = 1 (pulse 2nd following cycle if enabled),
Count = 2 (pulse 2nd following cycle if enabled), and Count = 3 (terminal). scan_en falling
deselects slow_clk on the next rising edge. scan_en rising takes effect on the next rising clock edges.]
Figure 7. Capture cycle example when test_occ_insert_clock_gating_cells is set to true.
The dotted arrows show data setup relationships to their corresponding clock edges. scan_en must
be synchronized to slow_clk and it must change while slow_clk is low to avoid truncating its pulse
on clk. No synchronization with fast_clk is assumed and clock domain crossing synchronization
logic is provided. Minimum widths are required for both the high and low pulses of scan_en:
• The scan_en low pulse must encompass a slow_clk pulse followed by a number of fast_clk
  pulses equal to the -cycles_per_clock argument plus five (three synchronization cycles
  plus two extra counter cycles). Failure to meet this requirement will cause a failure during
  pattern simulation. Capture pulses will be skipped, but no glitching will occur and the
  following scan operation will work correctly.
  If needed, increase the duration of the scan_en low pulse by using the set_atpg
  -min_ateclock_cycles cycles command in TetraMAX to specify the number of slow
  clock cycles that the signal is held low. You can calculate this number using the waveform
  diagrams, the period of the slow clock, and the largest period across all fast clocks.
  If the clock pulses have considerable propagation delay to the scan flip-flops, you can also
  use the -min_ateclock_cycles option to add additional delay to the low scan_en pulse so
  that the clock pulses reach their destination before the rising scan enable transition.
  There is no maximum scan_en low pulse width.
• The scan_en high pulse must encompass a slow_clk pulse followed by five fast_clk pulses.
  Failure to meet this requirement may cause all capture pulses in the next following capture
  cycle to be skipped. There is no maximum scan_en high pulse width.
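The minimum-width requirements above lend themselves to a rough first estimate of the -min_ateclock_cycles value. The sketch below is an assumption-laden approximation, not a substitute for checking the waveform diagrams: it counts one slow clock cycle for the required slow_clk pulse, converts the required fast_clk pulses to slow clock cycles using the largest fast clock period, and adds no margin for clock propagation delay. All function names and the example periods are hypothetical.

```python
MIN_FAST_PULSES_HIGH = 5  # fast_clk pulses required while scan_en is high


def min_fast_pulses_low(cycles_per_clock):
    """fast_clk pulses required while scan_en is low.

    Three synchronization cycles plus two extra counter cycles are added
    to the -cycles_per_clock argument.
    """
    return cycles_per_clock + 5


def estimate_min_ateclock_cycles(cycles_per_clock, slow_period_ps, fast_periods_ps):
    """Rough estimate of the slow clock cycles to hold scan_en low.

    Assumption: one slow_clk cycle for the leading slow_clk pulse, plus
    enough slow_clk cycles to cover the fast_clk pulses of the slowest
    fast clock. Periods are integers in picoseconds.
    """
    worst_fast_period = max(fast_periods_ps)
    fast_time = min_fast_pulses_low(cycles_per_clock) * worst_fast_period
    covering_cycles = (fast_time + slow_period_ps - 1) // slow_period_ps  # ceiling
    return 1 + covering_cycles


# Hypothetical example: 10 ns ATE clock, one 4 ns PLL clock, two capture pulses.
print(estimate_min_ateclock_cycles(2, 10_000, [4_000]))
```

Any additional cycles needed for clock propagation delay to the scan flip-flops would be added on top of this estimate.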
2.5 Special Considerations
The DFT_clk_mux component is added into the design by the insert_dft command when
set_dft_configuration -clock_controller enable is set. Here are the special considerations
that you should be aware of in order to use it successfully.
2.5.1 Synthesis and Optimization
When the insert_dft command maps DFT_clk_mux to gates, it does not optimize it for insertion
delay, drive strength or differential delay (pulse shaping). The timing is invalidated by insert_dft,
so afterwards update_timing must be run before report_timing. If a timing problem is found, run
the compile -incremental command (which you can run in any case to ensure the best
optimization).
It is also possible to completely re-map the logic in DFT_clk_mux using a non-incremental compile.
In this case, run the characterize command on the DFT_clk_mux instance, change the
current_design to the DFT_clk_mux design, then run a full compile command. Do not use the
compile -scan command since the clock controller must not be put onto the scan chains.
The hierarchy of DFT_clk_mux may be flattened if the test_occ_insert_clock_gating_cells
variable was set to true for OCC insertion. If this variable was false, which is the default, the
hierarchy flattening introduces clock reconvergence when pll_bypass is asserted. This clock
reconvergence is very difficult for static timing analysis to analyze successfully, so hierarchy
flattening is not recommended in this case.
After re-synthesis, the STIL Procedure File (SPF) may need to be edited to track the new instance
names. The names that may need to change are those labeled Internal in the block named
ClockStructures.
2.5.2 Metastability-Hardened Flip-Flops
Some of the flip-flops inside DFT_clk_mux are used for signaling from the slow clock domain to
the fast clock domain(s). These flip-flops should be replaced with metastability-hardened flip-flops
if these are available in the standard-cell library. For asynchronous OCC controllers, the instances
that should be replaced are those shown in Figure 2, and the instance names are:
U_clk_control_i_*/load_n_meta_{0,1,2}_l_reg
where * starts at 0 and increments as needed to cover the number of clocks controlled by the specific
DFT_clk_mux.
For synchronized OCC controllers, the instances that should be replaced are those shown in Figure
5, and the instance names are:
U_SYNC_occ/meta_*_reg
2.5.3 Verilog Simulation
These same metastability flip-flops may cause unnecessary failures in full-timing gate-level
simulation. The timing checks of the U_clk_control_i_*/load_n_meta_0_l_reg instances should be
disabled to prevent this. (For synchronized OCC controllers, the instance name is
U_SYNC_occ/meta_*x_0_0_reg.) Only the first of the metastability flip-flops in each
DFT_clk_mux instantiation needs to have its timing disabled. In VCS, this can be done by using the
noTiming configuration file attribute. See the VCS User Guide for details.
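For asynchronous OCC controllers, the instance name pattern given above can be expanded into explicit lists, which may be convenient when preparing cell replacement or timing-check exclusion lists. This is a minimal sketch; the function names are hypothetical, and the names produced are relative to the DFT_clk_mux instance, so any hierarchical prefix in your design must be added separately.

```python
def async_meta_flop_instances(num_clocks):
    """All synchronizer flip-flops inside one asynchronous DFT_clk_mux.

    These are the candidates for metastability-hardened cells: three
    stages per controlled clock.
    """
    return [
        f"U_clk_control_i_{clock}/load_n_meta_{stage}_l_reg"
        for clock in range(num_clocks)
        for stage in range(3)
    ]


def first_stage_instances(num_clocks):
    """Only the first synchronizer stage, whose timing checks are disabled."""
    return [
        f"U_clk_control_i_{clock}/load_n_meta_0_l_reg"
        for clock in range(num_clocks)
    ]


# A DFT_clk_mux controlling two PLL clocks.
for name in async_meta_flop_instances(2):
    print(name)
```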
2.5.4 Static Timing Analysis
Static timing analysis requires a special setup to make the required clock gating checks. This setup is
described in SolvNet article 022490, “Static Timing Analysis Constraints for On-Chip Clocking
Support.”
2.5.5 Clock-Tree Synthesis
Clock-tree synthesis (CTS) can cause timing problems if it is not set up properly. If CTS is allowed
to balance the clock skew to the flip-flops inside DFT_clk_mux to the same value as the flip-flops
on the endpoints of the clock tree, then the clock output of DFT_clk_mux may include glitches or
shortened clock pulses. This is because DFT_clk_mux gates the clock before it has gone through the
clock tree’s delay. The solution to this is to skew the clock to all fanouts inside DFT_clk_mux to be
earlier than that going to other destinations of the same clock. In IC Compiler, this can be done
using the set_clock_tree_exceptions -float_pins command. See the IC Compiler
documentation for details.
Note that the clock for DFT_clk_chain should use a clock balanced to the functional flip-flops on
endpoints of the clock tree. Its flip-flops are on the scan chains with the functional flip-flops, and its
outputs to DFT_clk_mux are ignored during shift but stable during the capture cycle, so they do not
have to meet single-cycle timing on those paths.
2.5.6 Verification of Functional Operation
For functional operation, the DFT_clk_mux must behave as a buffer from the fast_clk input to the
clk output. The default DFT_clk_mux behaves this way when the test_mode input is held to 0. The
latch-based DFT_clk_mux, inserted when test_occ_insert_clock_gating_cells is true, also
requires slow_clk to be 0 so its clock-gating latch is initialized. You can either hold slow_clk at
0, or you can use an initialization sequence instead.
Functional operation should be verified either statically through formal verification, or dynamically
through gate-level simulation of the initialization sequence, or both. When running formal
verification, be sure to put all of the relevant constraints on the signals, especially the slow_clk
signal if you are using the latch-based DFT_clk_mux.
3 DFT_clk_chain
This section describes the use of the DFT_clk_chain block with regular scan and scan compression.
3.1 Naming Convention
The module is instantiated under this name:
<string>_DFT_clk_chain_<number>
where
<string> is the current_design during the insert_dft run
<number> is the uniquification number of the controller, starting from 0
3.2 Ports
Port Name           Direction  Function
clk                 Input      Falling edge clock
se                  Input      1 to shift scan chains, 0 to hold previous data
si[n:0]             Input      Scan inputs
so[n:0]             Output     Scan outputs
clk_ctrl_data[m:0]  Output     Parallel output data
Table 4. DFT_clk_chain I/O ports
The widths of the buses are determined by the options of the set_dft_clk_controller command:
• si and so are as wide as the argument of -chain_count
• clk_ctrl_data is as wide as the number of elements in the -pllclocks list times the argument
  of the -cycles_per_clock option
When the bus width would be 1, a scalar port of the same name is used instead.
3.3 Connections
As instantiated by the insert_dft command, the DFT_clk_chain ports are connected as follows:
Port Name      Type            Default Name             CTL DataType
clk            Internal        DFT_clk_mux(clk[max])
se             Primary Input   test_se                  ScanEnable
si             Primary Input   test_si                  ScanDataIn
so             Primary Output  test_so                  ScanDataOut
clk_ctrl_data  Internal        DFT_clk_mux(clk_enable)  -
Table 5. DFT_clk_chain default connections
3.4 Functional Operation
DFT_clk_chain shifts data, on the falling edge of clk, from the si inputs to the so outputs when se is
high. When se is low, the previous data is held. The data is also read on the clk_ctrl_data parallel
output bus. When data is scanned, the first bit from si[0] feeds the flip-flop driving clk_ctrl_data[0].
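The shift-and-hold behavior can be illustrated with a small behavioral model. This sketch assumes a single scan chain (-chain_count 1) in which si feeds the flip-flop driving clk_ctrl_data[0] and each flip-flop feeds the next higher bit; the class and method names are hypothetical and the model is not derived from the actual netlist.

```python
class ClkChainModel:
    """Behavioral model of a single-chain DFT_clk_chain."""

    def __init__(self, length):
        # regs[i] drives clk_ctrl_data[i]; regs[-1] also drives so.
        self.regs = [0] * length

    def falling_edge(self, se, si):
        """Apply one falling edge of clk."""
        if se:
            # Shift: si enters at bit 0 and every bit moves up by one.
            self.regs = [si] + self.regs[:-1]
        # When se is low, the previous data is held.

    @property
    def clk_ctrl_data(self):
        return list(self.regs)

    @property
    def so(self):
        return self.regs[-1]


chain = ClkChainModel(4)
for bit in [1, 0, 1, 1]:
    chain.falling_edge(se=1, si=bit)
chain.falling_edge(se=0, si=0)  # hold: contents do not change
print(chain.clk_ctrl_data, chain.so)
```

In this model, the first bit shifted in ends up in the flip-flop farthest from si, which is the one driving so.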
3.5 Special Considerations
The addition of the DFT_clk_chain by the insert_dft command may cause difficulties if the clock
tree has already been balanced. Make sure that scan shifting works properly with full timing,
especially at the boundaries of the DFT_clk_chain. This can be done in PrimeTime using the script
written out by the TetraMAX tmax2pt.tcl utility command write_timing_constraints -mode
shift. The most likely timing problems are with hold time (-delay_type min).
The clock skew problem can be avoided by moving the DFT_clk_chain clock connection to a
different clock. This can be done automatically with the test_dedicated_clock_chain_clock
variable (see the DFT Compiler User Guide). Some users have manually moved the DFT_clk_chain
clock connection to the ATE clock (as defined by set_dft_clk_controller -ateclocks), but
this is not recommended because this clock is free-running so an extra shift pulse invalidates the
clock chain data on scan out. The ATE clock connection can only be made to work in TetraMAX by
using add_cell_constraint OX on every flip-flop in DFT_clk_chain.
DFT_clk_chain shifts on the falling clock edge. This allows it to be stitched at the beginning of the
scan chain, which is very helpful in scan compression mode. However, it may require a separate
scan chain of its own if the set_scan_configuration -clock_mixing no_mix or
mix_clocks_not_edges options have been applied. When inserting DFT_clk_chain at the top level of
a design, it is better to use the set_scan_configuration -clock_mixing mix_edges or
mix_clocks options so that edge mixing is permitted.