Design for Testability

Cheng-Wen Wu
Department of Electrical Engineering, National Tsing Hua University, 101, Sec. 2, Kuang Fu Rd., Hsinchu, Taiwan, R.O.C.

1. Ad Hoc DFT Guidelines

There is no single methodology that solves all VLSI testing problems, and no single DFT technique that is effective for all kinds of circuits. DFT techniques can broadly be divided into two categories: ad hoc techniques and structured (systematic) techniques. The latter, including the major internal scan approaches, are discussed in Sec. 3; the former, essentially a collection of ad hoc DFT guidelines, are the subject of this section. Some important guidelines are listed below.

1. Partition large circuits into smaller subcircuits to reduce test costs. One of the most important steps in designing a testable chip is to partition it so that for each functional module there is an effective DFT technique to test it. Partitioning has to be done at every level of the design process, from architecture to circuit, whether or not testing is considered. Conventionally, when testing is not considered, designers partition their designs to ease management, speed up turnaround time, increase performance, and reduce costs. We stress here that designers should also partition their designs to increase testability.

Figure 1: Circuit partitioning. Multiplexers under control inputs T1 and T2 select among Normal mode, Test C1, and Test C2, steering test stimuli to, and responses from, the subcircuit under test.

Partitioning can be functional (according to functional module boundaries) or physical (based on circuit topology); in most cases either way benefits testing. Partitioning can be done using multiplexers (see Fig. 1) and/or scan chains (see Sec. 3). Maintaining signal integrity across partitions is a basic guideline for partitioning for testability, and it also helps localize faults. After partitioning, each module should be completely testable through its interface.

2.
Employ test points to enhance controllability and observability. Another important DFT technique is test point insertion. Test points include control points (CPs) and observation points (OPs); the former are active test points, while the latter are passive ones. Some test points are both CPs and OPs.

Figure 2: Test point insertion. Control points CP1 through CP4 and an observation point OP are placed at the interfaces of modules C1, C2, C3, and M.

After partitioning, we still need a mechanism for directly accessing the interfaces between modules. Test stimuli and responses of the module under test can be made accessible through test points. Test points can be inserted as illustrated in Fig. 2 and accessed via probe pads and extra or shared (multiplexed) input/output pins. Before applying tests through test points that are not PIs or POs, we should investigate the additional requirements that the test equipment places on those test points.

3. Design circuits to be easily initializable. This increases predictability. A power-on reset mechanism is the most effective and widely used approach. A reset pin for all or some modules is also important. Synchronizing or homing sequences for small finite state machines may be used where appropriate (a famous example is the JTAG TAP controller; see the chapter on Boundary Scan).

4. Disable internal one-shots (monostables) during test. Otherwise it is difficult for the tester to remain synchronized with the DUT. A monostable (one-shot) multivibrator produces a pulse of constant duration in response to the rising or falling transition of the trigger input. It has only one stable state. Its pulse duration is usually controlled externally by a resistor and a capacitor (with current technology, these can also be integrated on chip). One-shots are used mainly for 1) pulse shaping, 2) switch-on delays, 3) switch-off delays, and 4) signal delays.
Since a one-shot is not controlled by clocks, synchronization and precise duration control are very difficult, which in turn reduces testability by ATE. Counters and dividers are better candidates for delay control.

5. Disable internal oscillators and clocks during test. To guarantee tester synchronization, internal oscillator and clock-generator circuitry should be isolated during the test of the functional circuitry. Of course, the internal oscillators and clocks should also be tested separately.

6. Provide logic to break global feedback loops. Circuits with feedback loops are sequential. Effective sequential ATPG tools are yet to be developed, while combinational ones are relatively mature. Breaking the feedback loops turns the sequential testing problem into a combinational one, which in general greatly reduces the testing effort: test generation becomes feasible, and fault localization becomes much easier. Breaking global feedback loops is especially effective, since otherwise we face the problem of testing a large sequential circuit (such as a CPU), which can frequently be shown to be very hard or even impossible. The scan techniques and/or multiplexers used to partition a circuit can also be used to break feedback loops; the feedback path can serve as both a CP and an OP.

7. Partition large counters into smaller ones. Sequential modules with long cycles, such as large counters, dividers, serial comparators, and serial parity checkers, require very long test sequences. For example, a 32-bit counter requires 2^32 clock cycles for full state coverage, which means a test time of roughly seven minutes for the counter alone if a 10 MHz clock is used (over an hour with a 1 MHz clock). Test points should be used to partition these circuits into smaller ones so that they can be tested separately.

8. Avoid the use of redundant logic. This has been discussed in the chapters on combinational and sequential ATPG.

9. Keep analog and digital circuits physically apart.
Mixed analog and digital circuits on a single chip are gaining popularity as VLSI technologies keep moving forward. Analog circuit testing, however, is very different from digital circuit testing. In fact, what we call testing for analog circuits is really measurement, since analog signals are continuous (as opposed to the discrete, logic signals in digital circuits). The two require different test equipment and different test methodologies, and therefore should be tested separately. To avoid interference and noise penetration, designers know that they should physically isolate the analog circuit layout from the digital layout on the same chip, with signals communicated via A/D and/or D/A converters. For testing purposes we require more: the wires connecting the analog and digital modules should become test points, i.e., we should be able to test the analog and digital parts independently.

10. Avoid the use of asynchronous logic. Asynchronous circuits are sequential circuits that are not clocked; their timing is determined by gate and wire delays. They are usually less expensive and faster than their synchronous counterparts, so some experienced designers like to use them. Their design verification and testing, however, are much harder than for synchronous circuits. Since no clocking is employed, timing is continuous instead of discrete, which makes tester synchronization virtually impossible; consequently, only functional test on an application board can be used. In almost all cases, high fault coverage cannot be guaranteed within a reasonable test time.

11. Avoid diagnostic ambiguity groups such as wired-OR/wired-AND junctions and high-fanout nodes. Performance considerations aside, wired-OR/wired-AND junctions and high-fanout nodes are hard to test (they are part of the reason ATPG tools are so inefficient), so they should be avoided.

12. Consider tester requirements.
Tester requirements such as pin limitation, tristating, timing resolution, speed, memory depth, driving capability, analog/mixed-signal support, internal/boundary scan support, etc., should be considered during the design process to avoid project delays and unnecessary investment in equipment.

The above guidelines come from experienced practitioners. They are not meant to be complete or universal. In fact, these techniques have drawbacks: high fault coverage cannot be guaranteed, manual test generation may still be required, and design iterations are likely to increase.

2. Syndrome-Testable Design

Syndrome testing [10, 11] is an exhaustive method and applies only to combinational circuits, so it is not considered an efficient method. The idea of introducing control inputs to make circuits syndrome testable (explained below), however, can be applied effectively in many DFT situations beyond syndrome testing.

Definition 1 The syndrome of a Boolean function f is S(f) = k(f)/2^n, where k(f) is the number of 1s (minterms) of f and n is the number of independent input variables.

Figure 3: A typical syndrome testing set-up. An exhaustive pattern generator drives the UUT; a syndrome register (counter) accumulates the syndrome, which a comparator checks against the reference syndrome to produce the go/no-go result.

By the definition, 0 <= S(f) <= 1. A circuit is said to be syndrome testable iff, for every fault a, S(f) != S(f_a). To use syndrome testing, the DUT must be syndrome testable. Furthermore, since the method is exhaustive (we have to evaluate all 2^n input combinations), it is applicable only to circuits with a small number of inputs, e.g., n <= 20. Large circuits can be partitioned with scan chains and/or multiplexers. A typical syndrome testing set-up is shown in Fig. 3. According to the definition, the syndromes of primitive gates are easily calculated, as shown in Tab. 1.

Gate     Syndrome
AND_n    1/2^n
OR_n     1 - 1/2^n
XOR_n    1/2
NOT      1/2

Table 1: Syndromes of primitive gates.

The overall syndrome of a fanout-free circuit can then be derived in a recursive manner.
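Definition 1 and the entries of Tab. 1 can be checked by direct exhaustive enumeration. The sketch below is illustrative (the gate choices and the small fault example are not from the text); for n up to around 20 this brute-force evaluation stays within about a million pattern evaluations, matching the practicality bound mentioned above.

```python
# Exhaustive syndrome computation per Definition 1: S(f) = k(f) / 2^n.
from itertools import product

def syndrome(f, n):
    """Count minterms of f over all 2^n input combinations."""
    k = sum(f(*bits) for bits in product((0, 1), repeat=n))
    return k / 2 ** n

# Checking Tab. 1 for n = 3:
assert syndrome(lambda a, b, c: a & b & c, 3) == 1 / 8   # AND_n: 1/2^n
assert syndrome(lambda a, b, c: a | b | c, 3) == 7 / 8   # OR_n: 1 - 1/2^n
assert syndrome(lambda a, b, c: a ^ b ^ c, 3) == 1 / 2   # XOR_n: 1/2

# A single stuck-at fault is syndrome detectable iff it changes S.
# Example: for f = ab, the fault a/0 moves S from 1/4 to 0.
assert syndrome(lambda a, b: a & b, 2) == 1 / 4
assert syndrome(lambda a, b: 0 & b, 2) == 0
```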
For example, consider a circuit composed of two blocks, f and g, with unshared inputs. Its overall syndrome can be obtained according to Tab. 2.

Output gate    S
OR             S_f + S_g - S_f S_g
AND            S_f S_g
XOR            S_f + S_g - 2 S_f S_g
NAND           1 - S_f S_g
NOR            1 - S_f - S_g + S_f S_g

Table 2: Recursive calculation of syndromes.

Example 1 Calculate the syndrome of the following circuit. The first-level gates have syndromes S1 = 3/4, S2 = 3/4, and S3 = 1/8. The NOR gate fed by the second and third gates has S4 = 1 - (S2 + S3 - S2 S3) = 7/32, and the output AND gate gives S = S1 S4 = 21/128.

Exercise 1 Show that for blocks with shared inputs (circuits having reconvergent fanout):
S(f + g) = S_f + S_g - S(fg)
S(f XOR g) = S_f + S_g - 2 S(fg)
S(f'g') = 1 - S_f - S_g + S(fg)

From the above discussion, the syndrome is a property of the function, not of its implementation.

Definition 2 A logic function is unate in a variable x_i if it can be represented as an SOP or POS expression in which x_i appears either only in uncomplemented form or only in complemented form.

Theorem 1 A 2-level irredundant circuit realizing a function that is unate in all its variables is syndrome-testable.

Theorem 2 Any 2-level irredundant circuit can be made syndrome-testable by adding control inputs to the AND gates.

Example 2 Let f = xz + yz'. Then S = 1/2. If z is stuck at 0 (z/0), then f_a = y and S_a = 1/2 = S, so the fault is syndrome untestable. Now add a control input c to the first AND gate: f = cxz + yz', where c = 1 in normal operation mode and c acts as an ordinary input in test mode. Then S = 3/8, while for z/0 we get f_a = y with S_a = 1/2 != S; the fault is now syndrome testable. Note that the modification doubles the test set size.

3. Scan Design Approaches

Although we have not yet formally presented the scan techniques, their purpose and importance have been discussed in the previous sections: they are effective for circuit partitioning; they provide controllability and observability of internal state variables for testing; and they turn the sequential test problem into a combinational one. We will discuss four major scan approaches in this section: MUXed scan [12]; scan path [13, 14]; LSSD [15, 16]; and random access [17].
3.1 MUXed Scan

This approach is also called the MUX scan approach: a MUX is inserted in front of every FF to be placed in the scan chain. It was invented at Stanford in 1973 by M. Williams and Angell, and was later adopted by IBM and heavily used in IBM products.

Figure 4: A finite state machine model for sequential circuits. Combinational logic with primary input vector X, primary output vector Z, excitation vector Y, and present state vector y.

A popular finite state machine (FSM) model for sequential circuits is shown in Fig. 4, in which X is the PI vector, Z the PO vector, Y the excitation (next state) vector, and y the present state vector. The excitation vector is also called the pseudo primary output (PPO) vector, and the present state vector the pseudo primary input (PPI) vector. To make the elements of the state vector controllable and observable, we add the following items to the original FSM (see Fig. 5):

a TEST mode pin (T);
a SCAN-IN pin (SI);
a SCAN-OUT pin (SO);
a MUX (switch) in front of each FF (M).

Figure 5: The shift-register modification approach. Each FF is preceded by a MUX M that selects between the normal data input D and the scan input SI, under control of T.

When the test mode pin T = 0, the circuit is in normal operation mode; when T = 1, it is in test mode (or shift-register mode), as shown in Fig. 5. The test procedure using this method is as follows:

1. Switch to the shift-register (SR) mode (T = 1) and check the SR operation by shifting in an alternating sequence of 1s and 0s, e.g., 00110 (a simple functional test).
2. Initialize the SR: load the first test pattern from SI.
3. Return to the normal mode (T = 0), apply the test pattern, and capture the response.
4. Switch to the SR mode and shift out the final state through SO while setting the starting state for the next test.
5. Go to step 3 if there is another test pattern to apply.

This approach effectively turns the sequential testing problem into a combinational one: the DUT becomes the combinational logic, which usually can be fully tested by compact ATPG patterns.
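The test procedure above can be sketched behaviorally. In this hedged model, the 3-bit chain length, the next-state function, and the non-interleaved shift-out are all illustrative simplifications, not taken from the text.

```python
# Behavioral sketch of the MUX-scan test procedure.

def next_state(state):
    """Assumed combinational logic: a 3-bit next-state function."""
    return [state[1] ^ state[2], state[0], state[0] & state[1]]

def scan_test(patterns):
    """Shift each pattern in (T=1), capture (T=0), record the response."""
    chain = [0, 0, 0]
    responses = []
    for pat in patterns:
        # T = 1: shift the pattern into the chain, one bit per clock.
        for bit in reversed(pat):
            chain = [bit] + chain[:-1]
        # T = 0: one system clock applies the pattern and captures the
        # combinational response into the FFs.
        chain = next_state(chain)
        # T = 1: the captured response is shifted out through SO (in
        # practice interleaved with shifting in the next pattern).
        responses.append(list(chain))
    return responses

assert scan_test([[1, 0, 1], [0, 1, 1]]) == [[1, 1, 0], [0, 0, 0]]
```

The only sequential behavior the tester has to verify directly is the shift operation itself (step 1); everything else reduces to combinational stimulus/response pairs.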
Unfortunately, there are two types of overhead associated with this technique that designers care about very much: hardware overhead (three extra pins, a multiplexer for every FF, and extra routing area) and performance overhead (multiplexer delay, plus FF delay due to the extra load). Since test mode and normal mode are mutually exclusive, in test mode the SI pin may be a redefined input pin and the SO pin a redefined output pin; the redefinition can be done by a multiplexer controlled by T. This arrangement is good for a pin-limited design, i.e., one whose die size is entirely determined by the pad frame. The actual hardware overhead varies from circuit to circuit, depending on the percentage of area occupied by the FFs and on the routing situation.

3.2 Scan Path

This approach is also called the clock scan approach: the multiplexing function is implemented by two separate ports controlled by two different clocks instead of by a MUX. It was invented by Kobayashi et al. in 1968, reported by Funatsu et al. in 1975, and adopted by NEC. It uses two-port raceless D-FFs: each FF consists of two latches operating in master-slave fashion and has two clocks (C1 and C2) that control the scan input (SI) and the normal data input (DI) separately. The logic diagram of the two-port raceless D-FF is shown in Fig. 6.

The test procedure of the clock scan approach is the same as that of the MUX scan approach; the difference is in the scan cell design and control. The MUX disappears from the scan cell, and the FF is redesigned to incorporate the multiplexing function into the register cell. The resulting two-port raceless D-FF is controlled in the following way:

Normal mode: C2 = 1 to block SI; C1 = 0 -> 1 to load DI.
SR (test) mode: C1 = 1 to block DI; C2 = 0 -> 1 to load SI.

Figure 6: Logic diagram of the two-port raceless master-slave D-FF. Latches L1 and L2 form the master-slave pair; the data port DI is clocked by C1 and the scan port SI by C2.
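The two-port control discipline can be modeled behaviorally. In this hedged sketch, clock pulses are abstracted to single update events and the master-slave latch pair is collapsed into one state bit; the chain length and shift data are illustrative.

```python
# Behavioral sketch of the two-port raceless D-FF control.

class TwoPortFF:
    def __init__(self):
        self.q = 0

    def clock_c1(self, di):
        """Normal mode: C2 = 1 blocks SI; a C1 pulse loads DI."""
        self.q = di

    def clock_c2(self, si):
        """SR (test) mode: C1 = 1 blocks DI; a C2 pulse loads SI."""
        self.q = si

# In a scan chain, each cell's SI is the preceding cell's output, so a
# C2 pulse shifts the whole chain one position (updated here from the
# SO end toward the SI end to mimic the raceless master-slave timing).
chain = [TwoPortFF() for _ in range(3)]
for bit in [1, 0, 1]:
    for i in reversed(range(1, 3)):
        chain[i].clock_c2(chain[i - 1].q)
    chain[0].clock_c2(bit)
assert [ff.q for ff in chain] == [1, 0, 1]

# A C1 pulse then loads functional data through DI instead:
chain[1].clock_c1(1)
assert chain[1].q == 1
```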
This approach is said to achieve lower hardware overhead (due to denser layout) and a smaller performance penalty (due to the removal of the MUX in front of the FF) than the MUX scan approach. The real figures, however, depend on the circuit style and technology selected, and on the physical implementation.

3.3 Level-Sensitive Scan Design (LSSD)

This approach was introduced by Eichelberger and T. Williams in 1977 and 1978. It is a latch-based design used at IBM, which guarantees race- and hazard-free system operation as well as testing, i.e., it is insensitive to component timing variations such as rise time, fall time, and delay. It is also claimed to be faster and to have lower hardware complexity than shift-register modification. It uses two latches per scan cell (one for normal operation and one for scan) and three clocks. Furthermore, to enjoy the luxury of race- and hazard-free system operation and test, the designer has to follow a set of complicated design rules (discussed below), which kill nine designers out of ten.

Definition 3 A logic circuit is level sensitive (LS) iff the steady-state response to any allowed input change is independent of the delays within the circuit, and also independent of the order in which the inputs change.

Figure 7: A polarity-hold latch. Its behavior (output +L as a function of clock C and data D):

C  D  |  +L
0  0  |  L (hold)
0  1  |  L (hold)
1  0  |  0
1  1  |  1

Figure 8: The polarity-hold shift-register latch (SRL), consisting of latches L1 and L2, with the data input DI clocked by C, the scan input SI clocked by A, and shift clock B driving L2.

LSSD requires that the circuit be LS, so we need LS memory elements as defined above. Fig. 7 shows an LS polarity-hold latch. The correct change of the latch output (L) does not depend on the rise/fall time of C, but only on C being `1' for a period greater than or equal to the data propagation and stabilization time. Fig. 8 shows the polarity-hold shift-register latch (SRL) used in LSSD as the scan cell. The scan cell is controlled in the following way:

Normal mode: A = B = 0; C = 0 -> 1.
SR (test) mode: C = 0; AB = 10 -> 01 to shift SI through L1 and L2.
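The two-phase A/B shift discipline just described can be sketched behaviorally. In this hedged model, clocks are discrete pulse events and hazards/races are abstracted away; the chain length and shift data are illustrative.

```python
# Behavioral sketch of the LSSD polarity-hold SRL and A/B shifting.

class SRL:
    def __init__(self):
        self.l1 = 0      # latch L1 (+L1)
        self.l2 = 0      # latch L2 (+L2)

    def pulse_c(self, di):
        """System clock C: L1 follows DI (normal mode)."""
        self.l1 = di

    def pulse_a(self, si):
        """Shift clock A: L1 follows the scan input SI."""
        self.l1 = si

    def pulse_b(self):
        """Shift clock B (nonoverlapping with A): L2 follows L1."""
        self.l2 = self.l1

# Shifting through a 2-SRL chain: because A and B never overlap, each
# L1 samples the previous stage's stable L2 before any L2 changes.
srls = [SRL(), SRL()]
for bit in [1, 1, 0]:
    # A phase: every L1 samples the preceding L2 (or SI for stage 0).
    srls[1].pulse_a(srls[0].l2)
    srls[0].pulse_a(bit)
    # B phase: every L2 samples its own L1.
    for s in srls:
        s.pulse_b()
assert [s.l2 for s in srls] == [0, 1]
```

Collapsing the A phase and B phase into one event would reintroduce exactly the race that the nonoverlapping-clock rule exists to prevent.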
The SRL has to be polarity-hold, hazard-free, and level-sensitive. To be race-free, clocks C and B, as well as A and B, must be nonoverlapping. This design (like scan path) avoids the performance degradation introduced by the MUX in shift-register modification. If pin count is a concern, we can replace B with NOR(A, C).

Figure 9: LSSD structures. Double-latch and single-latch designs: combinational logic with PIs X and POs Z, SRLs formed by latches L1 and L2, clocks A, B, and C, and scan pins SI and SO.

The LSSD design rules are summarized as follows:

1. Internal storage elements must be polarity-hold latches.
2. Latches can be controlled by two or more nonoverlapping clocks that satisfy the following: (a) a latch X may feed the data port of another latch Y iff the clock that sets the data into Y does not clock X; (b) a latch X may gate a clock C to produce a gated clock Cg, which drives another latch Y, iff Cg, or any other clock C'g produced from Cg, does not clock X.
3. There must exist a set of clock primary inputs from which the clock inputs to all SRLs are controlled, either through (1) a single-clock distribution tree or (2) logic gated by SRLs and/or nonclock primary inputs. In addition, the following conditions must hold: all clock inputs to SRLs must be OFF when the clock PIs are OFF; any SRL clock input must be controlled from one or more clock PIs; and no clock can be ANDed with either the true or the complement of another clock.
4. Clock PIs cannot feed the data inputs of latches, either directly or through combinational logic.
5. Every system latch must be part of an SRL, and each SRL must be part of some scan chain.
6. A scan state exists under the following conditions: each SRL (or scan-out PO) is a function only of the preceding SRL (or scan-in PI) in its scan chain during the scan operation; all clocks except the shift clocks are disabled at the SRL inputs; and any shift clock to an SRL can be turned ON or OFF by changing the corresponding clock PI.

A network that satisfies the first four rules is level-sensitive; race-free operation is additionally guaranteed by the clocking rules.
The scan-state rule allows a tester to turn off the system clocks and use the shift clocks to force data into and out of the scan chain; the last two rules are what support scan. The advantages associated with LSSD are:

1. Correct operation independent of AC characteristics is guaranteed.
2. As far as testing is concerned, the FSM is reduced to combinational logic.
3. Hazards and races are eliminated, which simplifies test generation and fault simulation.

There are, however, problems with LSSD (and with the previously discussed scan approaches):

1. Complex design rules are imposed on designers, with no freedom to vary from the overall scheme, and higher design and hardware costs (4-20% more hardware and four extra pins).
2. No asynchronous designs are allowed.
3. Sequential routing of latches can introduce irregular structures.
4. Faults that change the combinational function into a sequential one may cause trouble, e.g., bridging and CMOS stuck-open faults.
5. The function being tested is quite different from the original combinational function, so a specification language will not be of any help.
6. Test application becomes a slow process, and testing the entire test sequence at normal operating speed is impossible.
7. It is not good for memory-intensive designs.

3.4 Random Access

This approach uses addressable latches, with an addressing scheme similar to that of high-density memories, i.e., an address decoder is needed. It provides random access to the FFs via multiplexing (address selection). The approach was developed by Fujitsu [Ando, 1980] and was used by Fujitsu, Amdahl, and TI. Its overall structure is shown in Fig. 10.

Figure 10: The random access scan structure and its cell design. Combinational logic with addressable latches selected by an address decoder; each cell has data input DI, scan input SI, clocks CK1 and CK2 (with C = CK1 & CK2), an address line, and scan output SO.

The difference between this approach and the previous ones is that the state vector can now be accessed in a random sequence. Since neighboring patterns can be arranged so that they differ in only a few bits, and only a few response bits need to be observed, the test application time can be reduced.
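The potential saving can be estimated with rough cycle counts. All figures in this sketch are assumed for illustration only: 100 FFs, 1000 patterns, consecutive patterns differing in 5 bits on average, 5 observed response bits per pattern, and one cycle per random access.

```python
# Rough test-application-time comparison: serial scan vs. random access.

FFS, PATTERNS, DELTA_BITS, OBS_BITS = 100, 1000, 5, 5   # assumed figures

# Serial scan: each pattern is fully shifted in (the previous response
# shifts out at the same time), plus one capture cycle.
serial_cycles = PATTERNS * (FFS + 1)

# Random-access scan: write only the bits that change, read only the
# bits that need observing, plus one capture cycle.
ra_cycles = PATTERNS * (DELTA_BITS + OBS_BITS + 1)

assert serial_cycles == 101_000
assert ra_cycles == 11_000      # about 9x fewer cycles in this scenario
```

The advantage grows with chain length and shrinks as patterns become less correlated, which is why pattern ordering matters for this architecture (see Problem 4).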
It also has minimal impact on the normal paths, so the performance penalty is minimized. Another advantage of this approach is that it provides the ability to `watch' a node in normal operation mode, which is impossible with the previous scan methods. The major disadvantage is that it needs an address decoder, so the hardware overhead (chip area and pin count) is high.

In summary, for all scan techniques: 1) test patterns still need to be computed by ATPG; 2) test patterns must be stored externally, and responses must be stored and evaluated, so large (non-portable) test fixtures are still required. Hence there is a growing interest in built-in self-test (BIST). A typical CMOS scan cell design is shown in Fig. 11.

Figure 11: A typical CMOS scan cell. A MUX selects between DI and SI ahead of the latch/FF, whose output Q also serves as SO.

Problems

1. For the positive-edge-triggered DFF with reset discussed in the text, formulate CO(C) and SO(C).

2. Consider the two-bit binary adder consisting of two full-adder cells as shown below. (a) Calculate CC0(C1) and CC1(C1). (b) Discuss syndrome testability for a/0, a/1, b/0, and b/1, respectively. (c) Use PODEM to derive a test for a/0. (d) The dotted box defines a full-adder cell, i.e., a single-bit adder stage. The same architecture can obviously be used to construct an n-bit binary adder for any positive integer n by copying the cells and extending the array to the left. Show that for any adder so constructed, 8 test patterns are sufficient to detect all single stuck-at faults (independent of n).

3. Calculate the syndromes for the carry and sum outputs of a full-adder cell. Determine whether there is any single stuck-at fault on any input for which one of the outputs is syndrome-untestable. If there is, suggest an implementation, possibly with added inputs, that makes the cell syndrome-testable.

4. Consider the random-access scan architecture.
How would you organize the test data to minimize the total test time? Describe a simple heuristic for ordering the data.

5. We have seen the application of scan chains to FSMs. Scan can be applied to combinational networks, too. A combinational network can be represented by a directed acyclic graph (DAG), where edges stand for signals and nodes for computational elements. Consider the DAG shown below. (a) Show that we can assign scan latches to the edges so that |T| = 8 and all nodes are tested pseudo-exhaustively. (b) What is the minimum number of latches needed to fulfill the task?

Project

Cellular multipliers may be classified by the format in which data words are accessed, namely serial form and parallel form. The choice is largely a trade-off between speed (or throughput) and silicon area, which are the major factors contributing to the performance and cost of the circuit. Bit-serial multipliers can further be divided into bit-sequential and serial-parallel ones. A bit-sequential multiplier accepts its operands bit by bit, while a serial-parallel one takes one input in serial and the other in parallel. Both types produce their outputs serially, and they have about the same area (hardware) and time complexities.

Design a 4-bit unsigned bit-serial integer multiplier in either bit-sequential or serial-parallel form. Give the block diagram of the multiplier, which is an array of four basic cells. Give the schematic of the cell, which consists of a full adder and a few primitive gates, flip-flops, and latches. Explain how the multiplier works. Enter your design into a CAD environment using, e.g., Verilog, VHDL, or any schematic capture tool. Verify your design with your functional patterns until you are confident that the design is correct. Now use a fault simulator to derive the fault coverage of your functional patterns, and discuss the results. If your functional patterns do not achieve a fault coverage of more than 90%, use a sequential ATPG tool to make up the difference.
Can you improve the stuck-at fault coverage to 100% (excluding redundant faults)? Repeat the above experiment, but now use the MUX scan approach in your design, and discuss the difference. Repeat the above two experiments for 8-bit and 16-bit multipliers. Draw conclusions from your experiments.
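For the verification step of the project, a behavioral golden model is a convenient source of expected outputs. The sketch below is a hedged reference model only, not a gate-level design: it assumes a serial-parallel organization in which operand b is held in parallel, operand a enters LSB first, and the product streams out LSB first over 2n clock cycles.

```python
# Behavioral golden model of an n-bit unsigned serial-parallel multiplier.

def serial_parallel_multiply(a, b, bits=4):
    """Return the 2*bits product bits of a*b, LSB first."""
    out = []
    acc = 0                               # partial-product accumulator
    for cycle in range(2 * bits):
        a_bit = (a >> cycle) & 1 if cycle < bits else 0
        acc += a_bit * b                  # add b wherever a's bit is 1
        out.append(acc & 1)               # accumulator LSB streams out
        acc >>= 1                         # shift for the next weight
    return out

def to_int(bits_lsb_first):
    return sum(bit << i for i, bit in enumerate(bits_lsb_first))

# Exhaustive check of the 4-bit model against integer multiplication:
assert all(
    to_int(serial_parallel_multiply(a, b)) == a * b
    for a in range(16) for b in range(16)
)
```

Comparing a simulation of your HDL design against such a model over all 256 operand pairs gives a complete functional check for the 4-bit case; the same model with `bits=8` or `bits=16` serves the larger experiments (with sampled rather than exhaustive operand pairs).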