Structural Delay Testing of Latch-based High-speed Pipelines with Time Borrowing

Kun Young Chung and Sandeep K. Gupta

Electrical Engineering – Systems

University of Southern California

Abstract

High-speed circuits use latch-based pipelines in some of their most delay-critical parts. The use of latches not only allows attainment of high clock rate but also enables attainment of high yield at desired clock rate by permitting unintentional time borrowing.

In this paper, we first demonstrate that none of the existing design-for-testability (DFT) techniques can be used to simplify delay testing of such circuits. We then demonstrate that this leads to very high test generation and test application times. In many circuits, very low path delay fault coverage is obtained. We then propose a systematic test approach and associated DFT that significantly reduces the test generation and test application costs, and, for many circuits, significantly increases path delay fault coverage.

1. Introduction

Many high-speed circuits are highly pipelined and use latches as opposed to flip-flops. They may use overlapped clocks and dynamic logic, such as domino circuits. High degree of pipelining helps achieve high clock rate. When appropriately used in conjunction with suitable clocks as in skew-tolerant circuits [1], latches can reduce the impact of pipelining and clock skew.

The use of latches also permits time borrowing across consecutive combinational blocks in a manner that will be elaborated later. When time borrowing is intentionally used during design, it provides greater flexibility in partitioning logic into different blocks. This may help reduce the number of latches required. It may also allow the logic to be partitioned into functionally meaningful blocks, reducing the chances of design errors.

Changes in values of parameters due to variations and/or spot defects during fabrication can cause some blocks in some fabricated copies of a circuit to borrow time from other blocks to correctly operate at desired speed. In this manner, unintentional time borrowing can improve the robustness of a design and hence improve yield at desired clock rate.

Due to the above characteristics, such designs are commonly used in applications where custom design is the only way to achieve desired performance. However, such designs pose new challenges in several areas of delay testing, including DFT, test generation, and test application.

The objective of this research is to address the unique test problems posed by such high-speed circuits.

In this paper, we describe a new structural approach to test path delay faults in latch-based high-speed circuits.

2. Challenges in Delay Testing of Latch-based Pipelines

A simplified latch-based high-speed pipeline can be modeled as shown in Figure 1. In Figure 1, the C_i's are combinational logic blocks and the L_j's are latches. Every latch is assumed to be a positive D-latch (i.e., it becomes transparent when the corresponding clock is high). Even though the figure shows complementary clocks for simplicity, any type of clocks may be used, including two-phase non-overlapping, four-phase non-overlapping, or four-phase overlapping [1]. To simplify the discussion, all latches are assumed to be ideal latches where all delays as well as the setup and hold times are zero.

However, our approach for test development takes into account the characteristics of real latches. Also, the characteristics of real latches are explicitly considered during the detailed design of DFT circuitry.

2.1 Operation of latch-based circuits

Assume C_1 is the first combinational logic block of a high-speed pipeline. We consider the inputs to the latches driving C_1 as primary inputs, and assume that a new combination of values is applied at the inputs of block C_1 at the rising edge of its driving clock, i.e., the clock controlling the latches at its inputs. In Figure 1, this is the rising edge of clock φ at time t_1. The values at the outputs of C_1 must stabilize some time before the subsequent falling edge of the block's receiving clock, i.e., the clock controlling the latches at its outputs. For C_1 in Figure 1, this is the subsequent falling edge of clock φ̄ at time t_4. Otherwise, correct values cannot propagate via latches at the outputs of the block to the inputs of the next block, C_2, and a delay fault exists at the given clock rate.

* This research is supported by National Science Foundation Award CCR-0204414

ITC INTERNATIONAL TEST CONFERENCE

0-7803-8106-8/03 $17.00 Copyright 2003 IEEE

Paper 42.1

If the values at the outputs of a block C_1 stabilize before the subsequent rising edge of its receiving clock, i.e., the subsequent rising edge of φ̄ at t_2, then any change in values is passed to the inputs of the next block, C_2, only after the rising edge of the block's receiving clock, φ̄. If the values at the outputs of a block stabilize after the subsequent rising edge of its receiving clock, φ̄, but before the clock's falling edge, then any change in values passes immediately via the latches, and thus to the inputs of the next block. Hence, a new combination of values may be applied at the inputs of C_2 as early as the subsequent rising edge of its driving clock, φ̄, or as late as this clock's subsequent falling edge. The values at the outputs of block C_2 must stabilize some time before the subsequent falling edge of its receiving clock, φ, and so on.

Hence, unlike in a flip-flop based circuit, values at the inputs of a block in a latch-based circuit may not be applied at a specific time in a correctly functioning circuit. Nor may the corresponding response values become available at its outputs at a specific time. This allows a block to borrow time from others in its fan-in/fan-out.

Figure 1 An example latch-based linear pipeline

2.2 Reference times and nominal delays

Without any loss of generality, a reference time is defined as the earliest time at which new values may be applied at the inputs of a block, namely the rising edge of the block's driving clock. The corresponding reference time for the responses at the outputs of a block is the rising edge of the block's receiving clock. In general, the reference times at the inputs and the outputs of a block are the clock edges at which the latches at its inputs and outputs become transparent. The nominal delay for a block is defined as the time difference between the reference times at its inputs and outputs. For each block in Figure 1, the nominal delay is half of the clock cycle, i.e., T/2, where T is the period of the clock. If transitions at inputs and outputs of each block of a circuit satisfy these reference times, the circuit will operate at the desired clock rate. Such a circuit can be viewed as a nominal circuit where no time is being borrowed by any block.

2.3 Types of time borrowing and their advantages

Now consider a scenario where the values at the inputs of C_2 do not arrive before t_2, the rising edge of φ̄, and arrive at t_3 as shown in Figure 1. In this case, we say that C_1 is borrowing time from C_2. The time period t_3 – t_2 in Figure 1 denotes the amount of time borrowed.

Similarly, if the outputs of C_2 do not stabilize before the rising edge of φ (i.e., t_4) but do so shortly thereafter, then C_2 is said to borrow time from the block in its fan-out. In general, C_2 may borrow time from blocks in its fan-out to accommodate its own large delay and/or to compensate for the time it lent to C_1.

Time borrowing/lending may be intentional in the sense that it may be planned during the design of a circuit. Even when time borrowing is not planned during the design of a circuit, it may occur unintentionally if variations and/or defects during fabrication cause such borrowing in some fabricated copies of the circuit. For a circuit design, the latches that are sites of intentional time borrowing are known prior to DFT design, test development, and test application. However, the precise amount of time borrowed for any particular vector is not known a priori, since it depends on the variations in the fabrication process. On the other hand, latches that are sites of unintentional time borrowing may vary from one fabricated chip to another and hence are not known prior to test application.

Note that even when time borrowing occurs unintentionally, the circuit is fault-free at given clock rate provided that the values at outputs of every time borrowing logic block stabilize before the subsequent falling edge of the block’s receiving clock.
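The timing rules of Sections 2.1 to 2.3 can be illustrated with a small simulation. The following is a minimal sketch, assuming ideal latches (zero delay, setup, and hold times), complementary clocks of period T as in Figure 1, and values applied to the first block exactly at t = 0. The function name `simulate_pipeline` and the use of a single worst-case delay per block are illustrative assumptions, not part of the original method.

```python
def simulate_pipeline(block_delays, T):
    """Propagate a transition through a linear pipeline of ideal latches.

    block_delays[i] is an assumed worst-case delay of block i (0-based).
    Under complementary clocks, the latches at the outputs of block i are
    transparent from (i + 1) * T / 2 to (i + 2) * T / 2.
    Returns (borrowed, fault_free), where borrowed[i] is the time that
    block i borrows from the block in its fan-out.
    """
    half = T / 2
    start = 0.0       # time at which values are applied at the block inputs
    borrowed = []
    for i, delay in enumerate(block_delays):
        arrival = start + delay
        opens = (i + 1) * half    # rising edge of the receiving clock
        closes = (i + 2) * half   # falling edge of the receiving clock
        if arrival > closes:
            # Outputs stabilize too late: delay fault at this clock rate.
            return borrowed, False
        borrowed.append(max(0.0, arrival - opens))
        # Early values wait for the latch to open; late values pass through.
        start = max(arrival, opens)
    return borrowed, True
```

For example, with T = 10 and block delays [7, 3, 4], the first block borrows 2 time units from the second and the circuit is still fault-free. With delays [9, 7], the second block misses the falling edge of its receiving clock and a delay fault is reported.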

2.4 Key difficulties in delay testing of circuits with time borrowing

In a flip-flop based circuit, the transition at the output of a path in one block is latched into the corresponding flip-flop at a specific clock edge before it begins to propagate via a path in the next block. Hence, if the delay of the path in the first block is excessive, the transition misses the clock edge and cannot be seen by the path in the next block. Also, if the delay of the path in the first block is short, the transition at the input of a path in the next block still starts after the appropriate clock edge. Thus, in a flip-flop based circuit, path delay faults in one combinational logic block can be treated independently of the path delay faults in the adjacent blocks.

In contrast, in latch-based circuits that intentionally or unintentionally borrow time, it becomes necessary to consider delay faults in multi-block paths, i.e., paths obtained by concatenating appropriate paths in successive logic blocks separated by latches. This occurs because transitions propagating via a path in one block can continue to propagate via already transparent latches to the paths in the next block. In this manner, in a latch-based circuit with extreme scenarios of time borrowing, long paths that pass via all blocks in the circuit may need to be targeted for delay testing. Since many latch-based parts of circuits (e.g., data-paths) contain an astronomical number of such multi-block paths [2], this typically results in impractically high test generation complexity as well as high test application time. Furthermore, for some circuits, the delay fault coverage obtained is so low that it becomes meaningless as a measure of test quality. To remedy this situation, it is imperative to incorporate DFT features within such blocks to reduce test generation and application times and provide meaningful measures of test quality.

However, as described above, values are not applied at the inputs of a combinational block at a specific time. This is true even for circuits with no intentional time borrowing, since unintentional time borrowing may delay the arrival of values at some/all inputs of the circuit-under-test (CUT), i.e., the particular copy of the circuit that is currently being tested. This raises unique problems for test generation and design of DFT circuits used to apply vectors. Also, the values at the outputs of a block are not available for sampling at a specific time. This raises unique problems for observation of test responses using DFT circuitry. It is important to note that simply applying tests at the inputs of a block and observing responses at its outputs at nominal times will cause many fault-free chips to be unnecessarily discarded. In fact, in circuits where time borrowing has been intentionally exploited, this can lead to zero yield at given clock rate.

No known DFT at inputs and outputs of a block can replicate skewed test application and response capture for path delay testing of the block because of the following reasons. First, while location of intentional time borrowing is known for a given design, the exact amount of time borrowing varies from vector to vector and from one fabricated copy of chip to another. Second, unintentional time borrowing may occur due to normal process variations as well as spot defects that occur during fabrication, and its exact amount also varies from one fabricated copy of the chip to another in an unpredictable manner. Third, it is prohibitively expensive, if not impossible, to control a scan chain to apply values at different inputs at desired times. In the absence of DFT support, path delay fault testing of such parts of circuits suffers from excessive test generation and test application times and, for many circuits, abysmally low fault coverage.

We overcome these limitations and provide a new structural delay testing approach for latch-based high-speed pipelines using scan that applies tests and captures responses at clock edges only.

2.5 Benefits of scan design-for-testability (DFT)

First, appropriate use of scan DFT reduces the number of path delay faults ( PDFs ) to be targeted.

Consider a latch between two blocks of logic, say C_a and C_b. Let m paths via C_a terminate at the latch and n paths via C_b originate from the latch. Then there exist m × n physical paths in the above two blocks that pass via the latch. Since, for each physical path, two path delay faults (one with a rising transition at its input and one with a falling transition) must be targeted, a total of m × n × 2 PDFs that pass via the two blocks as well as the latch must be targeted. If one can verify, during test, that no time borrowing, intentional or unintentional, occurs at this latch, paths in C_a and C_b can be targeted separately. In such a case, the total number of PDFs corresponding to the latch that must be targeted drops from m × n × 2 to 2(m + n). Since m and n are typically large, the use of scan reduces the total number of target PDFs. Note that even greater reductions occur when one considers the above arguments for multi-block paths that pass via a larger number of blocks.

Second, the average length of a targeted path is shortened. It is not always possible to propagate a transition robustly along a path, since sometimes conflicting logic values are required at side inputs of the path for robust propagation. As the length of a target path increases, the possibility of a conflict between values required at side inputs also increases. The use of scan at latches where no time borrowing occurs reduces the average length of paths and hence, in many circuits, enhances path delay fault coverage.
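The reduction in target PDFs described in the first point above can be checked numerically. The snippet below is a small sketch; the function name `pdf_counts` is illustrative, and m and n are the path counts defined in the text.

```python
def pdf_counts(m, n):
    """Return (PDFs without scan, PDFs with scan) for one latch.

    m: number of paths in C_a terminating at the latch.
    n: number of paths in C_b originating from the latch.
    """
    without_scan = m * n * 2    # every two-block path, rising and falling
    with_scan = 2 * (m + n)     # paths in C_a and C_b targeted separately
    return without_scan, with_scan
```

For m = n = 10 this gives 200 target PDFs without scan and 40 with scan. The saving only appears once m and n are large enough; for m = n = 1 the counts are 2 and 4.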


3. Key Ideas behind the New Structural Approach for Delay Testing

Initially, we assume that every latch is multifunctional in that it can operate in normal and scan-in modes (test applied at a rising edge of corresponding clock), and can capture responses at rising or falling edge, as desired. In total, a latch can operate in the following four modes.

(1) Normal mode : The latch is transparent when the corresponding clock is high.

(2) Scan-in mode : Vectors are loaded and applied at the rising edge of the corresponding clock.

(3) r-capture scan-out mode : The latch captures response at the rising edge of the corresponding clock for scan out.

(4) f-capture scan-out mode : The latch captures response at the falling edge of the corresponding clock for scan out.

Time borrowing is assumed not to occur at the “primary” inputs and outputs of the entire latch-based circuit. This is very often the case because high-speed latch-based pipelines are typically embedded in larger circuits that are otherwise flip-flop based.

The proposed approach is based on two types of tests described next. Note that we may test blocks individually. Alternatively, we may test any set of contiguous blocks together as a single entity. In either case, we use the term sub-circuit under test (SCUT) to describe the block(s) under test.

3.1 r-f test application (necessary condition on block delays): TA(r,f) tests

Under this test application, tests are applied to every sub-circuit under test (SCUT) at the rising edge of the SCUT's driving clock and the responses are captured at the SCUT outputs at the falling edge of its receiving clock.

Under the two-phase clocks shown in Figure 1, a combinational block in the pipeline can borrow up to one half of the clock cycle from the subsequent block, resulting in maximum allowable delay of T . This concept can be generalized to any SCUT with multiple blocks.

Let TA(r,f) denote the maximum time allowable for any multi-block path in a given SCUT. In particular, for an SCUT comprised of k consecutive blocks in a linear pipeline (C_{i_1}, C_{i_2}, ..., C_{i_k}), the maximum delay allowable can be written as

$$\sum_{l=1}^{k} \left( \Delta C_{i_l} \right) \le k \cdot \frac{T}{2} + \frac{T}{2} \equiv TA(r,f), \qquad (1)$$

where ΔC_{i_l} is the maximum delay of any path in block C_{i_l}.

If this condition is violated for even one SCUT, the entire CUT is proven to have a delay fault at the clock rate T. Note that TA(r,f) test application allows the maximum period of time for transitions to propagate via each path in an SCUT. Hence, it is necessary for every block to pass TA(r,f) tests.

Theorem 1 : If any SCUT in a circuit fails its TA(r,f) tests at clock rate T , then the circuit has delay faults at that clock rate that cannot be compensated via time borrowing.

3.2 r-r test application (sufficient condition on block delays): TA(r,r) tests

Under this test application, tests are applied to the inputs of an SCUT at the rising edge of the SCUT's driving clock and the responses are captured at the SCUT's outputs at the rising edge of the receiving clock. The time interval TA(r,r) denotes the nominal time allocated to the SCUT.

$$\sum_{l=1}^{k} \left( \Delta C_{i_l} \right) \le k \cdot \frac{T}{2} \equiv TA(r,r). \qquad (2)$$

If this condition is violated for a latch at the output of the SCUT, the SCUT is known to borrow time from the next stage. In particular, we have the following result.

Theorem 2 : If every block in a circuit passes TA(r,r) tests at clock rate T , there is no delay fault and no time borrowing in the CUT at that clock rate.

Note that one or more SCUTs may fail TA(r,r) tests due to time borrowing even though the circuit does not have a delay fault. The following result generalizes the above.

Theorem 3 : If a CUT can be partitioned into a set of SCUTs such that each SCUT passes corresponding TA(r,r) tests at clock rate T , then the CUT is free of delay faults at that clock rate.
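The two bounds in relations (1) and (2) can be combined into a simple three-way classification of an SCUT. The following is a minimal sketch, assuming the maximum delay of each block is known as a number; in actual testing the outcome is observed from captured responses rather than computed. The function names and the returned labels are illustrative assumptions.

```python
def ta_rf(k, T):
    """Maximum allowable time for k blocks under r-f test application."""
    return k * T / 2 + T / 2


def ta_rr(k, T):
    """Nominal time allocated to k blocks under r-r test application."""
    return k * T / 2


def classify_scut(block_delays, T):
    """Classify an SCUT from the maximum delays of its k blocks."""
    k = len(block_delays)
    total = sum(block_delays)
    if total > ta_rf(k, T):
        return "fails TA(r,f): delay fault"
    if total <= ta_rr(k, T):
        return "passes TA(r,r): no time borrowing"
    return "fails TA(r,r): borrows time from next stage"
```

With T = 10, a two-block SCUT with delays [4, 5] passes TA(r,r), one with delays [7, 5] passes TA(r,f) but fails TA(r,r), and one with delays [9, 8] fails TA(r,f).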

4. Proposed Test Approach

4.1 An adaptive test approach

Now we describe our test methodology where test vectors are applied and responses captured only at clock edges. To simplify the discussion, we describe the approach for a linear pipeline with blocks C_1, C_2, C_3, and so on. However, as described in [3], the approach is general and can be applied to circuits where blocks are connected in arbitrary ways.

First consider a scenario where, based on circuit design, we expect extreme time borrowing, i.e., where the delay of one or more single blocks and/or one or more multiple-block combinations is likely to exceed the maximum allowable delay given by the relation described earlier for TA(r,f). In this case, we consider each single block or multiple-block combination where extreme time borrowing is deemed likely to occur as a sub-circuit under test (SCUT) and apply suitable tests to the SCUT using the TA(r,f) test application. Clearly, if any of these SCUTs fails any of its TA(r,f) tests, then the entire CUT is identified as having a delay fault at the desired clock period and delay testing can be terminated.

Next consider a scenario where a chip under test has either skipped the above TA(r,f) tests (since extreme time borrowing was not expected) or has passed all the TA(r,f) tests applied. Now TA(r,r) tests are applied adaptively in the following manner. In the first step, the first block of logic in the pipeline, i.e., C_1, is considered as a sub-circuit under test, say SCUT_1. During testing of SCUT_1, all latches at the inputs of C_1 are configured in scan-in mode, and those at the outputs in r-capture scan-out mode, as shown in Figure 2(a). Every latch at the output of SCUT_1 at which erroneous response is captured for any vector during TA(r,r) testing of SCUT_1 is identified as a site of time borrowing.

Figure 2 Proposed procedure for SCUT forward expansion illustrated for four possible scenarios

Once the testing of SCUT_i is completed, we construct SCUT_{i+1} as a forward expansion of SCUT_i. In general, a forward expansion of SCUT_i is obtained by adding to SCUT_i the block of logic in its fan-out. For the SCUT_1 shown in Figure 2(a), a forward expansion is comprised of a sub-circuit that also includes the combinational logic block C_2. However, the exact configuration of SCUT_{i+1} varies depending upon the latches at outputs of SCUT_i that are identified as sites of time borrowing during TA(r,r) testing of SCUT_i.

In particular, during TA(r,r) testing of SCUT_{i+1}, all latches at outputs of SCUT_i that are identified as sites of time borrowing are configured in normal mode, while the latches at the other outputs of SCUT_i are configured in scan-in mode. For the example SCUT_1 in Figure 2(a), if time borrowing is known to occur across the latch at its top output only, then SCUT_2 is configured as shown in Figure 2(d). If time borrowing is found to occur at both latches at the outputs of SCUT_1, then SCUT_2 is as shown in Figure 2(f).

One special case of SCUT_i forward expansion must be noted. If during testing of SCUT_i, it is determined that time borrowing does not occur across any of its outputs, then SCUT_{i+1} is simply the next block in the pipeline. This is illustrated in Figure 2(b) for the example SCUT_1 in Figure 2(a).

The proposed test approach continues in this manner until every block in the given circuit is included as a part of some SCUT that is tested using TA(r,r) tests.
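The adaptive procedure of this section can be sketched in code for a linear pipeline. The sketch below works at block granularity only and is an illustration under stated assumptions: it does not prune the transitive fan-in of scanned latches from an expanded SCUT, it does not model the latches at the inputs of the first block, and `failing_latches` is a hypothetical callback standing in for the tester's observation of erroneous responses. The names `Mode` and `plan_ta_rr` are not from the original paper.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    SCAN_IN = "scan-in"
    R_CAPTURE = "r-capture scan-out"


def plan_ta_rr(output_latches, failing_latches):
    """Plan TA(r,r) testing of a linear pipeline by forward expansion.

    output_latches[i] lists the latches at the outputs of block i.
    failing_latches(blocks) is a hypothetical tester callback returning the
    output latches of the last block in `blocks` that captured an erroneous
    response during TA(r,r) testing of that SCUT.
    Returns one (blocks, latch_modes) pair per SCUT, in test order.
    """
    plan = []
    blocks = []   # blocks carried over from the previous SCUT
    modes = {}    # modes of latches already configured for the next SCUT
    for i, outs in enumerate(output_latches):
        blocks = blocks + [i]
        scut_modes = dict(modes)
        scut_modes.update({latch: Mode.R_CAPTURE for latch in outs})
        plan.append((blocks, scut_modes))

        borrowing = set(failing_latches(blocks)) & set(outs)
        next_inputs = {
            latch: Mode.NORMAL if latch in borrowing else Mode.SCAN_IN
            for latch in outs
        }
        if borrowing:
            # Forward expansion: the current blocks stay in the next SCUT.
            modes = {**modes, **next_inputs}
        else:
            # Special case: the next SCUT is simply the next block.
            blocks, modes = [], next_inputs
    return plan
```

If the callback reports time borrowing at only one output latch of the first block, the second SCUT contains both blocks, with that latch in normal mode and the other output latch of the first block in scan-in mode. If no borrowing is reported, the second SCUT is the second block alone.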

4.2 Fault coverage computation for TA(r,r) tests

Since a CUT is divided into a number of disjoint and/or partly inclusive SCUTs, where each SCUT is tested separately, fault coverage must be computed appropriately. For example, consider a latch at which time borrowing does not occur. Let k_1 and k_2 denote the number of PDFs in transitive fan-in and transitive fan-out of the latch, respectively. Note that k_i/2 of these are PDFs with rising transition at the latch and k_i/2 with falling transition (for i = 1, 2). Now consider the PDFs in the transitive fan-in of the latch that are robustly tested during TA(r,r) testing of SCUT_i. Let r_1 and f_1 denote the numbers of PDFs that cause rising and falling transitions at the inputs of the latch, respectively, and are tested robustly.

In the next SCUT, say SCUT_{i+1}, the numbers of PDFs in the transitive fan-out of the latch that are robustly tested are also counted. Let r_2 and f_2 denote the numbers of robustly tested PDFs in the transitive fan-out of the latch with rising and falling transitions at the output of the latch, respectively.

Hence a total of r_1 r_2 + f_1 f_2 multi-block PDFs across the latch (at which no time borrowing occurs) are implicitly tested. In particular, if a CUT consists of only two stages, one before and one after the latch, the robust PDF coverage for paths passing via the latch at which no time borrowing occurs is defined as

$$\frac{r_1 r_2 + f_1 f_2}{(k_1 \cdot k_2)/2}, \qquad (3)$$

where (k_1 · k_2)/2 is the total number of two-block PDFs that pass via the latch. If latches are identified as sites of time borrowing, it is more straightforward to compute PDF coverage for multi-block paths that pass via the latches. The overall PDF coverage can be obtained as an appropriate weighted sum of above coverages.
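The two-stage coverage expression (3) can be evaluated directly from the six counts defined above. The following is a small sketch; the function name `two_block_robust_coverage` is illustrative, and the arguments follow the symbols r_1, f_1, r_2, f_2, k_1, and k_2 used in the text.

```python
def two_block_robust_coverage(r1, f1, r2, f2, k1, k2):
    """Robust PDF coverage for two-block paths via a non-borrowing latch.

    r1, f1: robustly tested fan-in PDFs with rising/falling transition
            at the latch input.
    r2, f2: robustly tested fan-out PDFs with rising/falling transition
            at the latch output.
    k1, k2: total PDFs in the transitive fan-in and fan-out of the latch.
    """
    implicitly_tested = r1 * r2 + f1 * f2
    total_two_block = (k1 * k2) / 2
    return implicitly_tested / total_two_block
```

As a check, if every PDF on both sides is robustly tested (r_1 = f_1 = k_1/2 and r_2 = f_2 = k_2/2), the expression evaluates to 1. For k_1 = 4, k_2 = 6, r_1 = 2, f_1 = 1, r_2 = 3, f_2 = 3 it evaluates to 0.75.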

5. Experimental Results

We first apply the proposed approach to the two-stage linear pipeline shown in Figure 3 that uses two copies of the circuit C17 from the ISCAS ’85 benchmark suite. Subsequently, we consider versions of this pipeline with larger numbers of stages as well as other pipelines, to further demonstrate the benefits of the proposed approach.

5.1 Testing the entire CUT: the classical approach

Since the CUT is acyclic and balanced [4], an ordinary test generation method for path delay testing of combinational circuits may be applied by regarding it as purely combinational. In this case, x_1 to x_8 are primary inputs, and z_1 and z_2 are primary outputs of the CUT. Below is a summary of test generation results for path delay faults:

• Total number of target path delay faults: 70,

• Number of tests: 48,

• Number of path delay faults robustly tested: 59, and

• Robust path delay fault coverage: 84.3 %.

Note that even for such a small two-stage circuit, fault coverage is less than 85%. As the number of stages increases, achievable fault coverage tends to drop rather significantly. Next we describe the test applications under the proposed approach.

5.2 Testing under the proposed approach

In this as well as all the following examples we will assume that no extreme time borrowing was deemed likely and hence only TA(r,r) tests are applied.

5.2.1 Test block C17_1 only (SCUT_1)

As shown in Figure 4, only the first stage (shaded) becomes the current sub-circuit under test, SCUT_1, where primary inputs are x_1 to x_5, and outputs are the signals going into L_3 and L_4. To apply a test, L_1, L_2, L_7, L_8, and L_9 will operate in scan-in mode. TA(r,r) tests are performed by capturing the responses at L_3 and L_4 at the rising edges of φ_2. Below is a summary of test generation results for path delay faults in SCUT_1:

• Total number of target PDFs within SCUT_1: 22,

• Number of tests: 15,

• Number of path delay faults robustly tested: 22, and

• Robust path delay fault coverage: 100 %.

Let us consider the case where during the above TA(r,r) testing, fault-free response is captured at L_4 for each test, but faulty response is captured at L_3 for some tests. This indicates intentional/unintentional time borrowing via L_3.

Figure 3 A two-stage linear pipeline comprised of two copies of C17

Figure 4 Testing the first stage only (SCUT_1)

Figure 5 Testing the forward expansion (SCUT_2)

5.2.2 Testing the forward expansion (SCUT_2)

SCUT_2 is comprised of SCUT_1 as well as the block in its fan-out, namely C17_2. However, since TA(r,r) testing of SCUT_1 revealed time borrowing across L_3 but not L_4, L_3 is configured in normal mode and L_4 is configured in scan-in mode; therefore, the transitive fan-in of L_4 within SCUT_1 is excluded from SCUT_2. Also, as shown in Figure 5, L_1, L_2, L_7, L_8, L_4, L_10, L_11, and L_12 operate in scan-in mode to load and apply tests. Note that each latch in scan-in mode applies tests at the rising edge of the corresponding clock. Figure 5 also shows, via shading of logic blocks C17_1 and C17_2, the paths that are targeted during testing of SCUT_2. Below is a summary of test generation results for path delay faults in SCUT_2:

• Total number of target PDFs within SCUT_2: 30,

• Number of tests: 19,

• Number of path delay faults robustly tested: 30, and

• Robust path delay fault coverage: 100 %.

By identifying that time borrowing does not occur across L_4 and thereby scanning L_4, the proposed approach enhances the coverage to 100%, from 84.3% provided by the classical approach. Note that the proposed approach not only improves the fault coverage but also reduces the total number of tests required for testing. Detailed comparison for the two-stage pipeline is shown in Table 1.

Table 1 Comparison between the classical approach and the proposed approach for the two-stage pipeline using copies of C17

                                           Classical approach   Proposed approach
Total number of tests                      48                   34 ( = 15 + 19); 29% reduction
Total number of target path delay faults   70                   53 ( = 22 + 30); 24% reduction
Robust path delay fault coverage           84.3%                100%

5.2.3 Benefits of the proposed approach for larger circuits

Next, consider a 5-stage pipeline constructed in the same manner as in the above example. In this case, for the classical approach, the number of target path delay faults is 874 and 55.6% fault coverage is obtained. However, using the proposed approach, 100% fault coverage is achieved with significantly fewer tests. Note that, in this experiment, time borrowing is assumed to occur at the top output of each of the first four blocks in the pipeline. Table 2 summarizes the results.

Table 2 A summary of test results for the example 5-stage pipeline using copies of C17

Test approach                 No. of tests   No. of target PDFs in each SCUT   Robust PDF coverage
Proposed approach   SCUT_1    15             22                                100%
                    SCUT_2    19             30                                100%
                    SCUT_3    28             38                                100%
                    SCUT_4    35             46                                100%
                    SCUT_5    46             54                                100%
                    Total     143            190                               100%
Classical approach            412            874                               55.6%

5.3 Pipelined adder

Next, an n-stage latch-based pipelined ripple-carry adder is studied to show the benefits of the proposed approach. Table 3 summarizes the results for the classical approach for several values of n.

The classical approach cannot exploit any DFT and hence generates tests considering the entire pipeline as a single circuit. As the number of pipeline stages increases, test generation time as well as the number of tests increase exponentially (Table 3) [5]. In this particular example, 100% coverage is obtained.

Table 3 Test generation using the classical approach for pipelined ripple-carry adders

No. of stages   No. of tests   No. of target PDFs   Robust PDF coverage
1               26             32                   100%
2               53             86                   100%
3               94             164                  100%
4               148            266                  100%
5               233            392                  100%
6               324            542                  100%
7               427            716                  100%
8               583            914                  100%
9               665            1136                 100%
10              816            1382                 100%

5.3.1 Only unintentional time borrowing

Even if the pipelined adder is designed without any intentional time borrowing, unintentional time borrowing may occur at some latches in some fabricated copies of the circuit. In this particular example the classical approach achieves 100% fault coverage. However, the proposed approach can exploit DFT to improve test costs significantly. Figure 6 shows several possible scenarios of time borrowing for a 10-stage pipeline and the results of the proposed approach for some of these unintentional time borrowing scenarios.

The probability of occurrence of any particular time [...] latch with a probability p, independent of whether time borrowing occurs at any other latches. Since time borrowing is not intentional, the value of p is generally close to 0. The dotted circles in Figure 6 show the largest SCUTs encountered at various stages of TA(r,r) testing for each scenario of time borrowing.


If no time borrowing occurs, as in the first scenario, the proposed test procedure ends up testing each stage separately. Thus, the total number of tests decreases from 816 for the classical approach to 260 (= 26 × 10) while maintaining the same fault coverage. The probability of occurrence of this scenario is very high if p is small. Hence, for a vast majority of fabricated chips, the proposed approach will be able to provide the above level of reduction in test application time. Even in scenarios where time borrowing occurs at some latches, such as Scenario 2 shown in this figure, the proposed approach reduces test application time.

The proposed approach becomes worse than the classical approach only when time borrowing occurs at many latches, as in Scenario 3 and Scenario 4 in this figure. However, for a small value of p, the probability of occurrence of such scenarios is very low.

Hence, even in circuits where the classical approach provides 100% fault coverage, the proposed approach helps significantly reduce the average test application costs even in the presence of unintentional time borrowing.
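The scenario probabilities quoted for Figure 6 follow from treating each of the nine inter-stage latches of the 10-stage pipeline as an independent site that borrows time with probability p. The snippet below is a minimal sketch of that calculation; the function name `scenario_probability` and the example value of p are assumptions made for illustration.

```python
def scenario_probability(p, borrowing_latches, total_latches=9):
    """Probability of one specific time borrowing scenario.

    p: probability that time borrowing occurs at any given latch,
       assumed independent across latches.
    borrowing_latches: number of latches with time borrowing in the scenario.
    total_latches: number of latches between stages (9 for 10 stages).
    """
    non_borrowing = total_latches - borrowing_latches
    return (p ** borrowing_latches) * ((1 - p) ** non_borrowing)
```

With seven borrowing latches the function returns (1 − p)^2 p^7, and with all nine it returns p^9, matching the expressions given for Scenarios 3 and 4. For an illustrative p = 0.01, the scenario with no time borrowing has probability 0.99^9, which is about 0.91.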

♦ Scenario 1: No time borrowing
Prob. ≈ 1 (very high), Total no. of tests: 260, Coverage: 100%

♦ Scenario 2: 2 latches with time borrowing

♦ Scenario 3: 7 latches with time borrowing as shown
Prob. = (1-p)^2 p^7 (low), Total no. of tests: 1410*, Coverage: 100%

♦ Scenario 4: All 9 latches with time borrowing
Prob. = p^9 ≈ 0 (very low), Total no. of tests: 3369, Coverage: 100%

Figure 6 A 10-stage pipelined ripple-carry adder – four scenarios of time borrowing

5.3.2 Intentional time borrowing

In an effort to minimize worst-case delay for a ripple-carry adder, the inverter at the output of the carry gate can be omitted so that every other stage operates on complemented data [6]. In this case, clock period may be reduced by intentionally allowing even stages to borrow time from the subsequent odd stages. In the proposed approach, in the absence of any further unintentional time borrowing, the biggest SCUT blocks will consist of two consecutive adder stages. In this case, the proposed approach will require only 395 (=(26+53) x 5) tests to test a 10-stage pipelined adder with the same fault coverage (100%) as the classical approach.

5.4 A third case study – Pipelined minimum vector selector (MIN)

Minimum vector selector (MIN) is a circuit that determines the minimum of two input vectors. Each stage has two primary inputs, two latched inputs (outputs of the previous stage), and two latched outputs (inputs to the next stage), as shown in Figure 7. To simplify the analysis, we modify the MIN circuit by ignoring the primary output of each stage. Table 4 summarizes the results for the classical approach for the modified MIN for various pipeline depths. The original MIN circuit demonstrates exactly the same trend as shown for modified MIN in Table 4.

Table 4 Test generation using the classical approach for pipelined modified MIN

No. of stages   No. of tests   No. of target PDFs   Robust PDF coverage
1               13             20                   80.00%
2               27             68                   52.94%
3               52             212                  […]
4               82             644                  […]
5               […]            1940                 7.42%
6               […]            5828                 3.36%
7               216            17492                […]
8               274            […]                  […]
9               […]            157460               […]
10              419            472388               0.10%

Due to its inability to exploit DFT, the classical approach performs test generation for the entire CUT. As shown in Table 4, as the number of pipeline stages increases, the number of tests and the number of path delay faults that need to be targeted increase rapidly while the robust path delay fault coverage falls rapidly (also see [5]). For a 10-stage pipeline, 472388 PDFs must be targeted, of which only 0.10% can be robustly tested by applying 419 tests. Since the test quality is so poor, tests generated by the classical approach are almost useless.

In a manner similar to that described in Section 5.3, several scenarios are considered for a 5-stage pipelined MIN, to illustrate the benefits of the proposed approach in terms of reduction in the number of test vectors as well as improvement in fault coverage. The results are presented in Table 5.

As described for the pipelined ripple-carry adder, time borrowing occurs at few latches in most fabricated copies of the chips, i.e., scenarios 1 and 2 are much more likely. Hence, for this pipeline, the proposed approach exploits
DFT to provide higher average fault coverage at lower average test application cost than the classical approach.

Figure 7 A pipelined 5-stage MIN

Table 5 Results of the proposed approach for the 5-stage modified MIN under different time borrowing scenarios

Scenario     Latches with TB    No. of tests   Robust PDF coverage (%)
Scenario 1   None               65             19.38
Scenario 2   […]                79             19.38
Scenario 3   […]                79             17.73
Scenario 4   […]                145            19.38
Scenario 5   […]                93             14.02
Scenario 6   All (L1 to L8)     292            7.42

6. Ongoing Research

In the above experiments, it is implicitly assumed that all latches use DFT circuitry that enables them to function in all four modes of operation (normal, scan-in, r-capture scan-out, and f-capture scan-out). The mode control of each test session (SCUT_i) is determined at run time, based on the time borrowing sites identified during the previous test session (SCUT_{i-1}). This type of mode control will be called fully-adaptive. The fully-adaptive control maximizes fault coverage and minimizes test application cost. It can also detect any unintentional time borrowing on a latch and adjust the operation of the latch accordingly. However, the DFT overheads for the fully-adaptive approach are high. We are currently investigating a partially-adaptive approach where the DFT overheads will be decreased significantly, without sacrificing most of the benefits of the fully-adaptive approach. We are also developing an approach to identify the SCUTs to be tested so as to minimize the average test application costs. We are developing a flexible ATPG tool that can perform test generation for all possible (or, all likely) scenarios described above. We are also carrying out detailed design of the DFT circuitry suitable for fully- as well as partially-adaptive approaches.

7. Conclusion

High-speed circuits use latch-based pipelines in some of their most delay-critical parts. The use of latches not only allows attainment of high clock rate but also enables attainment of high yield at desired clock rate via unintentional time borrowing.

In this paper, we first demonstrate that none of the existing DFT techniques can be used for delay testing of such circuits. We then demonstrate that this leads to very high test generation and test application times and, in many circuits, to very low path delay fault coverage. We then describe a systematic test approach and associated DFT that significantly reduces test application costs and, more importantly, in many pipelines, helps attain high fault coverage.

8. References

[1] D. Harris, “Skew-Tolerant Circuit Design”, Academic Press, San Diego CA, 2001.

[2] N.M. Abdulrazzaq, “Performance Testing of Data-Path Circuits”, Ph.D. Dissertation, Department of Electrical Engineering – Systems, University of Southern California, 2001.

[3] Kun Y. Chung and Sandeep K. Gupta, “Structural Delay Testing of Latch-based High-speed Pipelines with Time Borrowing”, Technical Report CENG 03-01, University of Southern California, 2003.

[4] R. Gupta, R. Gupta, and M.A. Breuer, “The BALLAST Methodology for Structured Partial Scan Design”, IEEE Trans. on Computers, 39(4), 1990.

[5] N.M. Abdulrazzaq and Sandeep K. Gupta, “Test Generation for Path-Delay Faults in One-dimensional Iterative Logic Arrays”, Proceedings IEEE International Test Conference, 2000.

[6] Neil H.E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: a systems perspective, 2nd Ed., Addison Wesley, 1992.