IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 30, NO. 7, JULY 2011
METER: MEasuring Test Effectiveness Regionally
Yen-Tzu Lin, Member, IEEE, and R. D. (Shawn) Blanton, Fellow, IEEE
Abstract—Researchers from both academia and industry continually propose new fault models and test metrics for coping
with the ever-changing failure mechanisms exhibited by scaling
fabrication processes. Understanding the relative effectiveness of
current and proposed metrics and models is vitally important for
selecting the best mix of methods for achieving a desired level
of quality at reasonable cost. Evaluating metrics and models
traditionally relies on actual test experiments, which is time-consuming and expensive. To reduce the cost of evaluating new test metrics, fault models, design-for-test techniques, and others, this paper proposes a new approach, MEasuring Test
Effectiveness Regionally (METER). METER exploits the readily
available test-measurement data that is generated from chip
failures. The approach does not require the generation and
application of new patterns but uses analysis results from existing
tests, which we show to be more than sufficient for performing a
thorough evaluation of any model or metric of interest. METER
is demonstrated by comparing several metrics and models that
include: 1) stuck-at; 2) N-detect; 3) PAN-detect (physically-aware
N-detect); 4) bridge fault models; and 5) the input pattern fault
model (also more recently referred to as the gate-exhaustive
metric). We also provide in-depth discussion on the advantages
and disadvantages of METER, and contrast its effectiveness with that of traditional approaches involving the testing of actual integrated circuits.
Index Terms—Fault models, test effectiveness, test evaluation,
test metrics.
I. Introduction
THE MAIN objective of manufacturing test is to separate
good chips from bad chips. Test methodologies continue
to evolve, however, to capture the changing characteristics of
chip failures, and new fault models and test metrics have been
developed to guide the test generation process. Here, we use
the phrase “fault model” in its classic sense, as an abstract
representation of the behavior that results from some type
of defect. A "test metric," in contrast, is not necessarily
meant to model defect behavior but instead is a way to evaluate
or measure the quality that a test set would presumably achieve
when applied to failing chips. The stuck-at fault model [1]
has been used as both a model and a metric, and has been
universally adopted as the basis of test generation because of
its simplicity and low cost.
Manuscript received August 26, 2010; revised November 21, 2010; accepted
January 7, 2011. Date of current version June 17, 2011. This work was
supported by the NSF, under Award CCF-0427382, the SRC, under Contract
1246.001, and an NVIDIA fellowship. This paper was recommended by
Associate Editor C.-W. Wu.
Y.-T. Lin is with NVIDIA Corporation, Santa Clara, CA 95050 USA
(e-mail: yenlin@nvidia.com).
R. D. (Shawn) Blanton is with the Center for Silicon System Implementation, Department of Electrical and Computer Engineering, Carnegie Mellon
University, Pittsburgh, PA 15213 USA (e-mail: blanton@ece.cmu.edu).
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCAD.2011.2113670
Nevertheless, as manufacturing technology continues to
scale and design complexity increases, failure behaviors have become, and continue to become, more complicated and therefore harder
to characterize [2]. The behavior of even static defects (i.e.,
defects that have no sequence or timing dependency and thus
can be detected by a single test pattern) involves more complex
mechanisms that can no longer be sufficiently dealt with using
just the stuck-at fault model [3], [4]. Various fault models and test metrics have been developed to ensure test quality, including, e.g., the bridge [5], [6], transition [7], and input-pattern [8] fault models, and test metrics such as gate exhaustive [9], [10],
bridge coverage estimate [12], N-detect [3], [11]–[13], and
physically-aware N-detect (PAN-detect) [14]–[16].
For all existing and newly developed test methods, it is
important to understand their relative effectiveness so that the
proper mix of test approaches can be identified for achieving
the required quality level at an acceptable cost. Traditionally,
test methods have been evaluated empirically. Specifically, experiments involving real integrated circuits (ICs) are conducted
to reveal defect characteristics and to assess the capability of various test and design-for-test (DFT) methods to uncover chip failures. Unique fallouts (i.e., chip-failure detections),
typically shown in the form of a Venn diagram, are considered
to be good indicators of relative effectiveness.
Fig. 1 summarizes some real-chip experiments on test
evaluation that have appeared in the literature over the last 15 years [3], [10], [12], [13], [16]–[30]. The y-axis shows the various process nodes and the x-axis is time. Each circle indicates
the year that the work was published and the process node
used for fabricating the design in the experiment. The size
of the circle reflects the number of test methods evaluated.
Finally, the experiments conducted by the same organization
have the same color. Fig. 1 shows that the evaluation of fault
models and test metrics continues to be of significant interest.
Experiments involving real ICs, however, require a sufficiently large sample in order to produce statistically significant
results. When more test methods are compared, more tests
must be generated and applied in a production environment.
More often than not, generating tests for new, proposed models
or metrics is a significant challenge since the commercially
available test tools are typically hard-coded to handle only a
limited set (e.g., stuck-at, bridge, transition fault, and others).
Conducting real-chip experiments for test evaluation is therefore time-consuming and expensive. An evaluation approach
that is more economical, automatable, and effective is very
much desired.
In this paper, we introduce a general and cost-effective test-metric evaluation methodology, MEasuring Test Effectiveness
Regionally (METER), and show how it can be used to evaluate
Fig. 1. Recent real-chip experiments on test evaluation.
a large variety of fault models and test metrics. METER
analyzes the failure log files that result from the application of
any set of test patterns, i.e., no additional tests are needed. The
cost of this method is therefore low since extra test generation
and test application are completely avoided. Finally, METER
is general since it can be used to evaluate any test metric, fault model, or DFT approach that is meaningful within the
environment used for collecting the test data.
The basic idea of METER is to identify the locations (or
regions) of the failure within the bad chip, and then evaluate
the region against the metric/model of concern using the tests
already applied. METER is not perfect, however, since it
relies on the identification of the failure within the bad chip
using diagnosis or other localization techniques. For failing
chips that have some ambiguity in the localization results,
the effectiveness measures must be statistically analyzed. This
shortcoming, however, also exists and is exacerbated in the
traditional evaluation approach. Specifically, the fact that tests targeting a particular fault model or test metric detect a given failing chip does not necessarily mean that the model or metric "captures" the defect behavior. In other words, it is possible that the chip failure is fortuitously caught by the applied tests and not due to the targeted metric/model. METER, in contrast,
precisely addresses this problem by evaluating models or
metrics specifically for possible failure regions.
METER was first introduced in [16] and [31]. This paper
subsumes and extends existing work by:
1) defining new quantitative measures of the effectiveness
and efficiency of various models/metrics over different
products and technology nodes;
2) applying METER to large, industrial designs that include
an NVIDIA graphics processing unit (GPU) and an IBM
application-specific integrated circuit (ASIC);
3) showcasing how METER can be used to select parameters for automatic test pattern generation (ATPG).
The rest of this paper is organized as follows. Section II
provides background on test evaluation and describes related
work. The details of test metric/model evaluation methodology METER are described in Section III. Evaluation
results for several different test metrics are presented in
Section IV. Section V provides a discussion on the applicability of METER, and compares METER with the traditional
tester-based approach. Finally, in Section VI, conclusions are drawn.

Fig. 2. Coverage distribution achieved by a nearly 100% stuck-at test set for various metrics/models.
II. Background
In this paper, we utilize the notion of fault coverage for
an individual line. For instance, a signal line has two stuck-at faults and can have a coverage for some set of tests that is equal to 0%, 50%, or 100%. For a line that has four "close" neighbors, there are eight possible two-line bridge faults, where it is assumed each neighbor can impose a faulty-0 or faulty-1 value on the targeted line. The possible bridge coverages for the line are described by the set {1/8 = 12.5%, 2/8 = 25%, . . ., 8/8 = 100%}. Finally, for a line driven by a two-input gate, the possible gate-exhaustive coverages include 2^2 = 4 possibilities that lie in the set {0%, 25%, 50%, 100%}.
Sometimes, instead of reporting percentages, we will simply
list the number of detections for a given metric (as will be
seen later in Table II). Extending this notion to N-detect and
PAN-detect is a little more complicated since both usually refer
to one type of fault polarity (either stuck-at-0 or stuck-at-1).
Therefore, the coverage for these test metrics is calculated for
a line with a specific fault polarity.
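To make the per-line coverage notion concrete, the small Python sketch below is purely illustrative (the line names, neighbor names, and helper functions are hypothetical, not taken from the paper or any tool); it simply counts, for one signal line, how many of its possible two-line bridge faults and driver states a set of tests has exercised.

```python
# Minimal sketch of the per-line coverage notion described above.
# All names and data are hypothetical; this is not the authors' tool.

def bridge_coverage(detections, num_neighbors):
    """Fraction of the 2 * num_neighbors possible two-line bridge faults
    (each neighbor imposing a faulty-0 or faulty-1 value) seen so far.
    `detections` is a set of (neighbor, faulty_value) pairs."""
    return len(detections) / (2 * num_neighbors)

def gate_exhaustive_coverage(driver_states, num_gate_inputs):
    """Fraction of the 2**num_gate_inputs driver-gate input combinations
    under which the line has been sensitized."""
    return len(driver_states) / (2 ** num_gate_inputs)

# Example: a line with four close neighbors and a two-input driver gate.
bridges_seen = {("n1", 0), ("n1", 1), ("n3", 0)}   # 3 of 8 bridge faults
states_seen = {(0, 1), (1, 0)}                     # 2 of 4 driver states
print(bridge_coverage(bridges_seen, 4))            # 0.375
print(gate_exhaustive_coverage(states_seen, 2))    # 0.5
```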
With this notion of coverage, we show that additional
tests are not really necessary in METER. We have observed
that most test sets inherently achieve high coverage of most
metrics and fault models for a majority of signal lines. In
other words, it is likely that any given circuit region has very
high coverage for any reasonable metric or fault model under
consideration. For example, Fig. 2 shows the distribution of
coverage achieved by the production, stuck-at test set for each
signal line in a test chip (details of the chip are presented in
Section IV) for the bridge fault model, and the gate-exhaustive
and PAN-detect test metrics. We use N = 10 for the PAN-detect metric, which means 100% coverage is achieved for
some line stuck-at-v (v ∈ {0, 1}) if the fault is detected ten
times with ten different neighborhood states [14]–[16]. Fig. 2
shows that although the stuck-at test set does not directly target
any other models/metrics, 82.8%, 90.11%, and 53.5% of the
signal lines have 100% coverage of the bridge model, and
the gate-exhaustive and physically-aware ten-detect metrics,
respectively. This means that some arbitrary region that is
affected by a defect can likely be used to fully evaluate a fault
model or test metric. Even when the coverage is not 100%, it
is still possible that important information can be derived, as described in detail in Section IV.

Fig. 3. Overview of the test-metric evaluation methodology METER.
Fig. 4. Approaches for identifying suspect regions for test-metric evaluation.

TABLE I
Comparison Between METER and Other Similar Work on Test Evaluation
(The table compares METER, the approach of [26], and the EMD, bridge, and intra-cell analyses of [30] along three criteria: whether extra tests are required, whether the defect region/location is known, and whether coverage changes are correlated directly with defect detection.)
Table I compares METER with other similar work on test
evaluation. Specifically, the work in [26] compares several
test metrics by correlating chip failures with metric coverage
achieved by the applied test patterns. These patterns do not
necessarily directly target the considered test metrics, which
means that additional test patterns for different test metrics are
not needed. Nevertheless, in [26], metric coverage is calculated
over the whole design. This means, at a minimum, fault
simulation has to be performed for the entire circuit. Also,
depending on the evaluated metrics, additional logical and
physical information for every signal line in the design may
be needed. In METER, defect detection is better correlated
with metric effectiveness since coverage is limited to the
potential defect regions within the failing chips. The need for logical/physical information for coverage calculation is therefore correspondingly reduced.
In contrast, the work in [30] used diagnosis to investigate test quality. They, however, use additional test patterns to measure the effectiveness of test metrics and fault models that include, e.g., N-detect [or, more specifically, embedded multi-detect (EMD)] and bridge faults. Reference [30] also utilized diagnosis to identify failing chips with intra-cell defects, and examined these chips using the already-applied test patterns. That work, however, focuses on reducing the mismatch between defect and fault behavior (i.e., the metric/model behavior outside of the defect behavior) and not on the effectiveness of the metric/model.
III. Test-Metric Evaluation

METER is a cost-effective and time-efficient approach for comparing and evaluating the relative effectiveness and efficiency of test metrics and fault models.1 As shown in Fig. 3, METER consists of four stages:
1) suspect region identification;
2) test selection;
3) failing chip selection;
4) test evaluation.

1 From this point on, we will not make any distinction between test metric and fault model.
Specifically, tester-response data that results from the application of any type of test set is collected and analyzed. The test
data may simply include chip pass/fail information, or may
be more comprehensive in the form of full-fail response data.
The collected test data is then analyzed to identify suspect
regions within a failing chip, which are the logical lines that
are believed to cause chip failures. Next, depending on the
objective of the experiment, a subset of test patterns and
some failing chips of interest are selected for further analysis.
Finally, in the test evaluation stage, test metrics are evaluated
for the identified suspects from the failing chips using the
selected test patterns. This is achieved by correlating changes
in metric coverage with defect detection. Details of each stage
are described in the following sections.
A. Suspect Region Identification
The first stage of METER is to identify suspect regions
that are believed to cause chip failures. Several approaches
of varying cost and accuracy can be used, as illustrated in
Fig. 4. With the least amount of test data, i.e., only the
pass-fail outcomes of test patterns are recorded, the suspect
regions include all those that are sensitized by the failing
patterns (i.e., test patterns that fail the chip), as shown at
the top level of the reverse triangle in Fig. 4. If additional
information is collected and more comprehensive techniques
are used, higher accuracy is expected but at a higher cost.
For example, if test-pattern failure responses are recorded,
backcone tracing or path tracing [32] from failing outputs
and scan elements can be applied to identify possible defect
regions (the second and third level of the reverse triangle).
Fault simulation can also be performed to identify the fault
sites whose responses are compatible (e.g., match, subsume,
and others) with tester responses (fourth level). Alternatively,
diagnosis can be used to identify suspects with higher accuracy
and resolution (fifth level). Diagnosis suspects are very likely
to include the actual defect regions, and are inexpensive to
obtain. If physical failure analysis (PFA) results are available,
test-metric evaluation can be performed on what is presumably
the actual defect region (bottom level).
Among the aforementioned approaches, test-metric evaluation using PFA results is of the highest accuracy but has
an associated high cost. Moreover, the number of failing
chips that have PFA results is typically small. Since more sophisticated region-identification techniques often imply more
assumptions and restrictions, few failing chips can have their
suspects successfully identified. The number of chips available
for evaluation is therefore likely to decrease as more advanced
region-identification approaches are employed as shown in
Fig. 4. Diagnosis, in contrast, is less expensive since it mostly involves circuit/fault simulation. Often a decent number of failing chips can be diagnosed and used for analysis. In
other words, one is more likely to draw statistically significant
conclusions by analyzing diagnosable failing chips. Circuit
tracing-based approaches require less computation time than
diagnosis, but the number of suspect regions that result is often
much higher. Among the possible techniques for identifying
suspect regions, diagnosis provides very good accuracy at a
reasonable cost, and it often results in a sufficient number of
samples that can be used for analysis. We believe diagnosis
is a good choice since it provides a proper tradeoff between
cost and accuracy.
B. Test Selection
Given a failing chip c and an identified suspect s of c, the
test patterns in the production test set T generated for the
chip design can be classified based on whether they: 1) were
applied to c; 2) sensitized suspect s; and 3) passed or failed
chip c, as illustrated in Fig. 5. METER allows great flexibility
in selecting the test patterns used for analysis, which can be
any subset of T . The only requirement is that for a chip c, at
least one test pattern that failed c needs to be included in the
selected test set Tselc so that we can correlate defect detection
with changes in metric coverage for some identified suspect.
For example, if the test flow stops after the first failing pattern
(FFP), then the subset of test patterns that start from the first
test pattern to the FFP can be used. If more test patterns are
applied, extra information collected from the application of
subsequent test patterns can also be utilized.
Depending on how the test patterns are selected, the subsets
used for different chips may not be the same. For instance,
if the subset of test patterns up to the FFP is used, then the
selected test set for chip 1 can be different from that for chip 2.
This is because the FFPs of chip 1 and chip 2 may be different.
On the contrary, if all the production test patterns are used,
then the test sets selected for different chips will be the same.
It should be noted that in some test flows, such as adaptive test or a stop-on-first-fail environment, some test patterns may not be applied. These test patterns (denoted $\bar{T}_c$), for which no pass/fail information is available, can still be used for analysis. The use of selected test patterns is described in Section III-E.
Fig. 5. Categories of test patterns given a failing chip c and a suspect region s. The notation in parentheses denotes the set of test patterns in that category.
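The pattern bookkeeping of Fig. 5 can be captured with a few sets. The snippet below is a hypothetical illustration of that bookkeeping (the pattern names and data layout are invented for the example, not the authors' data format):

```python
# Hypothetical bookkeeping for one failing chip c, following Fig. 5.
# T: full production test set; applied: patterns actually applied to c;
# failing: applied patterns that failed c; sensitizing[s]: patterns that
# sensitize suspect s (known from fault simulation, applied or not).
T = [f"t{i}" for i in range(1, 11)]
applied = set(T[:6])                      # e.g., flow stopped after t6 (the FFP)
failing = {"t6"}                          # T_sel^c must contain >= 1 failing pattern
unapplied = set(T) - applied              # the un-applied subset written here as T-bar_c
sensitizing = {"s1": {"t2", "t4", "t6", "t9"}}

T_sel = applied                           # here: all patterns up to and including the FFP
assert failing & T_sel                    # the selection requirement from Section III-B
T_cs = sensitizing["s1"] & applied        # T_{c,s}: applied patterns that sensitize s
T_bar_cs = sensitizing["s1"] & unapplied  # un-applied sensitizing patterns (Section III-E)
print(sorted(T_cs), sorted(T_bar_cs))     # ['t2', 't4', 't6'] ['t9']
```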
C. Failing-Chip Selection
The objective of failing-chip selection is to identify chips
that are suitable for test evaluation and are of interest. The
chip selection process may vary depending on the goal of
the evaluation, evaluated test metrics, adopted suspect-region
identification techniques, and the characteristics of the applied
test patterns. For example, all the failing chips in the failure
logs can be used for analysis if a large sample size is desired.
In the cases where diagnosis is adopted for identifying suspect
regions, diagnosable failing chips are chosen. If test metrics
that target multiple faulty lines (e.g., open, Byzantine bridge,
multiple stuck-at, and others) are to be evaluated, diagnosis
methods such as [33] and [34] can be used to identify chips
that exhibit this behavior. If the considered test metrics target
defects that are not deterministically detected by stuck-at test,
a set of “hard-to-detect” failing chips that do not exhibit stuckat behavior can be selected.
D. Test Metrics for Evaluation
METER can be used to evaluate any test metric, fault model, DFT approach, or their variants, whether they target static or dynamic defect behaviors. For instance, the input-pattern fault model can be evaluated at the gate level or
higher levels of hierarchy (this will be demonstrated later
in Section IV-E). METER is applicable as long as the test
environment employed and the test approaches applied to the
failing chips adhere to the assumptions of the test metrics
under evaluation. For example, evaluating sequence-dependent
defects for PAN-detect test, although possible, is not reasonable since any detection of sequence-dependent defects is
fortuitous in nature. Similarly, evaluating the transition fault
model for a stuck-at-only test (i.e., no launch-on-shift or
launch-on-capture) would also be inappropriate.
E. Test Evaluation
To evaluate a test metric, we examine whether defect
detection is associated with changes in coverage of some test
metric for the identified suspect regions. This is achieved by
analyzing the selected subset of test patterns $T_{sel}^c$ for each failing chip c. Without loss of generality, we assume here that all of the test patterns applied to c are selected, i.e., $T_{sel}^c = T_c$. (The cases where un-applied test patterns are used, i.e., $T_{sel}^c \cap \bar{T}_c \neq \emptyset$, will be discussed later.) $T_c$ is fault simulated
without fault dropping using stuck-at faults involving the identified suspect regions only. For each suspect s of chip c, we identify the set of test patterns in $T_c$ that sensitize s (i.e., $T_{c,s}$ in Fig. 5), and track the changes in metric coverage for s resulting from the application of each test pattern in $T_{c,s}$. Let $Cov_{m,c}(s)$ denote the coverage of metric m for suspect s of chip c. A test pattern $t \in T_{c,s}$ can be classified into one of the following categories depending on whether t increases $Cov_{m,c}(s)$ and failed chip c (see Fig. 6):
1) t increases $Cov_{m,c}(s)$ and passed chip c (zone 1);
2) t increases $Cov_{m,c}(s)$ and failed chip c (zone 2);
3) t does not increase $Cov_{m,c}(s)$ but failed chip c (zone 3);
4) t does not increase $Cov_{m,c}(s)$ and passed chip c (outside the two circles).

Fig. 6. Correlation between defect detection and changes in metric coverage for suspect s of failing chip c.
Given a test metric, if a failing pattern tf ∈ Tc, s increases
the metric coverage for s, the test metric is considered effective
in detecting chip c. In other words, a test metric is deemed
effective if the metric coverage increases with the application
of the failing pattern tf (which falls into zone 2). If the
coverage does not change with tf (i.e., tf ∈ zone 3), then
it means the test metric was not at all needed to detect the
corresponding chip failure. The chip failed due to other reasons
outside the scope of the metric. Moreover, if the tests before
tf have already achieved “100% coverage” of the test metric,
then it means that the corresponding metric does not guarantee
the detection of the failure.
Conversely, if a test pattern tp ∈ Tc, s increases coverage
of some metric for suspect s but did not fail chip c (i.e., tp ∈
zone 1), then it means that the metric covers zones outside the
defect behavior. Additionally, in the best case, this means more
test patterns are needed to further improve metric coverage
to eventually detect the defect. But obviously, this may not
be possible if the defect behavior lies outside the metric. Let
$X^1_{m,c,s}$, $X^2_{m,c,s}$, and $X^3_{m,c,s}$ denote the number of test patterns in $T_{c,s}$ that fall into zones 1, 2, and 3, respectively, for a metric m. A good test metric should have a large $X^2_{m,c,s}$ and a small $X^1_{m,c,s}$ and $X^3_{m,c,s}$.
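The zone bookkeeping above takes only a few lines of code. The sketch below is a minimal illustration of the definitions just given (not the authors' implementation); `increases_coverage` and `failed_chip` are hypothetical stand-ins for the fault-simulation and tester-log lookups.

```python
# Minimal sketch of the zone classification for one suspect s of chip c.
# `patterns` lists the tests in T_{c,s} in application order.
def count_zones(patterns, increases_coverage, failed_chip):
    x1 = x2 = x3 = 0                      # zone-1, zone-2, zone-3 counts
    for t in patterns:
        gained = increases_coverage(t)    # t adds a new fault/state for s
        failed = failed_chip(t)           # t produced a mismatch on the tester
        if gained and not failed:
            x1 += 1                       # zone 1: coverage up, chip passed
        elif gained and failed:
            x2 += 1                       # zone 2: coverage up, chip failed
        elif failed:
            x3 += 1                       # zone 3: no coverage gain, chip failed
        # coverage unchanged and chip passed: outside both circles, not counted
    return x1, x2, x3
```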
If Tselc includes more than one failing pattern, then each
failing pattern may indict different suspects. For example, a
suspect reported by single location at a time (SLAT) diagnosis [35] may only be associated with some failing patterns
whose tester responses match the suspect’s stuck-at fault
simulation responses. If this is the case, each suspect should
be examined using the corresponding failing patterns.
A failing chip may have multiple suspects, each of which alone could cause chip failure. For these situations, the analysis needs to be performed for each suspect. For each chip c, we sum $X^1_{m,c,s}$, $X^2_{m,c,s}$, and $X^3_{m,c,s}$ over all the suspects of c as follows:

$X^k_{m,c} = \sum_{s} X^k_{m,c,s}, \quad k \in \{1, 2, 3\}$        (1)

where $X^k_{m,c}$ will be used to assess the effectiveness and efficiency of a test metric.
1) Test effectiveness: to evaluate the effectiveness of a test metric, we examine how well the metric subsumes defect behavior using a measure called the effectiveness ratio. For a given metric m and chip c, the effectiveness ratio is computed as follows:

$ER_{m,c} = X^2_{m,c} / (X^2_{m,c} + X^3_{m,c})$.        (2)

The effectiveness ratio represents how often the coverage of m is increased for some suspect when the chip failed. A high effectiveness ratio means that increasing metric coverage correlates with defect detection for this particular chip. If $X^2_{m,c} + X^3_{m,c} = 1$, i.e., chip c has only one suspect and $T_{sel}^c$ includes only one failing pattern, then $ER_{m,c}$ simply depends on whether the failing pattern increases metric coverage. If the coverage is increased, then $X^2_{m,c} = 1$ and $ER_{m,c} = 1$. Otherwise, $ER_{m,c} = 0$. In other words, if $X^2_{m,c} + X^3_{m,c} = 1$, $ER_{m,c}$ becomes a binary indicator of whether the test metric is effective.
2) Test efficiency: another focus in test evaluation is the efficiency of a test metric. Early detection of failing chips is desired because it saves test application cost, especially in a stop-on-first-fail environment. Test efficiency has previously been defined as the ratio of the number of patterns targeting a specific test metric to the number of chip failures detected by those patterns; the metric with a smaller number of "patterns per failure" is considered more efficient [28].
Instead, we define the efficiency of a test metric to be the ratio of zone 2 to the left circle (see Fig. 6). In other words, we compute the efficiency ratio for test metric m and failing chip c as follows:

$FR_{m,c} = X^2_{m,c} / (X^1_{m,c} + X^2_{m,c})$.        (3)

A high $FR_{m,c}$ means that zone 1 is small compared to the left circle and that increasing metric coverage correlates well with defect detection. The test metric is therefore more efficient in capturing chip failure. If metric m is not at all effective for a particular chip c, resulting in $X^2_{m,c} = 0$, then $FR_{m,c}$ becomes zero by definition. This measure of efficiency is particularly useful in a stop-on-first-fail environment, but is also applicable to cases where information on subsequent failing patterns is collected. It should be noted that a metric can be very effective in defect detection but have poor efficiency; conversely, a metric that precisely captures a small portion of some defect behavior may be ineffective but efficient. (A small computational sketch of these ratios appears at the end of this list.)
3) Fault-detection recording: suppose for some metric m, a failing pattern $t_{f,1} \in T_{sel}^c$ detects some fault involving suspect s of chip c. If a subsequent failing pattern $t_{f,2} \in T_{sel}^c$ also detects the same fault, then $t_{f,2}$ does not increase metric coverage. In other words, $t_{f,2}$ is placed into zone 3. $X^3_{m,c,s}$ (as well as $X^3_{m,c}$) is increased by 1, while $X^2_{m,c,s}$ (and $X^2_{m,c}$) remains the same, which in turn degrades $ER_{m,c}$. Nevertheless, it is possible that the fault captures the behavior of the defect causing chip failure, and every test pattern that detects this particular fault fails the chip. The metric m is then effective in detecting chip c, but this is not accounted for using the current formulation of $ER_{m,c}$.
To prevent underestimating $ER_{m,c}$, a different fault-detection recording scheme can be used. In the new scheme, a fault is recorded as detected only if it is detected by a passing pattern. (This does not affect whether a test pattern detects the fault; only the detection status of the fault is changed.) In other words, if a fault is detected only by failing patterns, each failing pattern detecting this fault increases the metric coverage (while the fault is still recorded as undetected). These failing patterns are therefore classified into zone 2 instead of zone 3, and $X^2_{m,c,s}$ as well as $X^2_{m,c}$ are increased.
With this new fault-detection recording scheme, $ER_{m,c}$ will not be underestimated. Nevertheless, we lose the opportunity to examine whether other faults also detect chip failures, due to the existence of faults detected only by failing patterns. The original fault-detection scheme, however, does not have this issue. Analysis can be performed using one or both schemes depending on the evaluation objective.
4) Using un-applied test patterns: in Section III-B, we mentioned that in some test flows, some production test patterns may not be applied to a failing chip c (the subset $\bar{T}_c$ in Fig. 5). These patterns, if applied, may further increase the coverage of some test metrics for some suspects, and may have the capability of detecting chip failures. METER provides a way to consider the effect that $\bar{T}_c$ could possibly have had. Without loss of generality, assume that all test patterns in $\bar{T}_c$ are selected for analysis, i.e., $\bar{T}_c \subset T_{sel}^c$. Specifically, for a suspect s of failing chip c, the test patterns in $\bar{T}_c$ that sensitize s ($\bar{T}_{c,s}$) are used. Let $\bar{X}^1_{m,c,s}$, $\bar{X}^2_{m,c,s}$, and $\bar{X}^3_{m,c,s}$ be the number of test patterns in $\bar{T}_{c,s}$ that could fall into zones 1, 2, and 3, respectively, for a metric m. Again, for each chip c, we calculate

$\bar{X}^k_{m,c} = \sum_{s} \bar{X}^k_{m,c,s}, \quad k \in \{1, 2, 3\}$.        (4)

The definitions of the effectiveness ratio and efficiency ratio can then be rewritten as

$ER_{m,c} = (X^2_{m,c} + \bar{X}^2_{m,c}) / [(X^2_{m,c} + \bar{X}^2_{m,c}) + (X^3_{m,c} + \bar{X}^3_{m,c})]$        (5)

$FR_{m,c} = (X^2_{m,c} + \bar{X}^2_{m,c}) / [(X^1_{m,c} + \bar{X}^1_{m,c}) + (X^2_{m,c} + \bar{X}^2_{m,c})]$.        (6)
Since $\bar{T}_c$ was not applied, it is unknown whether a test pattern in $\bar{T}_{c,s}$ would pass or fail chip c. The actual values of $\bar{X}^1_{m,c,s}$, $\bar{X}^2_{m,c,s}$, and $\bar{X}^3_{m,c,s}$ are therefore unknown. However, from fault simulation, we know which test patterns in $\bar{T}_{c,s}$ increase coverage for a suspect s of chip c (denote the set as $\bar{T}^+_{c,s}$) and which test patterns do not (denote the set as $\bar{T}^-_{c,s}$). The number of test patterns in $\bar{T}^+_{c,s}$ and $\bar{T}^-_{c,s}$ can be used to calculate the best and worst effectiveness/efficiency ratios that a test metric could achieve with $\bar{T}_c$. Specifically, in the worst case, chip c passes all the test patterns in $\bar{T}^+_{c,s}$ and fails all the test patterns in $\bar{T}^-_{c,s}$. In other words, $\bar{X}^1_{m,c,s} = |\bar{T}^+_{c,s}|$, $\bar{X}^2_{m,c,s} = 0$, and $\bar{X}^3_{m,c,s} = |\bar{T}^-_{c,s}|$. Calculating $\bar{X}^k_{m,c}$ and substituting into (5) and (6) provides the worst-case effectiveness ratio and efficiency ratio, respectively. In the best case, the chip fails all the test patterns in $\bar{T}^+_{c,s}$ and passes all the test patterns in $\bar{T}^-_{c,s}$. As a result, $\bar{X}^1_{m,c,s} = 0$, $\bar{X}^2_{m,c,s} = |\bar{T}^+_{c,s}|$, and $\bar{X}^3_{m,c,s} = 0$. The best-case effectiveness ratio and efficiency ratio can then be derived accordingly.
Actually applying $\bar{T}_c$ may indict additional suspects that were not identified previously. When this occurs, further analysis concerning defect regions or defect detections may be needed.
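As a concrete reading of (2), (3), (5), and (6), the sketch below computes the two ratios from zone counts, including the best- and worst-case bounds obtained from un-applied patterns. It is a minimal illustration of the formulas above, not the authors' code, and the variable names are hypothetical.

```python
# Effectiveness and efficiency ratios from zone counts (2), (3), extended with
# the contribution of un-applied patterns as in (5), (6).
def effectiveness_ratio(x2, x3):
    return x2 / (x2 + x3) if (x2 + x3) else 0.0

def efficiency_ratio(x1, x2):
    return x2 / (x1 + x2) if (x1 + x2) else 0.0

def bounds_with_unapplied(x1, x2, x3, t_plus, t_minus):
    """Best/worst ER and FR when t_plus un-applied patterns would raise
    coverage and t_minus would not (their pass/fail outcome is unknown)."""
    # Worst case: chip passes all of T+ (zone 1) and fails all of T- (zone 3).
    worst_er = effectiveness_ratio(x2, x3 + t_minus)
    worst_fr = efficiency_ratio(x1 + t_plus, x2)
    # Best case: chip fails all of T+ (zone 2) and passes all of T- (uncounted).
    best_er = effectiveness_ratio(x2 + t_plus, x3)
    best_fr = efficiency_ratio(x1, x2 + t_plus)
    return (worst_er, best_er), (worst_fr, best_fr)

# Example: three passing patterns raised coverage, the FFP raised it as well;
# two un-applied patterns would raise coverage, one would not.
print(bounds_with_unapplied(x1=3, x2=1, x3=0, t_plus=2, t_minus=1))
```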
F. Metric/Model Case Studies
In the following, we illustrate the detailed procedures
employed for evaluating the effectiveness and efficiency of
the bridge fault model [5], [6], the gate-exhaustive metric [9], [10] (also known as the gate-level input-pattern fault
model [8]), and the physically-aware N-detect test metric [14]–
[16]. METER is not limited however to these test metrics and
can be just as easily applied to various DFT approaches as
well.
1) Bridge fault models: to evaluate various bridge fault
models, we first extract the possible bridge regions for
each identified suspect of a chip c. Specifically, the
physical neighbors that are within a distance d for each
suspect are obtained from the design’s layout.2 A suspect
s and each of its physical neighbors are a possible
bridge defect, and all the bridge defects involving s are
evaluated. Associated with each bridge consisting of a
suspect s and its physical neighbor p are two 2-line
bridge faults: s stuck-at zero when p = 0 and s stuck-at
one when p = 1. Traditional bridge fault models (e.g.,
AND-type, OR-type, dominate, and the four-way bridge
fault models) are all implicitly considered, including
both non-feedback and feedback bridges. M-line bridge faults can be handled as well, but in this analysis we only consider bridge faults with M = 2.
For each test pattern t ∈ Tselc , we examine the bridge
faults that are detected by t and by Tprev , where Tprev is
the set of test patterns in Tselc that are applied before t.
Whenever a physical neighbor p is driven to the opposite
value of s and a stuck-at fault affecting s is detected, a
bridge fault involving p and s is deemed detected. We specifically examine whether a particular bridge fault is detected by t but not by Tprev. If all of the bridge faults detected by t are detected by Tprev, or if t does not detect any bridge fault, then t does not increase the bridge fault coverage for the suspect s of chip c. Test t will be classified depending on whether t failed chip c and whether t increases bridge fault coverage for s, as described in Section III-E. Quantities $X^1_{B,c,s}$, $X^2_{B,c,s}$, and $X^3_{B,c,s}$ (where B stands for bridge fault) are calculated based on the classification result, and are used to assess the effectiveness and efficiency of the bridge fault model (a simplified sketch of this bookkeeping appears after item 3 below).
2) Gate-exhaustive metric: we also evaluate the effectiveness/efficiency of the gate-exhaustive metric [8]–[10].
With the assumption that only a single gate is faulty,
gate-exhaustive testing requires each gate output to be
sensitized for all possible input combinations.
The procedure for evaluating the gate-exhaustive metric
is similar to the one used for the bridge fault models. For
each suspect s of a failing chip c, we identify the inputs
of the gate that drives the suspect, i.e., the driver-gate
inputs. The set of logic values applied to the driver-gate
inputs of s by a test pattern t that sensitizes s is defined
as a driver state of s. We next track the driver state
established by each test pattern t ∈ Tselc , and examine
whether t establishes a new driver state and sensitizes s, thereby increasing the gate-exhaustive coverage. In the case where s is a branch, the coverage of the downstream gate driven by s is calculated. Based on the zone that t falls into (see Fig. 6), $X^1_{G,c,s}$, $X^2_{G,c,s}$, and $X^3_{G,c,s}$ (where G stands for gate-exhaustive) are calculated.
3) Physically-aware N-detect metric: the physically-aware
N-detect (PAN-detect) metric exploits physical information to generate test patterns capable of improving defect
detection for modern designs [14]–[16]. The metric
defines the neighborhood of a suspect as the set of signal
lines surrounding the suspect. Three types of signal lines
are considered in the neighborhood of a suspect [16]
that include: 1) signal lines that are within a distance
d of the suspect in the layout (physical neighbors);
2) inputs of the gate that drives the suspect (driver-gate
inputs); and 3) side inputs of the gates that receive the
suspect (receiver-side inputs). The set of logic values
established by a test pattern t on the neighborhood
lines of a suspect s when s is sensitized is called the
neighborhood state. PAN-detect test requires that a targeted signal line be sensitized with at least N neighborhood
states.
To evaluate the effectiveness of PAN-detect, we extract
the neighborhood for each suspect. We next track the
neighborhood states established by test pattern t ∈ Tselc
and by Tprev for each suspect s. If t establishes a new
neighborhood state that has not yet been established by
Tprev , then t increases the PAN-detect coverage for the
suspect. Test t is then classified into the appropriate zone based on the rules described in Section III-E, and $X^1_{P,c,s}$, $X^2_{P,c,s}$, and $X^3_{P,c,s}$ (where P stands for PAN-detect) are calculated.

2 Physical neighbors can be identified using DRC/LVS [28] or critical-area [36] approaches, or by utilizing parasitic extraction data.
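The per-metric bookkeeping in the three case studies reduces to asking, for each sensitizing pattern, whether it contributes something new for the suspect: a new (neighbor, polarity) pair for bridges, a new driver-gate input combination for gate-exhaustive, or a new neighborhood state for PAN-detect. The sketch below illustrates this shared logic under those definitions; it is a simplification, not the authors' tool, and the per-pattern states are assumed to come from fault simulation of the already-applied patterns.

```python
# Shared "does pattern t increase coverage for suspect s?" check used by the
# three case studies. `seen` holds what earlier patterns (Tprev) established.
def increases_coverage(new_items, seen):
    """new_items: bridge faults / driver states / neighborhood states that
    pattern t establishes for the suspect; seen: those already established."""
    fresh = set(new_items) - seen
    seen |= fresh
    return bool(fresh)

# Example for one suspect: bridge faults are (neighbor, faulty_value) pairs,
# driver states and neighborhood states are tuples of logic values.
seen_bridges, seen_driver, seen_nbrhd = set(), set(), set()
t_bridges = {("n2", 1)}          # n2 driven opposite to s while s's fault is detected
t_driver_state = {(0, 1)}        # values on the driver-gate inputs when s is sensitized
t_nbrhd_state = {(0, 1, 1, 0)}   # values on the full neighborhood when s is sensitized
print(increases_coverage(t_bridges, seen_bridges))      # True: new bridge fault
print(increases_coverage(t_driver_state, seen_driver))  # True: new driver state
print(increases_coverage(t_nbrhd_state, seen_nbrhd))    # True: new neighborhood state
```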
IV. Experiments
We apply METER to evaluate the bridge, gate-exhaustive,
and PAN-detect metrics. Failure logs from LSI test chips
fabricated in a 110 nm process are utilized. The test chip
design consists of 384 64-bit arithmetic-logic units (ALUs),
where each ALU has ∼3000 gates. The stuck-at test of
an ALU consists of approximately 260 scan-test patterns,
achieving >99% stuck-at fault coverage. In this experiment,
signal lines within 0.5 µm of the targeted line are deemed as
physical neighbors, which are used for evaluating both bridge
and PAN-detect. We have data for over 2500 failing chips.
For an assumed yield of 95%, this means our analysis here
is equivalent to a chip test experiment involving more than
50 000 chips.
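To see where the 50 000 figure comes from: at an assumed yield of 95%, failing chips make up roughly 5% of all chips tested, so the 2500 failing chips correspond to about 2500 / (1 − 0.95) = 50 000 tested chips.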
In the following, we describe the procedures used to select
failing chips and identify suspects for subsequent analysis
(Section IV-A), and present the results in great detail (Sections IV-B–IV-E). While the test patterns up to and including
the FFP are used in Sections IV-A–IV-E, we demonstrate in
Section IV-F how test metrics can be evaluated using all the
applied test patterns and different suspect-region identification
techniques.
A. Diagnosable and Hard-to-Detect Chip Selection
In this experiment, we use diagnosis to identify suspect regions that cause chip failure. The three test metrics evaluated,
namely, bridge, gate-exhaustive, and PAN-detect, target defects
not deterministically detected by stuck-at test patterns. A set
of diagnosable and hard-to-detect failing chips are therefore
selected for analysis. Here, diagnosable means that a suspect
region that leads to the FFP can be pinpointed by diagnosis,
while "hard to detect" means that the failing chip would not necessarily be detected by tests aimed only at stuck-at faults.
Hard-to-detect chips are the target of bridge, N-detect, and
PAN-detect test, and therefore are the subject of our analysis.
Of the 2533 chips in the LSI failure logs, 720 chips are
diagnosable and 87 of 720 are hard-to-detect.3 The 87 chips
are partitioned into two categories: 28 chips having only one
suspect and 59 having two or more suspects, each of which
alone could cause the chip’s FFP. The 28 failing chips are
of particular interest since we have significant confidence in
the failure region identified by diagnosis. Test metrics can
be easily evaluated for the single suspect of each chip. For
the remaining 59 chips, resolution for the FFP is degraded,
meaning that there is more than one single-region candidate
that could cause the FFP. For these cases, test-metric evaluation is performed and analyzed over all the suspects of a chip
(Section IV-C).
B. Single-Suspect Failing Chips
Table II shows the results of applying METER to the 28
single-suspect LSI failing chips. Column one gives the chip
index. Columns two to six show the total number of physical
neighbors of the suspect (Nbrs), the number of test patterns
(including the FFP) that sensitize the identified suspect (i.e., Nd,B), the number of unique bridge faults detected by the patterns before the FFP (Bprev), the number of test patterns before the FFP that detect new bridge faults ($X^1_{B,c}$), and the number of new, unique bridge faults detected by the FFP (BFFP), respectively. Similarly, for gate exhaustive, columns seven to ten give the total number of gate inputs driving the suspect (Gate inputs), the number of test patterns that sensitize the identified suspect (Nd,G), and the number of unique driver states (of the suspect) that are established before the FFP (Gprev) and by the FFP (GFFP). The last four columns show the numbers for PAN-detect, including the number of signal lines in the neighborhood of the suspect (Nbrhd), the number of test patterns that sensitize the identified suspect (Nd,P), and the number of unique neighborhood states established before the FFP (Pprev) and by the FFP (PFFP).

3 All the 2533 chips, including those that are disregarded here, are analyzed later when all the failing chips are examined using all the applied test patterns and less-restricted suspect-region identification techniques.

TABLE II
Test-Metric Evaluation Results for Single-Suspect Failing Chips
(Columns 2-6: bridge fault model; columns 7-10: gate-exhaustive; columns 11-14: PAN-detect.)

Chip | Nbrs | Nd,B | Bprev | X1_B,c | BFFP | Gate inputs | Nd,G | Gprev | GFFP | Nbrhd | Nd,P | Pprev | PFFP
1 | 14 | 3 | 14 | 2 | 0 | 2 | 3 | 1 | 0 | 15 | 3 | 2 | 1
2 | 18 | 6 | 28 | 5 | 1 | 1 | 6 | 2 | 0 | 19 | 4 | 3 | 1
3 | 18 | 8 | 31 | 4 | 1 | 3 | 8 | 4 | 1 | 21 | 7 | 6 | 1
4 | 5 | 19 | 10 | 6 | 0 | 2 | 19 | 3 | 1 | 7 | 10 | 8 | 1
5 | 10 | 3 | 12 | 2 | 3 | 2 | 3 | 2 | 0 | 12 | 2 | 1 | 1
6 | 32 | 2 | 13 | 1 | 10 | 2 | 2 | 1 | 0 | 32 | 2 | 1 | 1
7 | 12 | 3 | 7 | 1 | 1 | 2 | 7 | 3 | 0 | 13 | 3 | 1 | 1
8 | 12 | 12 | 22 | 6 | 0 | 2 | 12 | 2 | 0 | 14 | 9 | 7 | 1
9 | 8 | 16 | 14 | 4 | 0 | 2 | 16 | 4 | 0 | 8 | 10 | 8 | 1
10 | 24 | 3 | 14 | 2 | 2 | 2 | 3 | 1 | 0 | 25 | 3 | 2 | 1
11 | 7 | 4 | 10 | 3 | 1 | 2 | 4 | 2 | 0 | 9 | 2 | 1 | 1
12 | 14 | 4 | 12 | 3 | 0 | 2 | 5 | 2 | 0 | 15 | 4 | 3 | 0
13 | 14 | 9 | 25 | 5 | 1 | 2 | 9 | 2 | 0 | 15 | 5 | 3 | 1
14 | 11 | 9 | 18 | 4 | 1 | 2 | 9 | 2 | 0 | 12 | 3 | 2 | 1
15 | 5 | 6 | 5 | 4 | 2 | 2 | 6 | 2 | 0 | 7 | 4 | 3 | 1
16 | 13 | 5 | 10 | 3 | 1 | 2 | 6 | 2 | 0 | 14 | 5 | 4 | 1
17 | 8 | 5 | 9 | 4 | 2 | 2 | 5 | 3 | 0 | 9 | 2 | 1 | 1
18 | 15 | 19 | 29 | 9 | 0 | 2 | 19 | 3 | 1 | 16 | 9 | 8 | 1
19 | 11 | 2 | 7 | 1 | 2 | 2 | 2 | 1 | 0 | 13 | 2 | 1 | 1
20 | 8 | 10 | 15 | 8 | 0 | 2 | 10 | 4 | 0 | 10 | 6 | 5 | 0
21 | 18 | 42 | 36 | 12 | 0 | 3 | 42 | 6 | 0 | 21 | 30 | 29 | 1
22 | 13 | 2 | 6 | 1 | 5 | 1 | 2 | 1 | 0 | 14 | 2 | 1 | 1
23 | 6 | 17 | 11 | 5 | 0 | 2 | 17 | 3 | 0 | 8 | 7 | 5 | 1
24 | 14 | 5 | 18 | 3 | 4 | 1 | 5 | 2 | 0 | 15 | 2 | 1 | 1
25 | 16 | 3 | 19 | 2 | 3 | 1 | 3 | 2 | 0 | 16 | 2 | 1 | 1
26 | 11 | 4 | 15 | 3 | 1 | 2 | 4 | 2 | 0 | 12 | 2 | 1 | 1
27 | 32 | 2 | 13 | 1 | 10 | 2 | 2 | 1 | 0 | 32 | 2 | 1 | 1
28 | 32 | 2 | 17 | 1 | 6 | 2 | 2 | 1 | 0 | 32 | 2 | 1 | 1
For gate-exhaustive and PAN-detect, the number of test patterns that passed the chip and increase metric coverage for the suspect (i.e., $X^1_{G,c}$ and $X^1_{P,c}$) is equal to the number of states established by the patterns before the FFP. In other words, $X^1_{G,c} = G_{prev}$ and $X^1_{P,c} = P_{prev}$. $X^1_{G,c}$ and $X^1_{P,c}$ are therefore not listed explicitly.
Bridge faults involving a suspect include the cases where
the suspect fails with a faulty-0 or faulty-1. Similarly, for
gate exhaustive, considering all possible driver states implicitly
takes into account both stuck-at faults. The analyzed test
patterns therefore include those that sensitize the suspect to
either logic zero or logic one up to and including the FFP.
In cases where the suspect is a branch, the downstream gate
g driven by s is analyzed, and the considered test patterns
include those that sensitize the output of g. Since there can
be more test patterns that sensitize g compared to s, Nd, G
can be larger than Nd,B. For PAN-detect, in contrast, a neighborhood state is associated with a specific stuck-at fault involving the suspect. Only the test patterns sensitizing the suspect with the required stuck-at fault polarity are considered.
Therefore, Nd, G ≥ Nd, B ≥ Nd, P .
Analysis of Table II reveals that nine of the 28 single-suspect chips have BFFP = 0. This means that for these nine
chips, tests aimed at bridge faults do not guarantee failure
detection. Specifically, for chips 4 and 21, all the bridge faults
involving the identified suspects are detected by the patterns
before the FFP (i.e., 2 × Nbrs = Bprev ). In other words, the
bridge coverage for the suspects of these two chips is 100%,
which indicates that the use of typical bridge models does
not guarantee detection of these failures. Moreover, chips 8,
9, 18, 20, and 23 have a bridge coverage of over 80%. For
these cases, it is possible that additional bridge coverage could
have detected the failure but obviously was not necessary since
BFFP = 0 for each of these failing chips.
For the gate-exhaustive metric, only three of the 28 chips
have increased coverage due to the FFP (i.e., GFFP = 1).
The gate-exhaustive coverage appears to be low for many of
these chips, which is surprising given Fig. 2. There could be a
number of reasons, however, why the coverage is low, including, e.g., that only tests before the FFP are examined and that some gate-input patterns may not be possible due to the circuit structure. In any event, it is not the case that a particular driver state alone is needed to detect a large majority of these single-suspect failures. For the PAN-detect metric, the FFPs of all but two failing chips (chips 12 and 20) establish new neighborhood states for the identified suspects.

Fig. 7. Venn diagram showing the number of single-suspect failing chips whose FFP increases coverage for the bridge, gate-exhaustive, and PAN-detect test metrics.
Because single-suspect failing chips are considered here and $T_{sel}^c$ includes only one failing pattern (the FFP), $X^2_{m,c} + X^3_{m,c} = 1$, where m ∈ {B, G, P}. For chips whose FFP increases some metric coverage (i.e., chips with BFFP > 0, GFFP > 0, or PFFP > 0), $X^2_{m,c} = 1$ and $X^3_{m,c} = 0$. The effectiveness ratio for these chips for the corresponding metric is therefore equal to one. Otherwise, $X^2_{m,c} = 0$ and $X^3_{m,c} = 1$, and the effectiveness ratio becomes zero. For example, the effectiveness ratio of chip 5 in Table II for bridge, gate-exhaustive, and PAN-detect is one, zero, and one, respectively. The efficiency ratio of a chip, in turn, can be calculated using (3) with $X^1_{B,c}$, $X^1_{G,c} = G_{prev}$, and $X^1_{P,c} = P_{prev}$ for bridge, gate-exhaustive, and PAN-detect. For instance, the efficiency ratio of chip 5 for bridge, gate-exhaustive, and PAN-detect is 1/(2+1) = 0.33, 0/(2+0) = 0, and 1/(1+1) = 0.5, respectively.
The Venn diagram in Fig. 7 summarizes the outcome of
effectiveness evaluation for single-suspect failing chips. Each
integer in the diagram is the number of chips whose FFP
increases coverage of the evaluated metrics for the suspect
regions. Fig. 7 shows that one chip is captured by all three
metrics, while five chips are uniquely caught by PAN-detect.
Note that the FFPs of chips 12 and 20 do not increase the coverage of any of the three metrics, implying that the chips
failed in a way that cannot be captured by any of the three
metrics, at least for the metric parameters used. These chips
are further discussed in Section IV-E.
Fig. 8. Distribution of the effectiveness ratio ER for the multiple-suspect failing chips.
Fig. 9. Distribution of the efficiency ratio FR for the multiple-suspect failing chips.

C. Multiple-Suspect Failing Chips

We apply METER to the 59 failing chips with multiple suspects. Specifically, for each suspect of a multiple-suspect failing chip, we apply the same analysis described in Section IV-B. Results for all the suspects are then collected, and the effectiveness ratio and efficiency ratio are calculated.
Figs. 8 and 9 show the distributions of the effectiveness and efficiency ratios for the 59 multiple-suspect failing chips, respectively. For the bridge fault model, the FFP of 11 failing chips each detects some new bridge fault for all the suspects (i.e., ER = 100%). The bridge fault coverage is not increased for any of the suspects for nine of the 59 failing chips, implying that the bridge effectiveness and efficiency ratios for these chips are zero.
For gate exhaustive, no chip has ER = 100%, but 48 of the 59 chips have ER = FR = 0. For PAN-detect, the FFP of 39 failing chips each establishes a new neighborhood state for all suspects (ER = 100%). The FFP of the remaining 20 chips each establishes a new neighborhood state for at least one but not all of the suspects. Since the suspects of these chips have failed with Nd > 1, each suspect is sensitized by at least one passing pattern, i.e., $X^1_{m,c} \geq 1$. As a result, the efficiency ratio can be at most 50%. Note that the trend of test-metric effectiveness observed from the multiple-suspect failing chips is in line with what we observed from the single-suspect failing chips presented in the previous section.

D. Average Efficiency and Effectiveness
Using the data from the 28 single-suspect chips in Table II, the average effectiveness and efficiency ratios of bridge, gate-exhaustive, and PAN-detect for the 28 single-suspect and hard-to-detect failing chips can be calculated and compared. Here, we adopt a visual approach to compare the effectiveness and efficiency of these metrics. For each metric m and each failing chip c, we calculate the ratio of $X^1_{m,c}$, $X^2_{m,c}$, and $X^3_{m,c}$ to their sum as follows:

$F^k_{m,c} = X^k_{m,c} / (X^1_{m,c} + X^2_{m,c} + X^3_{m,c}), \quad k \in \{1, 2, 3\}$.        (7)
Fig. 10. Likelihood that a test pattern failed a chip and/or increases coverage for (a) bridge, (b) gate-exhaustive, (c) PAN-detect, and (d) N-detect for the 28 LSI failing chips.
The ratios for a zone are averaged over all of the 28 failing chips as follows:

$A^k_m = \sum_{c} F^k_{m,c} / |C|, \quad k \in \{1, 2, 3\}$        (8)

where |C| = 28 is the number of chips considered. Note that $A^1_m + A^2_m + A^3_m = 1$. The averaged effectiveness ratio and efficiency ratio are calculated as follows:

$ER_m = A^2_m / (A^2_m + A^3_m)$        (9)

$FR_m = A^2_m / (A^1_m + A^2_m)$.        (10)
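A compact way to read (7)–(10): normalize each chip's zone counts, average the normalized fractions over the chip population, and form the two ratios from the averages. The sketch below illustrates this with hypothetical per-chip counts; it is an illustration of the formulas, not the authors' code.

```python
# Average zone fractions (8) and the averaged ratios (9), (10) from per-chip
# zone counts (X1, X2, X3).
def averaged_ratios(zone_counts):
    a = [0.0, 0.0, 0.0]
    for x1, x2, x3 in zone_counts:
        total = x1 + x2 + x3
        for k, x in enumerate((x1, x2, x3)):
            a[k] += x / total / len(zone_counts)   # A^k accumulates F^k_{m,c} / |C|
    er_m = a[1] / (a[1] + a[2]) if (a[1] + a[2]) else 0.0   # (9)
    fr_m = a[1] / (a[0] + a[1]) if (a[0] + a[1]) else 0.0   # (10)
    return a, er_m, fr_m

# Example with three hypothetical chips, each given as (X1, X2, X3):
print(averaged_ratios([(2, 1, 0), (2, 0, 1), (1, 1, 0)]))
```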
For each evaluated test metric, we re-plot Fig. 6 and make
the area of zones 1, 2, and 3 proportional to $A^1_m$, $A^2_m$, and $A^3_m$, as shown in Fig. 10(a)–(c).4 It can be observed that PAN-detect has the largest zone 2 (0.31) among the three evaluated
metrics, and also has the largest average effectiveness and
efficiency ratios. The gate-exhaustive metric, in contrast, has the smallest zone 2 (0.03), which leads to the lowest effectiveness and efficiency ratios among the three metrics.

4 Using the data in Table II, we can also evaluate traditional N-detect, and the result is shown in Fig. 10(d). Specifically, Nd,P is the number of times a suspect is sensitized with the required stuck-at fault polarity, i.e., the stuck-at fault involving the suspect region of a failing chip has been N-detected with N = Nd,P test patterns when it fails. When a chip fails, the coverage for N-detect for a suspect region is always increased since Nd,P increases (unless a hard threshold of N is used). In other words, N-detect can never be ineffective. This is reflected in the perfect ER and the high FR for N-detect, and is also shown in Fig. 10(d), where zone 3 is empty and the left oval completely subsumes the right one. While the proposed measures reveal the characteristics of N-detect, using these measures to judge the effectiveness of N-detect is inappropriate. This holds for any test metric that can never be ineffective. Later in the paper, we demonstrate how to better compare N-detect and PAN-detect by utilizing METER in a different manner.
E. Test Metric Generalization
Results of METER described in Sections IV-B and IV-C show that the FFP of a chip may not improve the
coverage of a metric. These chips failed due to defects
that have activation conditions that are outside these metrics.
Of the three evaluated metrics, PAN-detect increases coverage
for the FFP for most failing chips. Nevertheless, PAN-detect
does not capture two single-suspect failing chips and may fail
to capture 20 multiple-suspect failing chips in the worst case
(as shown in Figs. 7 and 8, respectively). In our analysis
thus far, we included physical neighbors, driver-gate inputs,
and receiver-side inputs in the neighborhood for each suspect
signal line. In the diagnosis procedures described in [37],
[38], other types of neighbors are used as well, including
the driver-gate inputs of physical neighbors since it is known
they affect drive strengths [6]. If the driver-gate inputs of
physical neighbors are included in the neighborhoods instead
of the physical neighbors, the FFPs of all the hard-to-detect
chips (with either single or multiple suspects) establish new
neighborhood states for all the suspects of these chips. Because
the neighborhood encompasses all the localized influences on a
suspect line, it is not surprising that PAN-detect performs well.
However, there is a danger that making the neighborhood too large causes the metric to become too general. Exploring the tradeoff between, on one hand, including additional types of signal lines in the neighborhood and increasing the distance d used for physical-neighbor extraction and, on the other, the resulting mismatch with actual defect behaviors is needed to efficiently generate effective test sets. METER can be easily used to
meet this objective by analyzing and guiding the selection of
parameters used in ATPG.
Bridge fault models focus on unintended connections among
wires. Use of bridge fault models is typically limited to defects involving only two lines that create no structural feedback, and typically ignores cell-drive strengths. But they can be generalized
in several ways, e.g., by including more than two lines, and
more complex contention functions. The gate-exhaustive test
focuses on problems at the transistor level. It too, however,
can be generalized to higher levels of the hierarchy or to
include groups of cells or gates [8], [39]. Both bridge and
gate-exhaustive metrics are, however, subsumed by PAN-detect with a neighborhood that includes physical neighbors, driver-gate inputs, and receiver-side inputs.
F. Utilizing All Test Patterns
We further apply METER using all of the applied test
patterns for each of the 2533 available failing chip logs. In
addition, we demonstrate different methods for identifying
potential suspects. In the first method, any region that is
sensitized by a failing pattern is deemed a suspect region. In
the second method, the stuck-at fault response of a sensitized
region (for at least one failing pattern) must exactly match the
failing-pattern tester response to be deemed a suspect. For the
bridge fault model, and the gate-exhaustive and PAN-detect
metrics, we plot the effectiveness ratio ER for all 2533 chips
against the total number of unique suspects identified across
all failing patterns. Specifically, Fig. 11(a) shows the result
for selecting suspects using only pass-fail test data,5 while
Fig. 11(b) shows the result for SLAT regions. Each failing chip
has three points plotted, one that indicates the effectiveness
ratio for PAN-detect (triangles), one for bridge (circles), and one for gate-exhaustive (squares).

5 The analyzed ALU has 5110 signal lines, i.e., a failing chip has at most 5110 unique sensitized regions. The maximum occurs when a failing chip has many failing patterns where the union of the sensitized regions is the set of all the signal lines.

Fig. 11. Effectiveness ratio for the 2533 chips. (a) Calculated for all regions sensitized by the failing patterns. (b) Calculated for the SLAT regions.
Fig. 11(a) and (b) reveal that there is a significant range in the ER values, which is expected since many of the suspects, of course, have nothing to do with the defect. Also as expected, the scatter along the ER axis is reduced as the suspect-identification procedure improves, as demonstrated when moving from the sensitized regions [Fig. 11(a)] to the SLAT regions [Fig. 11(b)]. Finally, it is clear that the effectiveness of the metrics follows the same trend observed for the hard-to-detect chips, i.e., PAN-detect is most effective, followed by bridge and then gate exhaustive.
V. Discussion
In this section, we demonstrate the applicability of METER
to large designs, and discuss applications of the methodology.
We also compare and contrast METER against the traditional
tester-based approach for evaluating test metrics.
A. Applicability to Large Designs
METER can be easily applied to large, industrial designs
since it simply analyzes failure logs from already applied tests,
and only requires fault simulation of stuck-at faults involving
suspect regions. In other words, only a small portion of the
circuit has to be analyzed and existing fault simulation tools
can be utilized. METER is therefore scalable to large designs.
To demonstrate applicability, we apply METER to an
NVIDIA GPU. The GPU has ∼10M gates, and is fabricated
using 90 nm technology. The bridge fault model and the gate-exhaustive and PAN-detect metrics are evaluated. Diagnosis is performed using Synopsys TetraMAX [40] to identify suspect
regions within failing chips. For each failing chip, the test
patterns up to and including the chip’s FFP are used for
analysis.
Of the 4000+ failing chips in the stuck-at failure logs, we
focus on the 33 chips that have a single suspect reported by
diagnosis and are hard to detect (Nd > 1). We perform fault
simulation on these suspects over the selected test patterns
using TetraMAX, and examine whether a chip’s FFP increases
the coverage of any metric for the suspect. The outcome is
summarized in the Venn diagram shown in Fig. 12. Each
integer in the diagram represents the number of chips whose
FFP increases the coverage of the evaluated metric(s) for the
suspect regions.

Fig. 12. Venn diagram showing the number of single-suspect failing chips whose FFP increases coverage for bridge, gate-exhaustive, and PAN-detect for NVIDIA GPUs.

The evaluation results are consistent with the
previous experiments that use the LSI test chips. Specifically,
PAN-detect uniquely captures five chip failures, while bridge
test is found to be much more effective than gate-exhaustive.
There are two chips whose FFP does not increase coverage
of any of the three metrics. It is likely that PAN-detect
test can capture these two chips if the driver-gate inputs of
physical neighbors are included in the neighborhoods instead
of the physical neighbors, similar to the case discussed in
Section IV-E.
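The Venn-diagram bookkeeping behind Fig. 12 amounts to classifying each chip by the set of metrics whose coverage is increased by its FFP. A minimal Python sketch of this classification is shown below; the data is purely illustrative, not the measured results.

    from collections import Counter

    # One record per failing chip: whether the FFP increased each metric's coverage.
    chips = [
        {"bridge": True,  "gate_exhaustive": False, "pan_detect": True},
        {"bridge": False, "gate_exhaustive": False, "pan_detect": True},
        {"bridge": False, "gate_exhaustive": False, "pan_detect": False},
    ]

    # Map each chip to a region of the three-set Venn diagram
    # (the empty set is the "no metric improved" region).
    venn = Counter(frozenset(m for m, up in chip.items() if up) for chip in chips)
    for region, count in venn.items():
        print(" & ".join(sorted(region)) if region else "none", ":", count)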
Similar to the experiment using the LSI test chips, we
calculate the average ratios Akm, ERm, and FRm, using (8)–
(10), respectively, for the three evaluated metrics (as well as
N-detect) for the 33 selected NVIDIA GPUs (see Fig. 13). It
can be observed that for these GPUs, PAN-detect test is most
effective and efficient in defect detection, followed by bridge
and then gate-exhaustive.
Our measure of effectiveness and efficiency provides a means to evaluate and compare test metrics over different manufacturing technologies and products. For example, contrasting Figs. 10 and 13 shows that gate-exhaustive test is more effective for the NVIDIA 90 nm GPUs than for the LSI 110 nm ALU chips. In contrast, the ratio for the bridge fault model remains virtually the same, while the PAN-detect metric becomes even more effective and efficient for the NVIDIA GPUs. To be conclusive, however, much more data from failing chips should be analyzed.
B. Applications
METER has been demonstrated by comparing the effectiveness and efficiency of several metrics.

Fig. 13. Illustration of chances that a test pattern failed a chip and/or increases coverage for (a) bridge, (b) gate-exhaustive, (c) PAN-detect, and (d) N-detect for the 33 NVIDIA GPUs.

Measures of test-metric effectiveness and insufficiencies learned from tester
data provide guidelines for developing new fault models,
test metrics, and DFT methods. The information can also be
used to select a proper mix of tests to guarantee a certain
level of quality as described in [38] and [41]. Specifically,
the work in [38] and [41] derives a defect-type distribution. METER can be used in conjunction with these approaches
to determine which metrics and models are best at detecting
the derived defect types, thus enabling custom test, i.e., a test
that matches the defect-type distribution for a given design.
Other applications of METER include guiding the selection of
parameters used in ATPG, such as selecting the distance for
bridge extraction and neighbor identification for PAN-detect,
and others.
In the following, we demonstrate how METER can be
applied to select a proper value of N for both N-detect and
PAN-detect. As N increases, it is expected that the defect
coverage of an N/PAN-detect test set would increase [3], [11],
[14]. The improved test quality, however, comes at the cost of a higher pattern count and greater test-application expense. Selecting an
appropriate value of N therefore requires a tradeoff between
cost and quality. Common practice is to choose N based on
available test resources (i.e., tester memory, test time, and
others). Here, we use METER to demonstrate how the test
quality can be examined as a function of N.
For choosing N, we use failure logs of another large design,
an IBM ASIC. The IBM chip has nearly a million gates,
fabricated using 130 nm technology. Physical neighborhood
information includes all the signal lines within 0.6 µm of the
targeted line. The stuck-at test set applied during wafer test
consists of 3439 test patterns that achieve 99.51% stuck-at
fault coverage. Among the 606 chips in the stuck-at failure
logs, the 304 chips that failed scan chain flush test are
disregarded. The remaining 302 chips are diagnosed to identify
the suspects using Cadence Encounter Diagnostics [42]. Each
suspect is fault simulated using all test patterns up to and
including the chip’s FFP. For the stuck-at fault involving the
suspect, we record the number of times the fault is detected
(i.e., the number of N detections, Nd ) and the number of
neighborhood states established for the fault (i.e., the number
of PAN detections, Ns).
Because 284 of the 302 diagnosed failing chips have more
than one suspect, we take the following approach to handle
multiple-suspect chips. For each failing chip, we record Nd
and Ns of the suspect that is ranked highest in the diagnosis
report, as well as the maximum/minimum/average Nd and Ns
over all the suspects of the chip. The diagnosis tool employed reports a score for each identified suspect, where the score measures the similarity between the suspect behavior and the failing-chip behavior. The best-ranked suspect is the one with the highest score and is considered the most likely location of the failure. Using the Nd and Ns of the best-ranked suspect, or the maximum/minimum/average Nd and Ns over all suspects, thus provides several options for assigning Nd and Ns values to a multiple-suspect failing chip.
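A minimal Python sketch of these options for assigning per-chip Nd and Ns values is given below; the suspect record layout (a diagnosis score plus fault-simulated Nd and Ns counts) is an assumed, simplified representation, not the actual diagnosis output format.

    # suspects: list of dicts, each with keys "score", "Nd", and "Ns".
    def chip_detection_counts(suspects, how="best"):
        if how == "best":  # best-ranked suspect = highest diagnosis score
            top = max(suspects, key=lambda s: s["score"])
            return top["Nd"], top["Ns"]
        nd = [s["Nd"] for s in suspects]
        ns = [s["Ns"] for s in suspects]
        if how == "max":
            return max(nd), max(ns)
        if how == "min":
            return min(nd), min(ns)
        if how == "avg":
            return sum(nd) / len(nd), sum(ns) / len(ns)
        raise ValueError("unknown option: " + how)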
Fig. 14(a) and (b) shows the histograms of the number
of N and PAN detections, respectively. The bars indicate the
number of chips that are N/PAN detected, and the table in
each plot reports the tail data. For example, seven of the
302 failing chips have their best-ranked suspect N-detected
four times before the corresponding chip failed. For one
chip, the best-ranked suspect was sensitized 33 times with
a different neighborhood state before it finally failed for the
34th state. Using Fig. 14, the number of possible test escapes
when different values of Nd and Ns are chosen can be easily
determined. For instance, if the best-ranked suspect region is the actual defect region, then applying an Nd = 10-detect test to that region alone would lead to three test escapes. In contrast, applying a physically-aware Ns = 10-detect test would result in only one test escape. Given enough chips to analyze and a threshold on the defective parts per million, this analysis can be used to select the value of N for ATPG.
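One plausible way to read the escape counts off the data underlying Fig. 14 is sketched below in Python: a chip is counted as a potential escape of an N-detect (or PAN-detect) test if its suspect was already detected (or sensitized with distinct neighborhood states) at least N times before the chip’s FFP. The example values are illustrative only.

    # detection_counts: one Nd (or Ns) value per failing chip.
    def escapes_vs_n(detection_counts, n_values):
        return {n: sum(1 for d in detection_counts if d >= n) for n in n_values}

    nd_per_chip = [1, 2, 4, 12, 33, 3, 15]        # hypothetical per-chip Nd values
    print(escapes_vs_n(nd_per_chip, n_values=[5, 10, 20]))

Sweeping the candidate values of N and comparing the resulting escape counts against a DPPM budget gives a data-driven choice of N.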
C. Comparing Test-Metric Evaluation Methods
Table III compares METER with the traditional approach
involving application of extra test patterns generated specifically for the metrics under evaluation. Both approaches require
the analysis of the chip’s design information (e.g., netlist
and layout) for identifying fault characteristics that include,
e.g., physical neighbors and driver-gate inputs. Tester-based
evaluation, however, typically requires the generation and fault
simulation of extra test patterns in order to isolate the detection
characteristics of each metric. Furthermore, new, powerful
ATPG and fault simulation tools need to be developed, or
existing tools have to be tricked, to generate tests for new test
metrics since tests are needed for the entire design. METER, in contrast, fault simulates only a small subset of the existing test patterns, albeit without fault dropping, against
suspect failing regions identified from failing chips. Given the
NP-complete nature of ATPG, limited fault simulation of just
a portion of the design without fault dropping is a significantly
less-intensive task.
METER can be easily applied to large designs, as demonstrated in Section V-A, since current tools for stuck-at fault
simulation can be utilized and the analysis only requires some
script writing. This is not the case for the traditional tester-based evaluation approach, where the entire design must be considered.
More significantly, analysis of existing fail data is a much more
cost-effective activity as compared to the tester time needed to apply extra patterns to tens or hundreds of thousands of chips.

Fig. 14. Histograms showing the number of (a) N detections and (b) PAN detections for the 302 IBM failing ASICs.

TABLE III
Comparison of Test-Metric Evaluation Techniques

Tester-Based Evaluation           | METER
----------------------------------+-------------------------------------------
Netlist analysis (−)              | Netlist analysis (−)
Layout analysis (−)               | Layout analysis (−)
ATPG/fault sim. (×)               | Limited fault sim. w/o fault dropping (✓)
New ATPG/fault sim. tools (×)     | Existing fault sim. tools (✓)
Tester use (×)                    | Analysis of fail data (✓)
Controllable test environment (✓) | Test environment not controllable (×)
Controllable coverage (✓)         | Coverage not controllable (×)
Gross (×)                         | Fine-grained (✓)
* ✓: good; −: tie; ×: bad.

However, since METER utilizes existing test patterns, the test environment (e.g., temperature, voltage, test clock rate, and others) cannot be changed. In other words, test metrics are evaluated under the conditions that were employed when the tests were applied. Moreover, as already mentioned in Section I, the coverage of the test metric is not controlled. In practice, however, as shown in Fig. 2, the coverage achieved for any given metric for most of the design is extremely high since it is typically the case that many regions in a design are well tested by a thorough stuck-at test set. The detection efficiency of bridge faults and the gate-exhaustive metric is probably even higher since some untested bridges and input-pattern faults are quite likely redundant.

Last but not least, tester-based evaluation is a gross measure of effectiveness since it is unknown whether the unique fallout is due to the model/metric/DFT method being evaluated or simply fortuitous in nature. METER, instead, is fine-grained in that it associates defect detection with changes in metric coverage for the suspect regions believed to be the region of the defect. Although suspect-identification techniques such as diagnosis are not perfect, it is quite likely that the reported suspects include the actual defect regions. If we analyze all the suspects and observe statistically significant trends in the data, meaningful conclusions can be drawn.

VI. Conclusion

A general and cost-effective test-metric evaluation methodology, METER, was described and demonstrated. METER provided a novel approach for analyzing the effectiveness and efficiency of new and existing test and DFT methods, which traditionally relied on empirical data from expensive and time-consuming chip experiments. METER analyzed failure log files from tests already applied, and did not require additional tests. Although only test data from failing chips were analyzed, the analysis was equivalent to chip experiments with a large sample size. The time and cost of test generation and test application for test-metric evaluation were therefore completely avoided.

One limitation of METER is that test metrics are evaluated under the environment that was employed (e.g., temperature, voltage, test clock rate, and others) since existing test-measurement data is utilized. As the test environment changes, the relative effectiveness of the evaluated test metrics may change as well. Test-measurement data collected under a different environment is needed to evaluate metrics for different conditions. Moreover, the test approaches applied to the failing chips should adhere to the assumptions of the test metrics under evaluation. Otherwise, the evaluation, while still possible, may not be meaningful.

METER has been demonstrated by comparing the effectiveness and efficiency of several metrics that include bridge, gate-exhaustive, and PAN-detect using the stuck-at failure logs from actual fabricated and tested ICs. With this approach, test metrics can be easily evaluated and compared over different manufacturing technologies and products. The resulting information provides guidelines on how to select the best mix of test methods. It can also be used to guide the development of new test metrics, fault models, and DFT methods, as well as the selection of parameters used in ATPG.
VII. Acknowledgment
The authors would like to thank Carnegie Mellon University,
Pittsburgh, PA, Ph.D. students O. Poku for his help on the LSI
experiment, C. Xue for his help on the IBM experiment, and
J. Nelson, W. C. Tam, and X. Yu for their help on the NVIDIA
experiment.
References
[1] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing for
Digital, Memory, and Mixed-Signal VLSI Circuits. Boston, MA: Kluwer,
2000.
[2] S. Sengupta, S. Kundu, S. Chakravarty, P. Parvathala, R. Galivanche,
G. Kosonocky, M. Rodgers, and T. M. Mak, “Defect-based test: A key
enabler for successful migration to structural test,” Intel Technol. J., Q.1,
pp. 1–12, 1999.
[3] S. C. Ma, P. Franco, and E. J. McCluskey, “An experimental chip to
evaluate test techniques experiment results,” in Proc. Int. Test Conf., Oct.
1995, pp. 663–672.
[4] E. J. McCluskey and C.-W. Tseng, “Stuck-fault tests vs. actual defects,”
in Proc. Int. Test Conf., Oct. 2000, pp. 336–342.
[5] K. C. Y. Mei, “Bridging and stuck-at faults,” IEEE Trans. Comput., vol.
C-23, no. 7, pp. 720–727, Jul. 1974.
[6] J. M. Acken and S. D. Millman, “Accurate modeling and simulation of
bridging faults,” in Proc. Custom Integr. Circuits Conf., May 1991, pp.
12–15.
[7] J. A. Waicukauski, E. Lindbloom, B. K. Rosen, and V. S. Iyengar,
“Transition fault simulation,” IEEE Des. Test Comput., vol. 4, no. 2,
pp. 32–38, Apr. 1987.
[8] R. D. Blanton and J. P. Hayes, “Properties of the input pattern fault
model,” in Proc. Int. Conf. Comput. Des., Oct. 1997, pp. 372–380.
[9] E. J. McCluskey, “Quality and single-stuck faults,” in Proc. Int. Test
Conf., Oct. 1993, p. 597.
[10] K. Y. Cho, S. Mitra, and E. J. McCluskey, “Gate exhaustive testing,” in
Proc. Int. Test Conf., Nov. 2005.
[11] I. Pomeranz and S. M. Reddy, “A measure of quality for N-detection
test sets,” IEEE Trans. Comput., vol. 53, no. 11, pp. 1497–1503, Nov.
2004.
[12] B. Benware, C. Schuermyer, N. Tamarapalli, K.-H. Tsai, S. Ranganathan, R. Madge, J. Rajski, and P. Krishnamurthy, “Impact of
multiple-detect test patterns on product quality,” in Proc. Int. Test Conf.,
Sep.–Oct. 2003, pp. 1031–1040.
[13] M. E. Amyeen, S. Venkataraman, A. Ojha, and S. Lee, “Evaluation of
the quality of N-detect scan ATPG patterns on a processor,” in Proc.
Int. Test Conf., Oct. 2004, pp. 669–678.
[14] R. D. Blanton, K. N. Dwarakanath, and A. B. Shah, “Analyzing the
effectiveness of multiple-detect test sets,” in Proc. Int. Test Conf., Sep.–
Oct. 2003, pp. 876–885.
[15] Y.-T. Lin, O. Poku, N. K. Bhatti, and R. D. Blanton, “Physically-aware
N-detect test pattern selection,” in Proc. DATE, Mar. 2008, pp. 634–639.
[16] Y.-T. Lin, O. Poku, R. D. Blanton, P. Nigh, P. Lloyd, and V. Iyengar,
“Evaluating the effectiveness of physically-aware N-detect test using real
silicon,” in Proc. Int. Test Conf., Oct. 2008.
[17] P. C. Maxwell, R. C. Aitken, K. R. Kollitz, and A. C. Brown, “IDDQ
and AC scan: The war against unmodeled defects,” in Proc. Int. Test
Conf., Oct. 1996, pp. 250–258.
[18] P. Nigh, W. Needham, K. Butler, P. Maxwell, and R. Aitken, “An
experimental study comparing the relative effectiveness of functional,
scan, IDDq and delay-fault testing,” in Proc. VLSI Test Symp., May
1997, pp. 459–464.
[19] J. T.-Y. Chang, C.-W. Tseng, Y.-C. Chu, S. Wattal, M. Purtell, and E. J.
McCluskey, “Experimental results for IDDQ and VLV testing,” in Proc.
VLSI Test Symp., Apr. 1998, pp. 118–123.
[20] C.-W. Tseng and E. J. McCluskey, “Multiple-output propagation transition fault test,” in Proc. Int. Test Conf., Oct. 2001, pp. 358–366.
[21] S. Chakravarty, A. Jain, N. Radhakrishnan, E. W. Savage, and S. T.
Zachariah, “Experimental evaluation of scan tests for bridges,” in Proc.
Int. Test Conf., Oct. 2002, pp. 509–518.
[22] B. R. Benware, R. Madge, C. Lu, and R. Daasch, “Effectiveness
comparisons of outlier screening methods for frequency dependent
defects on complex ASICs,” in Proc. VLSI Test Symp., May 2003, pp.
39–46.
[23] E. J. McCluskey, A. Al-Yamani, J. C.-M. Li, C.-W. Tseng, E. Volkerink,
F.-F. Ferhani, E. Li, and S. Mitra, “ELF-Murphy data on defects and test
sets,” in Proc. VLSI Test Symp., Apr. 2004, pp. 16–22.
[24] S. Mitra, E. Volkerink, E. J. McCluskey, and S. Eichenberger, “Delay
defect screening using process monitor structures,” in Proc. VLSI Test
Symp., Apr. 2004, pp. 43–48.
[25] S. Chakravarty, Y. Chang, H. Hoang, S. Jayaraman, S. Picano, C. Prunty, E. W. Savage, R. Sheikh, E. N. Tran, and K. Wee, “Experimental evaluation of bridge patterns for a high performance microprocessor,” in Proc. VLSI Test Symp., May 2005, pp. 337–342.
[26] R. Guo, S. Mitra, E. Amyeen, J. Lee, S. Sivaraj, and S. Venkataraman, “Evaluation of test metrics: Stuck-at, bridge coverage estimate and gate exhaustive,” in Proc. VLSI Test Symp., Apr.–May 2006, pp. 66–71.
[27] E. N. Tran, V. Kasulasrinivas, and S. Chakravarty, “Silicon evaluation of logic proximity bridge patterns,” in Proc. VLSI Test Symp., Apr.–May 2006, pp. 78–85.
[28] C. Schuermyer, J. Pangilinan, J. Jahangiri, M. Keim, and J. Rajski, “Silicon evaluation of static alternative fault models,” in Proc. VLSI Test Symp., May 2007, pp. 265–270.
[29] J. Geuzebroek, E. J. Marinissen, A. Majhi, A. Glowatz, and F. Hapke, “Embedded multi-detect ATPG and its effect on the detection of unmodeled defects,” in Proc. Int. Test Conf., Oct. 2007.
[30] S. Eichenberger, J. Geuzebroek, C. Hora, B. Kruseman, and A. Majhi, “Toward a world without test escapes: The use of volume diagnosis to improve test quality,” in Proc. Int. Test Conf., Oct. 2008.
[31] Y.-T. Lin and R. D. Blanton, “Test effectiveness evaluation through analysis of readily-available tester data,” in Proc. Int. Test Conf., Nov. 2009.
[32] S. Venkataraman and W. K. Fuchs, “A deductive technique for diagnosis of bridging faults,” in Proc. Int. Conf. Comput.-Aided Des., Nov. 1997, pp. 562–567.
[33] X. Yu and R. D. Blanton, “Multiple defect diagnosis using no assumptions on failing pattern characteristics,” in Proc. Des. Automat. Conf., Jun. 2008, pp. 361–366.
[34] X. Yu and R. D. Blanton, “An effective and flexible multiple defect diagnosis methodology using error propagation analysis,” in Proc. Int. Test Conf., Oct. 2008.
[35] L. M. Huisman, “Diagnosing arbitrary defects in logic designs using single location at a time (SLAT),” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 23, no. 1, pp. 91–101, Jan. 2004.
[36] W. Maly and J. Deszczka, “Yield estimation model for VLSI artwork evaluation,” Electron. Lett., vol. 19, no. 6, pp. 226–227, Mar. 1983.
[37] R. Desineni, O. Poku, and R. D. Blanton, “A logic diagnosis methodology for improved localization and extraction of accurate defect behavior,” in Proc. Int. Test Conf., Oct. 2006.
[38] X. Yu, Y.-T. Lin, W. C. Tam, O. Poku, and R. D. Blanton, “Controlling DPPM through volume diagnosis,” in Proc. VLSI Test Symp., May 2009, pp. 134–139.
[39] A. Jain, “Arbitrary defects: Modeling and applications,” Master’s thesis, Graduate School, Rutgers Univ., New Brunswick, NJ, Oct. 1999.
[40] The TetraMAX Reference Manual, Synopsys, Inc., Mountain View, CA [Online]. Available: http://www.synopsys.com
[41] X. Yu and R. D. Blanton, “Estimating defect-type distributions through volume diagnosis and defect behavior attribution,” in Proc. Int. Test Conf., Nov. 2010.
[42] The Encounter Diagnostics Reference Manual, Cadence Design Systems, Inc., San Jose, CA [Online]. Available: http://www.cadence.com
Yen-Tzu Lin (S’05–M’10) received the B.S. and
M.S. degrees in electrical engineering from National
Tsing Hua University, Hsinchu, Taiwan, in 2000 and
2002, respectively, and the Ph.D. degree in electrical
and computer engineering from Carnegie Mellon
University, Pittsburgh, PA, in 2010.
She is currently with NVIDIA Corporation, Santa
Clara, CA, where she focuses on developing new
solutions for enabling rapid identification of key
yield detractors. Her current research interests include integrated circuit testing and diagnosis, design
for manufacturability, and yield ramps.
R. D. (Shawn) Blanton (S’93–M’95–SM’03–
F’09) received the Bachelor’s degree in engineering
from Calvin College, Grand Rapids, MI, in 1987,
the Master’s degree in electrical engineering from
the University of Arizona, Tucson, in 1989, and the
Ph.D. degree in computer science and engineering
from the University of Michigan, Ann Arbor, in
1995.
He is currently a Professor with the Department
of Electrical and Computer Engineering at Carnegie
Mellon University, Pittsburgh, PA, where he is also
the Director of the Center for Silicon System Implementation (CSSI), an
organization consisting of 18 faculty members and over 80 graduate students
focused on the design and manufacture of silicon-based systems.