Process Variation and Stochastic Design and Test
Statistical Test Compaction
Using Binary Decision Trees
Sounil Biswas and Ronald D. (Shawn) Blanton
Carnegie Mellon University
Editor’s note:
Integrated, heterogeneous devices must deal with process variation, and the
cost of testing such devices often exceeds that of digital circuits. The authors
use statistics to cross-correlate different tests and find the minimum set
required to guarantee product quality while reducing test time and cost.
—T.M. Mak, Intel
EXPLICITLY TESTING an integrated, heterogeneous
device for all its specifications is very costly. A new test
methodology that can minimize test cost while maintaining product quality and limiting yield loss is needed
(see the “Statistical learning approaches in test” sidebar).
We are developing a statistical learning methodology
based on decision trees to compact the complete specification-based test set of an integrated device by eliminating redundant tests. A test is deemed redundant if we
can reliably predict its output using other tests that are
not eliminated.
We use binary classifiers to identify specification-based tests whose pass/fail results can be accurately predicted through measurements from kept tests (other
specification tests that are somehow deemed necessary
through analysis or by expert opinion). We call the set of
tests that can be accurately predicted redundant tests,
and we denote them as Tred; we denote the set of kept
tests as Tkept. Therefore, given a set of specification tests,
T = {t1, t2, …, tl}, where l is the number of tests in T, we can
express the pass/fail result yi of each redundant test ti ∈ Tred as the function yi = Fi(Tkept), where T = Tred ∪ Tkept. We
call the process of identifying Tred and Tkept using statistical interpretation of test data statistical test compaction.
Binary classification techniques that might be suitable for deriving Fi(Tkept) include, for example, support vector machines (SVMs),1 neural networks,2 and decision trees.3 SVM-based binary classifiers are especially popular, and we have used them for statistical test compaction.4 In such classifiers, the goal is to derive hyperplane boundaries between classes in the Tkept hyperspace to maximize the distance between the class boundaries and the measurement data. However, this optimization step in SVM-based classifiers requires specifying the shape of these class boundaries (linear, Gaussian, exponential, and so on). This assumption can add error to Fi(Tkept).

Classification based on neural networks uses several simple functions of the Tkept measurement values. This classification technique organizes these simple functions as a network of nodes, where each node is called a perceptron. A network of perceptrons results in a complex nonlinear classification function that describes Fi(Tkept). For example, a neural network representation of the binary function (f: 1 ≤ x ≤ 3 → True) can simply be the logical AND function g of two binary functions (f1: x ≥ 1) and (f2: x ≤ 3). The three perceptrons in this example are f1, f2, and g, where f = g(f1, f2). During the process of deriving a neural network representation of Fi(Tkept), the algorithm used to derive the neural network iteratively updates the perceptrons' parameters to improve the prediction accuracy of Fi(Tkept) until maximum classification accuracy is reached for some sample data, typically called training data. Stratigopoulos and Makris discuss an application of neural networks to test, where they examine the correlation between alternate-test measurements and the pass/fail outcomes of fabricated parts.5 Derivation and training of a neural network, however, is an NP-complete process,6 and thus could require significant computation time.
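For readers who want to see the perceptron example concretely, here is a minimal Python sketch of f1, f2, and g as threshold units; the weights and function names are ours, chosen only to mirror the 1 ≤ x ≤ 3 example.

```python
def perceptron(weights, bias, inputs):
    # A perceptron fires (returns 1) when the weighted sum of its
    # inputs plus a bias is nonnegative.
    return 1 if sum(w * v for w, v in zip(weights, inputs)) + bias >= 0 else 0

def f1(x):  # fires when x >= 1
    return perceptron([1.0], -1.0, [x])

def f2(x):  # fires when x <= 3
    return perceptron([-1.0], 3.0, [x])

def f(x):   # g: logical AND of f1 and f2, itself a perceptron
    return perceptron([1.0, 1.0], -2.0, [f1(x), f2(x)])

assert [f(v) for v in (0, 1, 2, 3, 4)] == [0, 1, 1, 1, 0]
```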
In this work, we use binary decision trees (BDTs) for statistical test compaction because they have the following properties. First, decision trees require no assumption on the type of correlation (if any) that exists between Tred and Tkept. This makes it possible to derive a more accurate representation of Fi(Tkept) from the collected test data. Also, deriving a decision tree model for Fi(Tkept) simply involves partitioning the Tkept hyperspace into hypercubes, which is a polynomial-time process of complexity O(n² × k³), where n is the number of tests in Tkept, and k is the number of parts in the collected data.7 Therefore, the computation time required for creating a decision tree can be considerably less than the time required for training a neural network.
However, the application of a statistical test compaction methodology to a
commercial device requires very high
prediction accuracy, which is not achievable by simply using a typical decision
tree model of Fi(Tkept). Our work involving SVM-based classifiers4 and the work
of Stratigopoulos and Makris using neural networks5 also fail to meet this requirement. Consequently, to achieve a
commercial device’s accuracy demands,
we employ several enhancements not
inherent to decision trees. These
enhancements include scrubbing the
collected data to eliminate outliers,
reducing drift between individual wafers
and wafer lots due to process variations,
and applying principal component
analysis (PCA) to derive relevant measurement combinations. We also use
guard banding of the specification
boundaries and hypercube collapsing to
further improve prediction accuracy.
Some of these enhancements, such as
hypercube collapsing, are novel contributions to statistical test compaction.
Test compaction results produced for a
commercial microelectromechanical
systems (MEMS) accelerometer are
promising because they indicate that our
methodology makes it possible to eliminate an expensive mechanical test for
the accelerometer.
Statistical learning approaches in test
The standard approach of explicitly testing an integrated, heterogeneous
(mixed-signal, analog, microelectromechanical, and so on) device for all its
specifications can be prohibitively expensive. There’s now a growing trend
to use statistical learning involving test data from fabricated chips to reduce
this test cost. These attempts have mainly focused on regression techniques,
in which the goal is either to predict the value of one device measurement
using others1-3 or to fit the parameters of predetermined relationships between
the device measurements.4,5 For example, some researchers use measurements from alternate tests and regression to predict device specifications1 or
parameters.2 Alternate tests are simplified stimuli and response measurements that are significantly less expensive than their specification-based
counterparts.
Other researchers perform Monte Carlo simulations of the device design
to derive the device measurements’ joint probability distribution (JPD).4,5 They
then use this JPD to determine parameter values for the function that relates
the measurements. Brockman and Director aim to ensure that the function
can accurately predict test measurements.4 Milor and Sangiovanni-Vincentelli
want to create a function that can predict out-of-spec measurements when a
defect exists.5
Go/no-go testing, however, requires only modeling the pass/fail outcome
for the device under test. This is far simpler than predicting a test’s actual
measurement value using regression or some other technique. In other words,
go/no-go testing is a binary classification problem, so it’s best to use statistical learning based on binary classification to predict the corresponding test’s
pass/fail outcome.
References
1. A. Chatterjee and R. Voorakaranam, “Test Generation for Accurate Prediction
of Analog Specifications,” Proc. 18th IEEE VLSI Test Symp. (VTS 00), IEEE CS
Press, 2000, pp. 137-142.
2. V. Natarajan, S. Bhattacharya, and A. Chatterjee, “Alternate Electrical Test for
Extracting Mechanical Parameters of MEMS Accelerometer Sensors,” Proc.
24th IEEE VLSI Test Symp. (VTS 06), IEEE Press, 2006, pp. 665-673.
3. L.-C. Wang et al., “On Path-Based Learning and Its Applications in Delay Test
and Diagnosis,” Proc. 41st Design Automation Conf. (DAC 04), ACM Press,
2004, pp. 492-497.
4. J.B. Brockman and S.W. Director, “Predictive Subset Testing: Optimizing IC
Parametric Performance Testing for Quality, Cost, and Yield,” IEEE Trans.
Semiconductor Manufacturing, vol. 2, no. 3, Aug. 1989, pp. 104-113.
5. L. Milor and A.L. Sangiovanni-Vincentelli, “Minimizing Production Test Time to
Detect Faults in Analog Circuits,” IEEE Trans. Computer-Aided Design of
Integrated Circuits and Systems, vol. 13, no. 6, June 1994, pp. 796-813.
Proposed methodology
The n tests in Tkept describe an n-dimensional hyperspace. Test response data for each
fabricated instance or part is a data point in the Tkept
hyperspace; all the test data together form a distribution
in that hyperspace. Parts that pass each test ti ∈ Tred within the Tkept hyperspace describe passing subspaces,
whereas the failing parts represent failing subspaces.
Our goal in deriving Fi(Tkept) is to separate these two
types of subspaces using our BDT-based statistical learning method. The test response data we use to derive this
BDT representation of Fi(Tkept) is called the training data.
However, the training data might not reflect future data,
because of inaccuracies from measurement errors, a
lack of adequate training data, defects, and so on. This
means Fi(Tkept) could achieve 100% accurate predictions for the training data yet exhibit high misprediction
for future data. To remedy this shortcoming, we use
another set of collected data, the validation data, to
check the model's pass/fail prediction accuracy. We
measure the prediction accuracy by determining the
fraction of passing or failing parts (from both the validation and training data) mispredicted by Fi(Tkept). We
then use this accuracy to determine whether the tests
in Tred are indeed redundant. In other words, if the misprediction error of Fi(Tkept) is lower than a preset threshold, the choice of Tred is deemed correct. On the other
hand, if the misprediction error is higher than the preset threshold, the tests in Tred are not redundant, meaning we must redefine Tred or abort the test compaction
process altogether.
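The redundancy decision just described amounts to a simple accuracy gate. The following Python sketch shows the flow under our assumptions; train_bdt, misprediction_rate, and the threshold value are hypothetical stand-ins supplied by the caller, not artifacts of the authors' software.

```python
def tests_are_redundant(training_data, validation_data, t_kept, t_red,
                        train_bdt, misprediction_rate, threshold=0.01):
    """Decide whether the tests in Tred are redundant. train_bdt and
    misprediction_rate are caller-supplied stand-ins for the BDT
    derivation and evaluation steps; the default threshold is a
    placeholder, not a value from the text."""
    for t_i in t_red:
        model = train_bdt(training_data, features=t_kept, target=t_i)
        # Fraction of parts (training plus validation) whose pass/fail
        # outcome for ti is mispredicted by Fi(Tkept).
        error = misprediction_rate(model, training_data + validation_data, t_i)
        if error > threshold:
            return False  # Tred must be redefined, or compaction aborted
    return True
```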
Binary decision trees
We adapted the terminology used in this section from
Rokach and Maimon.3 A BDT is a classifier expressed as
a recursive partitioning of the Tkept hyperspace. Each nonterminal vertex in the decision tree represents a partition, which is an (n–1)-dimensional hyperplane that
separates the Tkept hyperspace into two subspaces at a
particular test measurement value vi,j for a test ti ∈ Tkept.
Test response value vi,j is called a decision value, and the
corresponding vertex in the tree is called a decision vertex. The subspace of parts resulting from the partition
with measured values less than vi,j is the left-child vertex;
the subspace of parts with values greater than or equal
to vi,j is the right-child vertex. Each test ti ∈ Tkept can have
several decision values {vi,1, vi,2, …, vi,m}, meaning a
dimension can have multiple partitions. Each terminal
vertex in a decision tree is ideally a homogeneous hypercube containing only passing or failing parts.
Decision tree construction uses characteristics from
a sample of manufactured parts for selecting a test ti ∈
Tkept to partition the hyperspace. The partitioning of the
hyperspace with various test measurement values continues until all parts in the sample are partitioned (classified) into homogeneous, leaf-level vertices for all the
tests in Tred. Once the tree derivation algorithm has
obtained the BDT representation of Fi(Tkept), we deter-
mine the prediction of a future chip as pass or fail simply by traversing the tree from its root vertex to a terminal vertex according to the chip’s measurements for the
tests in Tkept.
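A minimal sketch of this structure and traversal (our illustration, not the authors' implementation) might look as follows in Python.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Vertex:
    # Terminal vertices carry a verdict; decision vertices carry a test
    # index, a decision value vi,j, and two children.
    verdict: Optional[str] = None      # "pass" or "fail" at a terminal vertex
    test: Optional[int] = None         # index of test ti in Tkept
    value: Optional[float] = None      # decision value vi,j
    left: Optional["Vertex"] = None    # subspace with measurements < vi,j
    right: Optional["Vertex"] = None   # subspace with measurements >= vi,j

def predict(root: Vertex, measurements: list[float]) -> str:
    """Traverse from the root to a terminal vertex using a chip's Tkept
    measurements and return the predicted pass/fail outcome."""
    v = root
    while v.verdict is None:
        v = v.left if measurements[v.test] < v.value else v.right
    return v.verdict
```

Building the tree of Figure 1b, for instance, would chain such vertices with decision values b1, a1, a2, and a3.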
Deriving a BDT representation of Fi(Tkept) is a two-step
process, involving tree construction and tree pruning.
During depth-first tree construction, the algorithm evaluates the capability of each test measurement value vi,j to homogenize the subspace. The algorithm considers a test ti and a measurement value vi,j as a candidate partition if the current vertex has at least two parts m and n (one passing and one failing) with measurement values ρi,m and ρi,n, respectively, such that ρi,m ≤ vi,j ≤ ρi,n or ρi,n ≤ vi,j ≤ ρi,m is
true. From the test measurement values evaluated, the
tree construction algorithm selects, as the vertex’s decision value, the vi,j that maximizes the separation of passing and failing parts in the subspace. The depth-first
construction process continues until all training parts are
homogeneously partitioned or until further partitioning
of the Tkept hyperspace does not improve accuracy.
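As one possible reading of the partition-selection step, the sketch below scores each candidate decision value by how many parts the two resulting subspaces classify correctly under a majority vote; the article does not specify the exact separation measure, so this scoring is an assumption of ours.

```python
def best_split(parts, test_index):
    """Pick the decision value vi,j for one test that best separates
    passing and failing parts. `parts` is a list of (measurements,
    passes) pairs, where passes is 1 for a passing part and 0 otherwise."""
    best_value, best_score = None, -1
    for v in sorted({m[test_index] for m, _ in parts}):
        left = [p for m, p in parts if m[test_index] < v]
        right = [p for m, p in parts if m[test_index] >= v]
        if not left or not right:
            continue  # a candidate must actually partition the parts
        # Count the parts each side classifies correctly under its
        # majority (pass or fail) label.
        score = max(sum(left), len(left) - sum(left)) + \
                max(sum(right), len(right) - sum(right))
        if score > best_score:
            best_value, best_score = v, score
    return best_value
```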
We gauge the derived decision tree model’s prediction accuracy by observing the prediction error of
Fi(Tkept) in the validation data. We can further improve
the prediction accuracy by “pruning” the tree on the
basis of how it classifies the validation data. During a
breadth-first traversal that begins with the n – 1 level, the
tree-pruning algorithm analyzes each vertex to determine if eliminating the partition and converting the corresponding left- and right-child trees into a single passing
or failing hypercube would improve the prediction accuracy. For cases in which the tree-pruning algorithm determines that the prediction accuracy would improve, the
algorithm replaces the decision vertex and all its descendants with a single passing or failing terminal vertex,
depending on the one that leads to the greater improvement. This is called decision node deletion. In addition, the algorithm analyzes each decision vertex to
determine whether a small perturbation of the decision
value vi,j can improve the prediction accuracy; if so, the
algorithm employs this perturbation. We call this decision value shifting. These two tree-pruning operations,
decision node deletion and decision value shifting,
together can lead to a more accurate Fi(Tkept).
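The two pruning moves can be sketched as follows, reusing the Vertex class and predict function from the earlier sketch; the perturbation sizes tried for decision value shifting are placeholders, as the article does not quantify them.

```python
def prune(vertex, parts):
    """Bottom-up pruning over the validation parts routed to `vertex`.
    `parts` is a list of (measurements, passes) pairs."""
    if vertex.verdict is not None:
        return vertex
    left_parts = [(m, p) for m, p in parts if m[vertex.test] < vertex.value]
    right_parts = [(m, p) for m, p in parts if m[vertex.test] >= vertex.value]
    vertex.left = prune(vertex.left, left_parts)
    vertex.right = prune(vertex.right, right_parts)

    def errors(node):
        return sum(predict(node, m) != ("pass" if p else "fail")
                   for m, p in parts)

    # Decision node deletion: would a single passing or failing terminal
    # vertex mispredict fewer validation parts than the whole subtree?
    best = vertex
    for verdict in ("pass", "fail"):
        candidate = Vertex(verdict=verdict)
        if errors(candidate) < errors(best):
            best = candidate
    # Decision value shifting: try small perturbations of vi,j.
    if best is vertex:
        for delta in (-0.01, 0.01):  # placeholder perturbation sizes
            shifted = Vertex(test=vertex.test, value=vertex.value + delta,
                             left=vertex.left, right=vertex.right)
            if errors(shifted) < errors(best):
                best = shifted
    return best
```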
Figure 1a shows an example data distribution for a
Tkept = {t1, t2} hyperspace. The lines in Figure 1a partially
separate passing and failing parts; the dotted line shows
the last partition chosen. Figure 1b shows the resulting,
partially formed decision tree. The shaded region indicates the tree vertices from the dotted-line partition in
Figure 1a. The subspace corresponding to the new left child contains only passing parts; thus, the child is a passing terminal vertex of the tree. However, the subspace corresponding to the new right child contains both passing and failing parts and hence requires further partitioning. Therefore, the right child is a nonterminal vertex in the tree.

Figure 1. An example of a data distribution partially separated in the kept test Tkept = {t1, t2} hyperspace (a), and the corresponding partially formed binary decision tree (BDT) (b). Circles represent parts that pass Tred; triangles represent parts that fail Tred. The dotted line in (a) denotes the last partition chosen in the decision tree derivation process. The shaded region in (b) represents new vertices added to the decision tree by the last partition.

Figure 2. Possible misclassification of a future failing part (circled triangle) due to inadequate coverage of failing subspaces by the training data (a), and the collapsing of passing hypercubes (shaded subspaces) to eliminate this error (b).
Hypercube collapsing
The decision tree representation of Fi(Tkept) is simply
the partition of the Tkept hyperspace into hypercubes that
separate the passing and failing subspaces. However,
training data collected from a high- or low-yielding fabrication process can result in an insufficient number of
training parts from one class. Consequently, some partitions necessary to achieve an acceptable level of prediction accuracy might be absent. As a result, some portions
of the passing or failing subspaces could be erroneously
included in hypercubes of the opposite kind, leading to
misclassification of a future data point (a manufactured
part) falling into that hyperspace. We are particularly interested in defective-part misprediction when the training
data is from a high-yielding process and when portions of
the failing subspaces are likely incorrectly classified as
passing. Figure 2a shows an artificial collection of training data for a Tkept = {t1, t2} hyperspace with high yield. The
shaded regions represent the passing hypercubes in the
Tkept hyperspace. The circled triangle represents a failing
part that was misclassified because the training data did
not sufficiently cover the failing subspaces.
We have observed that for a sufficiently large sample
of high-yielding training data, passing parts in the training data adequately cover the passing subspaces. Therefore, to guard against the expensive scenario of mispredicting a defective part, we assume that the portions of the passing hypercubes that don't include any passing part are failing subspaces. So, we "collapse" the passing hypercube boundaries to coincide with the passing-part data in the hypercube, and we denote any remaining subspace as failing. We will then classify all future parts residing in any of these new failing subspaces as defective. These parts cannot contribute to the total number of defective parts being mispredicted, but they can lead to an increase in yield loss, meaning a passing part will be mispredicted as bad. Figure 2b shows the result of collapsing the passing hypercube bounds from Figure 2a. The mispredicted failing part is now correctly classified.

Our algorithm performs hypercube collapsing as follows. Each path from a decision tree's root vertex to a terminal vertex includes several decision values corresponding to the partitions along the path. For a passing terminal vertex, the encountered decision values define the bounds of the hypercube associated with the vertex. Any test not included in the path from the root to the terminal vertex manifests itself as a missing bound that reduces the corresponding hypercube's dimensionality and unnecessarily increases its size. Adding partitions for the missing tests on the basis of the failing parts present in the hypercube would collapse it. However, the absence of failing parts in the vertex makes identifying decision values for these missing tests impossible. Therefore, the algorithm instead derives decision values for the missing tests from the passing data in the vertex. In the worst case, with n tests in Tkept and k passing parts in the collected data, hypercube collapsing creates n × k additional decision values. Derivation of the decision tree model with hypercube collapsing is still polynomial, with complexity O((n² × k³) + (n × k)). Also, the increase in tree depth from using the collapsed hypercubes is at most n − 1.
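A minimal sketch of collapsing a single passing terminal vertex, under the assumption that the passing training parts reaching the vertex are available as measurement vectors:

```python
def collapse_hypercube(passing_parts, n_tests):
    """Tighten a passing hypercube to the bounding box of the passing
    training parts inside it. `passing_parts` is a list of measurement
    vectors, each with n_tests entries (one per test in Tkept)."""
    lower = [min(m[i] for m in passing_parts) for i in range(n_tests)]
    upper = [max(m[i] for m in passing_parts) for i in range(n_tests)]
    return lower, upper

def classify(measurements, lower, upper):
    # A future part is predicted passing only if it falls inside the
    # collapsed bounds; the remaining subspace is deemed failing.
    inside = all(lo <= x <= hi
                 for x, lo, hi in zip(measurements, lower, upper))
    return "pass" if inside else "fail"
```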
Specification guard banding

Because of several limiting factors that include, for
example, finite training data and errors in test response
measurements, Fi(Tkept) will likely misclassify some future
fabricated parts. Although completely eliminating the
classification error is impractical, we hope to further
reduce it by guard-banding the specification boundaries
for each test ti ∈ Tred. We formulate guard bands as follows. We reevaluate each training part’s pass/fail attribute
after perturbing the specified range of measurement values deemed passing for each test ti ∈ Tred to obtain two
new sets of training data. We derive the first set by reducing the acceptable range of measurement values for each
test ti ∈ Tred, possibly causing us to treat some passing
parts as failing in this data set. We obtain the second data
set by expanding the acceptable range for each test ti ∈
Tred, in this case possibly leading us to treat some failing
parts as passing in this data set. We then construct two
separate decision trees for these two training data sets,
and together they constitute Fi(Tkept). We subject each
new part to both decision trees for prediction. If the two
models classify a part identically, we place higher confidence in the decision and accordingly classify the part
as passing or failing. On the other hand, if the models disagree, the prediction uncertainty leads to a guard-band
classification—that is, we conclude that we cannot make
a high-confidence pass/fail prediction for the part solely
on the basis of Tkept measurements. Figure 3a shows the
possible misprediction error due to the overlap of the projections of passing and failing parts onto the Tkept = {ta} hyperspace. Figure 3b demonstrates how guard banding eliminates this misclassification error by placing these overlapping parts into a third, guard-band class.

Figure 3. Failing-part misprediction due to overlap of passing and failing parts when the test response data is projected onto the Tkept = {ta} dimension (a), and elimination of the misprediction using our two-model approach of specification guard banding (b).
Depending on the number of expected devices in
the guard-band subspace and the cost, we can test
guard-band devices further to determine their true
pass/fail status.8 Alternately, we might decide that the
devices in the guard-band subspace are good, bad, or
even of lower grade, depending on the application’s
quality requirements.
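The two-model scheme reduces to a small amount of glue code. This sketch reuses the predict function from the earlier BDT sketch and assumes the two models were trained on the tightened and relaxed specification ranges, respectively.

```python
def guard_band_predict(measurements, model_tight, model_loose):
    """Classify a part with the two BDTs derived from the reduced and
    expanded acceptable ranges; disagreement maps to the guard band."""
    a = predict(model_tight, measurements)
    b = predict(model_loose, measurements)
    if a == b:
        return a            # high-confidence pass or fail
    return "guard-band"     # retest or grade per quality requirements
```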
Data scrubbing

The collected test measurements for Tkept, though
acceptable in terms of the specifications, can include
some anomalies. For example, measurements from one
wafer to another or from one lot to another can “drift”
(that is, measurements from all parts in one wafer or lot
might be shifted by the same value with respect to
equivalent parts in some other wafer or lot). In addition,
there could be outlier parts, which are acceptable
according to the specifications but are identifiably different from the rest of the population. Several factors
could be contributing to these anomalies, including a
part’s location on a wafer, its wafer location within a lot,
its fabrication time, the test equipment used to obtain
its measurements, the presence of defects, or any combination of these. These anomalies typically manifest
themselves in the Tkept hyperspace as passing or failing
parts in subspaces predominantly covered by parts of
the opposite kind. Outlying and drifting parts can therefore lead to an erroneous derivation of Fi(Tkept), which
in turn can cause a higher misprediction rate, especially for failing parts. Therefore, it’s important to “scrub”
the collected test measurement data to eliminate these
anomalies. We propose three scrubbing steps:
■ principal component analysis (PCA),
■ outlier elimination, and
■ drift reduction.
Principal component analysis
For the data we examined, we observed that combinations of Tkept measurements can sometimes show
increased correlation to yi, the pass/fail outcomes of a test
ti ∈ Tred. Using combinations of Tkept measurements for
decision tree modeling, therefore, could lead to more
accurate subspace partitions and might be better suited
for deriving Fi(Tkept). We use PCA to identify combinations
of dimensions (tests).9 PCA derives relevant, linear com-
binations of tests from Tkept using the training data to
describe an orthonormal Tkept hyperspace. In other words,
PCA potentially realigns the training data distribution
along the derived orthonormal axes in the Tkept hyperspace. This is beneficial for decision tree modeling
because typical partitions in a decision tree are orthogonal to a measurement axis in the Tkept hyperspace. When
we apply decision tree partitioning to the orthonormal
hyperspace derived using PCA, the hypercube boundaries are usually parallel to the population spreads. So,
the resulting hypercubes might now contain a significantly reduced amount of white space. As with hypercube collapsing, we believe that these white spaces
probably belong to failing subspaces and therefore
should not be in passing hypercubes. According to this
rationale, Fi(Tkept) obtained from a PCA-based orthonormal description of Tkept hyperspace would probably be
more accurate in predicting future pass/fail outcomes.
Figure 4a shows an example of a training data distri-
bution for a Tkept hyperspace consisting of two tests, t1 and t2. Once again, the circles represent passing parts, and the triangles represent failing parts. Figure 4a also shows a possible partitioning of the hyperspace. Passing hypercube A has a significant amount of white space that contains no training part. As a result, the decision tree model will erroneously classify as passing a future failing part (shown as a circled triangle) residing in this empty region. Figure 4b illustrates the partitioning of the same data distribution in a Tkept = {t′1, t′2} hyperspace derived using PCA. The resulting passing hypercubes are minimal and contain very little white space. With this new partitioning, the decision tree model correctly identifies the future failing part.

Figure 4. Example training data distribution for passing hypercubes with a large amount of white space (cube A), which can lead to misprediction of future failing parts (circled triangle in cube A) (a); and elimination of white space using principal component analysis (PCA) to derive a new Tkept hyperspace (b).
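A compact way to derive the orthonormal axes is the SVD-based formulation of PCA cited in reference 9. The following NumPy sketch (ours, not the authors' code) realigns training and future measurements; rows are parts and columns are the tests in Tkept.

```python
import numpy as np

def pca_realign(training, future):
    """Project measurements onto the orthonormal axes (principal
    components) of the training distribution."""
    mean = training.mean(axis=0)
    # SVD of the centered data: the rows of vt are the orthonormal axes.
    _, _, vt = np.linalg.svd(training - mean, full_matrices=False)
    return (training - mean) @ vt.T, (future - mean) @ vt.T
```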
Outlier elimination

Outliers are parts in the Tkept hyperspace with test measurements that significantly deviate from the distribution but still satisfy all specifications. As mentioned earlier, the presence of outliers can corrupt the derivation of Fi(Tkept). Specifically, an outlier that passes all tests can reside in a subspace that should be classified as failing in order to minimize prediction error. What results, however, because of one or more outliers, is the creation of a passing hypercube within a failing subspace. Because outliers are inherently unlikely, the passing hypercube will lead to the misprediction of future failing parts. We classify a passing datum as an outlier on the basis of its Euclidean distance from its nearest passing neighbors. If this distance is greater than a preset threshold, we identify the part as an outlier and eliminate it from the training data.

Figure 5a shows a part distribution example with circles and triangles representing passing and failing parts, respectively. The decision tree partitioning of the distribution in Figure 5a results in spurious passing hypercubes (shown as shaded areas) in the failing subspaces because of the presence of outlying passing parts. The decision tree model derived from the part distribution shown in Figure 5a will therefore mispredict future failing parts residing in any of these failing subspaces. Eliminating these outlying passing parts removes these erroneous hypercubes from the decision tree model and results in a more accurate Fi(Tkept). Figure 5b illustrates how removal of these outlying passing parts eliminates the spurious passing hypercubes.

Figure 5. Part distribution example showing the presence of outliers leading to spurious passing hypercubes in failing subspaces (a), and elimination of outliers leading to removal of these spurious hypercubes (b).
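A sketch of the outlier filter, assuming the distance to the nearest passing neighbors is summarized as the mean distance to the k nearest; the article leaves both k and the threshold unspecified, so they are placeholders here.

```python
import numpy as np

def remove_passing_outliers(passing, threshold, k=3):
    """Drop passing parts whose mean Euclidean distance to their k
    nearest passing neighbors exceeds a preset threshold. `passing`
    holds one row of Tkept measurements per part."""
    dists = np.linalg.norm(passing[:, None, :] - passing[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)          # ignore self-distance
    nearest = np.sort(dists, axis=1)[:, :k]  # k nearest neighbors per part
    keep = nearest.mean(axis=1) <= threshold
    return passing[keep]
```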
Drift reduction
Another potential source of error in the derivation of
Fi(Tkept) is wafer or lot drift, in which the part distribution’s location in the Tkept hyperspace shifts from wafer
to wafer or lot to lot. Consequently, different wafer or
lot distributions that ideally should have overlapped to
describe homogeneous (passing or failing) subspaces
are now skewed because of drift, resulting in erroneous
overlap of passing and failing parts. Thus, drift can
cause misclassification of future parts.
Figure 6a shows distributions for two example
groups of test data, indicated by solid and hollow markers. Again, circles in each group represent passing parts,
and triangles represent failing parts. The shaded areas
show overlap of passing and failing hypercubes of trees
derived from the two groups of data. Therefore, a decision tree model derived from one group will make mispredictions with respect to the other group.
We minimize drift by measuring the Euclidean distance between the two distributions’ medians (a representation of the distribution centers) and then
subtracting this value from the test measurements of
each part in the distribution being predicted. This is simply the linear translation of a future distribution to the
one used to derive the decision tree. Using a median as
a central measurement for a part distribution reduces
sensitivity to outliers. Figure 6b shows the result of drift
reduction for the two distributions in Figure 6a.
Specifically, we use the difference between the two
groups’ medians to shift the solid-marked data onto the
hollow-marked data. This shift eliminates the overlap of
passing and failing parts from the two distributions and
thus improves the prediction accuracy of the decision
tree model derived from the hollow-marked data.
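As a sketch, the median-alignment step is a one-line translation once the two distributions are available as arrays; the array layout (one row of Tkept measurements per part) is our assumption.

```python
import numpy as np

def reduce_drift(reference, drifted):
    """Linearly translate a drifted wafer or lot distribution onto the
    reference distribution by aligning their medians, which are less
    sensitive to outliers than means."""
    shift = np.median(drifted, axis=0) - np.median(reference, axis=0)
    return drifted - shift
```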
Applying these data-scrubbing techniques to a commercial chip, however, will likely require several modifications to the standard test flow. For example,
pass/fail prediction for a fabricated part is possible only
after collecting the test measurements from all chips in
a lot or wafer. We are currently exploring the issues
related to making data scrubbing feasible in an industry test environment, and the work of Turakhia et al.
could provide a viable solution.10
Figure 6. Drift between the distributions of two example groups of test data with Tkept = {t1, t2} (a), and elimination of the misprediction error through drift reduction (b). Shaded regions in (a) show places where parts are mispredicted.
Test compaction experiment

We conducted a test compaction experiment using a commercially available MEMS accelerometer,11 an integrated transducer that converts acceleration to an electrical signal. The testing process for an accelerometer involves not only electrical tests but also mechanical tests (that is, tests aimed at evaluating the accelerometer's mechanical components) at room, cold, and hot temperatures.12 These mechanical tests are far more expensive than their all-electrical counterparts. Therefore, we can avoid significant cost if we can predict the outcomes of these mechanical tests using only the electrical-test measurements.

We applied our statistical test compaction methodology to test measurements for more than 70,000 accelerometers. Test engineers collected these measurements at various times over a nine-month period. We investigated the possible elimination of the mechanical tests at both cold and hot temperatures. We used 80% of the accelerometers for training, and all of them for validation. Using fivefold cross validation, we measured the prediction accuracy after eliminating each mechanical test. For this type of validation, we derived five different training data sets from the test measurements. Any pair of training data sets differed from each other for 20% of the parts. We then used each training data set to derive a decision tree model, which we evaluated for the amount of misprediction upon elimination of a test. The average level of misprediction for the five training data sets characterized the classifier's quality for the possible elimination of the test. We observed that the test measurement data used in this experiment included outliers and lot-to-lot drift, so we scrubbed the data to eliminate the effect of drift. In addition, we treated the training data to remove passing outliers. Finally, we used guard banding to reduce misprediction for tests whose elimination resulted in unacceptable errors.

Using custom software developed at Carnegie Mellon University, we constructed a tree in just a few hours, using a data set of more than 50,000 parts. Using this tree for classification took only a few seconds for all of the tens of thousands of parts predicted.
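Under our reading of this protocol, the fivefold evaluation looks roughly as follows; derive_model and error_of are stand-ins for the BDT derivation and misprediction measurement described above.

```python
def fivefold_error(parts, derive_model, error_of):
    """Average misprediction over five training sets, each using 80%
    of the parts, so that any two training sets differ in 20% of the
    parts; each model is validated on all parts."""
    k = len(parts) // 5
    folds = [parts[i * k:(i + 1) * k] for i in range(5)]
    errors = []
    for i in range(5):
        training = [p for j, fold in enumerate(folds) if j != i for p in fold]
        model = derive_model(training)
        errors.append(error_of(model, parts))
    return sum(errors) / 5
```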
Table 1. Fivefold cross-validation percentages of passing and failing parts mispredicted, and their averages, when we eliminate the cold mechanical test.

Run      Failing parts mispredicted (%)   Passing parts mispredicted (%)
1        0.00                             0.06
2        1.54                             0.08
3        0.30                             0.06
4        0.00                             0.07
5        0.00                             0.07
Average  0.36                             0.07

Table 2. Fivefold cross-validation percentages of passing and failing parts mispredicted, and their averages, when we eliminate the hot mechanical test.

Run      Failing parts mispredicted (%)   Passing parts mispredicted (%)
1        9.34                             0.72
2        8.79                             0.65
3        4.40                             0.75
4        7.69                             0.71
5        4.40                             0.82
Average  6.92                             0.73

We conducted separate test compaction experiments to analyze the possible elimination of the mechanical tests at both cold (–40°C) and hot (80°C) temperatures. We included all the electrical tests at each temperature in Tkept to derive the decision tree model for the outcome of the mechanical test at that temperature. Table 1 gives the fivefold cross-validation prediction results when we eliminated the cold mechanical test. Table 2 gives the fivefold cross-validation prediction results when we eliminated the hot mechanical test. When we eliminated the cold mechanical test, the average misprediction error was only 0.36% for failing parts and 0.07% for passing parts; when we eliminated the hot mechanical test, the resulting error was 6.92% and 0.73%, respectively, for the failing and passing parts.

Figure 7. Worst-case receiver operating characteristic (ROC) plots of percentages of failing parts correctly identified versus percentage of guard-banded parts for a commercial accelerometer when cold (solid line) and hot (dotted line) micromechanical tests are eliminated.

Although the percentages of mispredictions for the mechanical test outcomes are extremely low, we can use specification guard banding to further reduce these values. In fact, we can easily achieve zero defect escape for the cold mechanical test. Figure 7 shows receiver operating characteristic (ROC) plots for the worst-case fivefold cross-validation runs for cold (solid line) and hot (dotted line) mechanical tests. Specifically, Figure 7 plots the percentages of correct predictions for failing parts against the percentage of parts placed in the guard band. (Note that the percentages reported along the y-axis in Figure 7 are not the same as DPM, the number of defective parts per million units shipped; rather, they are the percentages of failing parts correctly predicted.) These plots demonstrate that it's possible to improve the accuracy of Fi(Tkept). For example, we could probably
eliminate the cold mechanical test, because few parts
require guard banding to ensure that no defective parts
escape.
To illustrate the potential cost savings from using our
test compaction methodology, we developed a simple
model for the reduction in test cost per shipped part, ΔC:
ΔC = (CE + CM)/Y − [CE + (CDE × DE) + (CM × GB)]/(Y − ΔY)

where CE and CM are the costs of applying the electrical and mechanical tests to each manufactured part, CDE is the cost of mispredicting a defective part, and Y is the process yield. ΔY and DE are the fractions of good and failing parts, respectively, that are mispredicted because of the elimination of a mechanical test, and GB is the fraction of parts guard-banded (meaning they require application of the mechanical test). Assuming Y = 90%, CM = $5, CE = $1, and CDE = $1,000, the model simplifies to

ΔC = 6.67 − [1 + (1,000 × DE) + (5 × GB)]/(0.9 − ΔY)

Substituting DE = 0.36% and ΔY = 0.07% from Figure 7, the cost reduction from eliminating the cold mechanical test with no guard banding is $5.16 per shipped part. If, however, no defect escape is tolerable, we must guard-band 8.1% of the manufactured parts. Assuming we apply both the cold mechanical and electrical tests to these guard-banded parts, the cost reduction is still significant: $5.11 per shipped part. Either way, we eliminate more than 76% of the cold test's cost.
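The model is straightforward to evaluate numerically. In the sketch below, reproducing a figure close to the reported $5.16 requires interpreting DE and ΔY as fractions of all manufactured parts (the table percentages scaled by the 10% failing and 90% passing populations); that interpretation is ours, not stated explicitly in the text.

```python
def delta_c(ce, cm, cde, y, de, dy, gb):
    """Reduction in test cost per shipped part from eliminating a
    mechanical test, per the model in the text. de, dy, and gb are
    fractions of all manufactured parts (mispredicted failing parts,
    mispredicted good parts, and guard-banded parts)."""
    return (ce + cm) / y - (ce + cde * de + cm * gb) / (y - dy)

# Cold-test scenario: Y = 90%, CM = $5, CE = $1, CDE = $1,000, with
# 0.36% of failing parts and 0.07% of passing parts mispredicted.
print(round(delta_c(1, 5, 1000, 0.9, 0.1 * 0.0036, 0.9 * 0.0007, 0.0), 2))
```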
THUS, OUR PROPOSED METHODOLOGY can eliminate an expensive mechanical test for a commercially available accelerometer with little error. Moreover, it's possible to completely eliminate the error (for failing parts) using specification guard banding. But the same result could not be achieved for the equivalent mechanical test executed at an elevated temperature. Techniques such as specification guard banding and drift removal can reduce error, but more research is needed. More importantly, techniques are needed for incorporating this and similar methodologies into a production test flow. ■

Acknowledgments
We acknowledge the use of Freescale Semiconductor accelerometer test data in this research work, and we thank Freescale employees Teresa Maudie, Rick Neilsen, Ray Roop, and Brooks Scofield for their insightful description of the test data.

References
1. V.N. Vapnik, Statistical Learning Theory, Wiley-Interscience, 1998.
2. A.K. Jain, J. Mao, and K.M. Mohiuddin, "Artificial Neural Networks: A Tutorial," Computer, vol. 29, no. 3, Mar. 1996, pp. 31-44.
3. L. Rokach and O. Maimon, "Top-Down Induction of Decision Trees Classifiers—A Survey," IEEE Trans. Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 35, no. 4, Nov. 2005, pp. 476-487.
4. S. Biswas et al., "Specification Test Compaction for Analog Circuits and MEMS," Proc. Design, Automation and Test in Europe Conf. (DATE 05), IEEE CS Press, 2005, pp. 164-169.
5. H.G.D. Stratigopoulos and Y. Makris, "Bridging the Accuracy of Functional and Machine-Learning-Based Mixed-Signal Testing," Proc. IEEE VLSI Test Symp. (VTS 06), IEEE CS Press, 2006, pp. 395-400.
6. B. DasGupta, H.T. Siegelman, and E. Sontag, "On the Complexity of Training Neural Networks with Continuous Activation Functions," IEEE Trans. Neural Networks, vol. 6, no. 6, Nov. 1995, pp. 1490-1504.
7. J.K. Martin and D.S. Hirschberg, The Time Complexity of Decision Tree Induction, tech. report 95-27, Dept. of Information and Computer Science, Univ. of California, Irvine, 1995.
8. R. Voorakaranam et al., "Production Deployment of a Fast Transient Testing Methodology for Analog Circuits: Case Study and Results," Proc. Int'l Test Conf. (ITC 03), IEEE CS Press, 2003, pp. 1174-1181.
9. M.E. Wall, A. Rechsteiner, and L.M. Rocha, "Singular Value Decomposition and Principal Component Analysis," A Practical Approach to Microarray Data Analysis, D.P. Berrar, W. Dubitzky, and M. Granzow, eds., Kluwer Academic Publishers, 2003, pp. 91-109.
10. R.P. Turakhia et al., "Changing Test and Data Modeling Requirements for Screening Latent Defects as Statistical Outliers," IEEE Design & Test, vol. 23, no. 2, Mar.-Apr. 2006, pp. 100-109.
11. S.D. Senturia, "A Capacitive Accelerometer," Microsystem Design, Kluwer Academic Publishers, 2003, pp. 497-530.
12. T. Maudie et al., "MEMS Manufacturing Testing: An Accelerometer Case Study," Proc. Int'l Test Conf. (ITC 03), IEEE CS Press, 2003, pp. 843-849.
Sounil Biswas is a PhD candidate
in the Department of Electrical and
Computer Engineering at Carnegie
Mellon University. His research interests include test of integrated, heterogeneous systems; statistical analysis of test data; and
defect characterization. Biswas has a BTech in electrical engineering from the Indian Institute of Technology, Kanpur, and an MS in electrical and computer
engineering from Carnegie Mellon University.
Ronald D. (Shawn) Blanton is a
professor in the Department of Electrical and Computer Engineering at
Carnegie Mellon University, where he
is an associate director of the Center
for Silicon System Implementation (CSSI). His research
interests include test and diagnosis of integrated, heterogeneous systems. Blanton has a BS in engineering
from Calvin College, an MS in electrical engineering
from the University of Arizona, and a PhD in computer
science and engineering from the University of Michigan, Ann Arbor. He is a member of the ACM and a
senior member of the IEEE.
Direct questions and comments about this article
to Shawn Blanton, 2109 Hamerschlag Hall, Dept. of
Electrical and Computer Engineering, Carnegie Mellon
University, 5000 Forbes Ave., Pittsburgh, PA 15217;
blanton@ece.cmu.edu.