Process Variation and Stochastic Design and Test

Statistical Test Compaction Using Binary Decision Trees

Sounil Biswas and Ronald D. (Shawn) Blanton
Carnegie Mellon University

Editor's note: Integrated, heterogeneous devices must deal with process variation, and the cost of testing such devices often exceeds that of digital circuits. The authors use statistics to cross-correlate different tests and find the minimum set required to guarantee product quality while reducing test time and cost. —T.M. Mak, Intel

EXPLICITLY TESTING an integrated, heterogeneous device for all its specifications is very costly. A new test methodology that can minimize test cost while maintaining product quality and limiting yield loss is needed (see the "Statistical learning approaches in test" sidebar). We are developing a statistical learning methodology based on decision trees to compact the complete specification-based test set of an integrated device by eliminating redundant tests. A test is deemed redundant if we can reliably predict its output using other tests that are not eliminated. We use binary classifiers to identify specification-based tests whose pass/fail results can be accurately predicted through measurements from kept tests (other specification tests that are somehow deemed necessary through analysis or by expert opinion). We call the set of tests that can be accurately predicted redundant tests, and we denote them as Tred; we denote the set of kept tests as Tkept.

Therefore, given a set of specification tests, T = {t1, t2, …, tl}, where l is the number of tests in T, we can express the pass/fail result yi of each redundant test ti ∈ Tred as the function yi = Fi(Tkept), where T = Tred ∪ Tkept. We call the process of identifying Tred and Tkept using statistical interpretation of test data statistical test compaction.

Binary classification techniques that might be suitable for deriving Fi(Tkept) include, for example, support vector machines (SVMs),1 neural networks,2 and decision trees.3 SVM-based binary classifiers are especially popular, and we have used them for statistical test compaction.4 In such classifiers, the goal is to derive hyperplane boundaries between classes in the Tkept hyperspace to maximize the distance between the class boundaries and the measurement data. However, this optimization step in SVM-based classifiers requires specifying the shape of these class boundaries (linear, Gaussian, exponential, and so on). This assumption can add error to Fi(Tkept).

Classification based on neural networks uses several simple functions of the Tkept measurement values. This classification technique organizes these simple functions as a network of nodes, where each node is called a perceptron. A network of perceptrons results in a complex nonlinear classification function that describes Fi(Tkept). For example, a neural network representation of the binary function (f: 1 ≤ x ≤ 3 → True) can simply be the logical AND function g of two binary functions (f1: x ≥ 1) and (f2: x ≤ 3). The three perceptrons in this example are f1, f2, and g, where f = g(f1, f2). During derivation of a neural network representation of Fi(Tkept), the learning algorithm iteratively updates the perceptrons' parameters to improve the prediction accuracy of Fi(Tkept) until maximum classification accuracy is reached for some sample data, typically called training data. Stratigopoulos and Makris discuss an application of neural networks to test, where they examine the correlation between alternate-test measurements and the pass/fail outcomes of fabricated parts.5 Derivation and training of a neural network, however, is an NP-complete process,6 and thus could require significant computation time.

In this work, we use binary decision trees (BDTs) for statistical test compaction, because they have the following properties. First, decision trees require no assumption about the type of correlation (if any) that exists between Tred and Tkept. This makes it possible to derive a more accurate representation of Fi(Tkept) from the collected test data. Also, deriving a decision tree model for Fi(Tkept) simply involves partitioning the Tkept hyperspace into hypercubes, which is a polynomial-time process of complexity O(n² × k³), where n is the number of tests in Tkept and k is the number of parts in the collected data.7 Therefore, the computation time required for creating a decision tree can be considerably less than the time required for training a neural network.

However, the application of a statistical test compaction methodology to a commercial device requires very high prediction accuracy, which is not achievable by simply using a typical decision tree model of Fi(Tkept). Our work involving SVM-based classifiers4 and the work of Stratigopoulos and Makris using neural networks5 also fail to meet this requirement. Consequently, to achieve a commercial device's accuracy demands, we employ several enhancements not inherent to decision trees. These enhancements include scrubbing the collected data to eliminate outliers, reducing drift between individual wafers and wafer lots due to process variations, and applying principal component analysis (PCA) to derive relevant measurement combinations. We also use guard banding of the specification boundaries and hypercube collapsing to further improve prediction accuracy. Some of these enhancements, such as hypercube collapsing, are novel contributions to statistical test compaction.

Test compaction results produced for a commercial microelectromechanical systems (MEMS) accelerometer are promising because they indicate that our methodology makes it possible to eliminate an expensive mechanical test for the accelerometer.

Statistical learning approaches in test

The standard approach of explicitly testing an integrated, heterogeneous (mixed-signal, analog, microelectromechanical, and so on) device for all its specifications can be prohibitively expensive. There's now a growing trend to use statistical learning involving test data from fabricated chips to reduce this test cost.
These attempts have mainly focused on regression techniques, in which the goal is either to predict the value of one device measurement using others1-3 or to fit the parameters of predetermined relationships between the device measurements.4,5 For example, some researchers use measurements from alternate tests and regression to predict device specifications1 or parameters.2 Alternate tests are simplified stimuli and response measurements that are significantly less expensive than their specification-based counterparts. Other researchers perform Monte Carlo simulations of the device design to derive the device measurements' joint probability distribution (JPD).4,5 They then use this JPD to determine parameter values for the function that relates the measurements. Brockman and Director aim to ensure that the function can accurately predict test measurements.4 Milor and Sangiovanni-Vincentelli want to create a function that can predict out-of-spec measurements when a defect exists.5

Go/no-go testing, however, requires only modeling the pass/fail outcome for the device under test. This is far simpler than predicting a test's actual measurement value using regression or some other technique. In other words, go/no-go testing is a binary classification problem, so it's best to use statistical learning based on binary classification to predict the corresponding test's pass/fail outcome.

References
1. A. Chatterjee and R. Voorakaranam, "Test Generation for Accurate Prediction of Analog Specifications," Proc. 18th IEEE VLSI Test Symp. (VTS 00), IEEE CS Press, 2000, pp. 137-142.
2. V. Natarajan, S. Bhattacharya, and A. Chatterjee, "Alternate Electrical Test for Extracting Mechanical Parameters of MEMS Accelerometer Sensors," Proc. 24th IEEE VLSI Test Symp. (VTS 06), IEEE Press, 2006, pp. 665-673.
3. L.-C. Wang et al., "On Path-Based Learning and Its Applications in Delay Test and Diagnosis," Proc. 41st Design Automation Conf. (DAC 04), ACM Press, 2004, pp. 492-497.
4. J.B. Brockman and S.W. Director, "Predictive Subset Testing: Optimizing IC Parametric Performance Testing for Quality, Cost, and Yield," IEEE Trans. Semiconductor Manufacturing, vol. 2, no. 3, Aug. 1989, pp. 104-113.
5. L. Milor and A.L. Sangiovanni-Vincentelli, "Minimizing Production Test Time to Detect Faults in Analog Circuits," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 13, no. 6, June 1994, pp. 796-813.

Proposed methodology

The n tests in Tkept describe an n-dimensional hyperspace. Test response data for each fabricated instance, or part, is a data point in the Tkept hyperspace; all the test data together form a distribution in that hyperspace. Parts that pass each test ti ∈ Tred describe passing subspaces within the Tkept hyperspace, whereas the failing parts represent failing subspaces. Our goal in deriving Fi(Tkept) is to separate these two types of subspaces using our BDT-based statistical learning method. The test response data we use to derive this BDT representation of Fi(Tkept) is called the training data. However, the training data might not reflect future data, because of inaccuracies from measurement errors, a lack of adequate training data, defects, and so on. This means Fi(Tkept) could achieve 100% accurate predictions for the training data yet exhibit high misprediction for future data.

To remedy this shortcoming, we use another set of collected data, the validation data, to check the model's pass/fail prediction accuracy. We measure the prediction accuracy by determining the fraction of passing or failing parts (from both the validation and training data) mispredicted by Fi(Tkept). We then use this accuracy to determine whether the tests in Tred are indeed redundant. In other words, if the misprediction error of Fi(Tkept) is lower than a preset threshold, the choice of Tred is deemed correct. On the other hand, if the misprediction error is higher than the preset threshold, the tests in Tred are not redundant, meaning we must redefine Tred or abort the test compaction process altogether.
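This acceptance criterion is easy to state in code. The following is a minimal sketch, assuming a trained classifier is available as a callable; the function names and the threshold value are our illustrative assumptions, not the authors' implementation.

```python
# Sketch of the redundancy check: accept t_i as redundant only if the
# misprediction rate of F_i(T_kept) stays below a preset threshold.
import numpy as np

def is_redundant(model, X_val, y_val, max_error=0.01):
    """model: callable mapping a part's T_kept measurements to pass/fail.
    max_error is an assumed threshold; the paper does not publish one."""
    y_pred = np.array([model(x) for x in X_val])   # pass/fail predictions
    error = np.mean(y_pred != np.asarray(y_val))   # fraction mispredicted
    return error < max_error

# Example with a trivial stand-in classifier on one kept test.
model = lambda x: x[0] < 2.5
X_val = np.array([[1.0], [2.0], [3.0]])
y_val = [True, True, False]
print(is_redundant(model, X_val, y_val))  # True: 0% misprediction
```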
Binary decision trees

We adapted the terminology used in this section from Rokach and Maimon.3 A BDT is a classifier expressed as a recursive partitioning of the Tkept hyperspace. Each nonterminal vertex in the decision tree represents a partition, which is an (n–1)-dimensional hyperplane that separates the Tkept hyperspace into two subspaces at a particular test measurement value vi,j for a test ti ∈ Tkept. Test response value vi,j is called a decision value, and the corresponding vertex in the tree is called a decision vertex. The subspace of parts resulting from the partition with measured values less than vi,j is the left-child vertex; the subspace of parts with values greater than or equal to vi,j is the right-child vertex. Each test ti ∈ Tkept can have several decision values {vi,1, vi,2, …, vi,m}, meaning a dimension can have multiple partitions. Each terminal vertex in a decision tree is ideally a homogeneous hypercube containing only passing or failing parts.

Decision tree construction uses characteristics from a sample of manufactured parts to select a test ti ∈ Tkept for partitioning the hyperspace. The partitioning of the hyperspace with various test measurement values continues until all parts in the sample are partitioned (classified) into homogeneous, leaf-level vertices for all the tests in Tred. Once the tree derivation algorithm has obtained the BDT representation of Fi(Tkept), we determine the prediction of a future chip as pass or fail simply by traversing the tree from its root vertex to a terminal vertex according to the chip's measurements for the tests in Tkept.

Deriving a BDT representation of Fi(Tkept) is a two-step process, involving tree construction and tree pruning. During depth-first tree construction, the tree construction algorithm evaluates the capability of a test measurement value vi,j to homogenize the subspace. The algorithm examines a test and a measurement value vi,j as a candidate partition if the current vertex has at least two parts m and n (one passing and one failing) with measurement values ρi,m and ρi,n, respectively, such that ρi,m ≤ vi,j ≤ ρi,n or ρi,n ≤ vi,j ≤ ρi,m. From the test measurement values evaluated, the tree construction algorithm selects, as the vertex's decision value, the vi,j that maximizes the separation of passing and failing parts in the subspace. The depth-first construction process continues until all training parts are homogeneously partitioned or until further partitioning of the Tkept hyperspace does not improve accuracy. We gauge the derived decision tree model's prediction accuracy by observing the prediction error of Fi(Tkept) on the validation data. We can further improve the prediction accuracy by "pruning" the tree on the basis of how it classifies the validation data.
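The classification step just described reduces to a simple traversal. Here is a minimal sketch of the vertex layout and pass/fail prediction; the class and field names are our illustrative assumptions, not the authors' software.

```python
# Sketch of pass/fail prediction by tree traversal: at each decision
# vertex, measurements below the decision value v_{i,j} go left, and
# values greater than or equal to v_{i,j} go right, as in the text.
class DecisionVertex:
    def __init__(self, test_index=None, decision_value=None,
                 left=None, right=None, outcome=None):
        self.test_index = test_index          # which test t_i in T_kept
        self.decision_value = decision_value  # decision value v_{i,j}
        self.left = left                      # subspace with values < v_{i,j}
        self.right = right                    # subspace with values >= v_{i,j}
        self.outcome = outcome                # 'pass'/'fail' at a terminal vertex

def predict(vertex, measurements):
    """Walk from the root to a terminal vertex using a chip's
    T_kept measurements."""
    while vertex.outcome is None:
        if measurements[vertex.test_index] < vertex.decision_value:
            vertex = vertex.left
        else:
            vertex = vertex.right
    return vertex.outcome

# Tiny example tree: one partition on test t0 at decision value 2.5.
root = DecisionVertex(test_index=0, decision_value=2.5,
                      left=DecisionVertex(outcome='pass'),
                      right=DecisionVertex(outcome='fail'))
print(predict(root, [1.7]))  # 'pass'
```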
During a breadth-first traversal that begins at the level just above the terminal vertices (the n – 1 level), the tree-pruning algorithm analyzes each vertex to determine whether eliminating the partition and converting the corresponding left- and right-child trees into a single passing or failing hypercube would improve the prediction accuracy. For cases in which the tree-pruning algorithm determines that the prediction accuracy would improve, the algorithm replaces the decision vertex and all its descendants with a single passing or failing terminal vertex, depending on which one leads to the greater improvement. This is called decision node deletion. In addition, the algorithm analyzes each decision vertex to determine whether a small perturbation of the decision value vi,j can improve the prediction accuracy; if so, the algorithm employs this perturbation. We call this decision value shifting. These two tree-pruning operations, decision node deletion and decision value shifting, together can lead to a more accurate Fi(Tkept).

Figure 1a shows an example data distribution for a Tkept = {t1, t2} hyperspace. The lines in Figure 1a partially separate passing and failing parts; the dotted line shows the last partition chosen. Figure 1b shows the resulting, partially formed decision tree. The shaded region indicates the tree vertices from the dotted-line partition in Figure 1a. The subspace corresponding to the new left child contains only passing parts; thus, the child is a passing terminal vertex of the tree. However, the subspace corresponding to the new right child contains both passing and failing parts and hence requires further partitioning. Therefore, the right child is a nonterminal vertex in the tree.

Figure 1. An example of a data distribution partially separated in the Tkept = {t1, t2} hyperspace (a), and the corresponding partially formed binary decision tree (BDT) (b). Circles represent parts that pass Tred; triangles represent parts that fail Tred. The dotted line in (a) denotes the last partition chosen in the decision tree derivation process; the shaded region in (b) represents the new vertices added to the decision tree by that partition.
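To make decision node deletion concrete, here is a hedged sketch that reuses the DecisionVertex and predict definitions from the previous sketch. It tentatively collapses a vertex into a pass or fail leaf and keeps whichever choice scores best on the validation data; decision value shifting would analogously perturb decision_value and keep any improving perturbation. The helper names are our assumptions.

```python
# Sketch of decision node deletion driven by validation accuracy.

def accuracy(root, X_val, y_val):
    """Fraction of validation parts the tree classifies correctly."""
    return sum(predict(root, x) == y for x, y in zip(X_val, y_val)) / len(y_val)

def prune_vertex(root, vertex, X_val, y_val):
    """Replace `vertex` and its descendants with a single 'pass' or 'fail'
    terminal vertex if either choice improves validation accuracy."""
    saved = vertex.__dict__.copy()
    best_score, best_outcome = accuracy(root, X_val, y_val), None
    for outcome in ('pass', 'fail'):
        vertex.__dict__.update(test_index=None, decision_value=None,
                               left=None, right=None, outcome=outcome)
        score = accuracy(root, X_val, y_val)
        if score > best_score:
            best_score, best_outcome = score, outcome
    if best_outcome is None:
        vertex.__dict__.update(saved)     # no improvement: restore the subtree
    else:
        vertex.outcome = best_outcome     # decision node deletion
    return best_score
```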
Hypercube collapsing

The decision tree representation of Fi(Tkept) is simply the partition of the Tkept hyperspace into hypercubes that separate the passing and failing subspaces. However, training data collected from a high- or low-yielding fabrication process can result in an insufficient number of training parts from one class. Consequently, some partitions necessary to achieve an acceptable level of prediction accuracy might be absent. As a result, some portions of the passing or failing subspaces could be erroneously included in hypercubes of the opposite kind, leading to misclassification of a future data point (a manufactured part) falling into that hypercube. We are particularly interested in defective-part misprediction, which occurs when the training data is from a high-yielding process and portions of the failing subspaces are likely incorrectly classified as passing.

Figure 2a shows an artificial collection of training data for a Tkept = {t1, t2} hyperspace with high yield. The shaded regions represent the passing hypercubes in the Tkept hyperspace. The circled triangle represents a failing part that was misclassified because the training data did not sufficiently cover the failing subspaces. We have observed that for a sufficiently large sample of high-yielding training data, passing parts in the training data adequately cover the passing subspaces. Therefore, to guard against the expensive scenario of mispredicting a defective part, we assume that the portions of the passing hypercubes that don't include any passing part are failing subspaces. So, we "collapse" the passing hypercube boundaries to coincide with the passing-part data in the hypercube, and we denote any remaining subspace as failing. We then classify all future parts residing in any of these new failing subspaces as defective. These parts cannot contribute to the total number of defective parts being mispredicted, but they can lead to an increase in yield loss, meaning a passing part will be mispredicted as bad. Figure 2b shows the result of collapsing the passing hypercube bounds from Figure 2a. The mispredicted failing part is now correctly classified.

Figure 2. Possible misclassification of a future failing part (circled triangle) due to inadequate coverage of failing subspaces by the training data (a), and the collapsing of passing hypercubes (shaded subspaces) to eliminate this error (b).

Our algorithm performs hypercube collapsing as follows. Each path from a decision tree's root vertex to a terminal vertex includes several decision values corresponding to the partitions along the path. For a passing terminal vertex, the encountered decision values define the bounds of the hypercube associated with the vertex. Any test not included in the path from the root to the terminal vertex manifests itself as a missing bound that reduces the corresponding hypercube's dimensionality and unnecessarily increases its size. Adding partitions for the missing tests on the basis of which failing parts are present in the hypercube would collapse it. However, the absence of failing parts in the vertex makes identifying decision values for these missing tests impossible. Therefore, the algorithm instead derives decision values for the missing tests from the passing data in the vertex. In the worst case, with n tests in Tkept and k passing parts in the collected data, hypercube collapsing creates n × k additional decision values. Derivation of the decision tree model with hypercube collapsing is still polynomial, with complexity O(n² × k³ + n × k). Also, the increase in the cost of classifying a part with the collapsed hypercubes is at most n – 1 additional decision values per path.
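The effect of collapsing a single passing terminal vertex can be sketched as tightening its bounding box to the passing training parts it contains. This is a minimal illustration under that reading of the algorithm; the function names are ours.

```python
# Sketch of hypercube collapsing for one passing terminal vertex: shrink
# the hypercube to the tight bounding box of its passing training parts,
# and reclassify anything outside the box as failing (defective).
import numpy as np

def collapse_hypercube(passing_parts):
    parts = np.asarray(passing_parts, dtype=float)       # shape: (k, n)
    lower, upper = parts.min(axis=0), parts.max(axis=0)  # new bounds per test
    def classify(measurements):
        m = np.asarray(measurements, dtype=float)
        inside = bool(np.all((m >= lower) & (m <= upper)))
        return 'pass' if inside else 'fail'              # outside -> defective
    return classify

# Example echoing Figure 2: a future part far from the passing population
# is now rejected instead of escaping as a defective part.
leaf = collapse_hypercube([[1.0, 1.1], [1.2, 0.9], [0.9, 1.0]])
print(leaf([1.1, 1.0]))  # 'pass'
print(leaf([3.0, 0.2]))  # 'fail'
```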
Specification guard banding

Because of several limiting factors, including, for example, finite training data and errors in test response measurements, Fi(Tkept) will likely misclassify some future fabricated parts. Although completely eliminating the classification error is impractical, we hope to further reduce it by guard-banding the specification boundaries for each test ti ∈ Tred.

We formulate guard bands as follows. We reevaluate each training part's pass/fail attribute after perturbing the specified range of measurement values deemed passing for each test ti ∈ Tred, obtaining two new sets of training data. We derive the first set by reducing the acceptable range of measurement values for each test ti ∈ Tred, possibly causing us to treat some passing parts as failing in this data set. We obtain the second data set by expanding the acceptable range for each test ti ∈ Tred, in this case possibly leading us to treat some failing parts as passing. We then construct two separate decision trees for these two training data sets; together, they constitute Fi(Tkept).

We subject each new part to both decision trees for prediction. If the two models classify a part identically, we place higher confidence in the decision and accordingly classify the part as passing or failing. On the other hand, if the models disagree, the prediction uncertainty leads to a guard-band classification; that is, we conclude that we cannot make a high-confidence pass/fail prediction for the part solely on the basis of Tkept measurements. Figure 3a shows the possible misprediction error due to the overlap of the projections of passing and failing parts onto the Tkept = {ta} hyperspace. Figure 3b demonstrates how guard banding eliminates this misclassification error by placing these overlapping parts into a third, guard-band class. Depending on the number of expected devices in the guard-band subspace and the cost, we can test guard-band devices further to determine their true pass/fail status.8 Alternatively, we might decide that the devices in the guard-band subspace are good, bad, or even of lower grade, depending on the application's quality requirements.

Figure 3. Failing-part misprediction due to overlap of passing and failing parts when the test response data is projected onto the Tkept = {ta} dimension (a), and elimination of the misprediction using our two-model approach of specification guard banding (b).
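The two-model decision rule itself is compact. The following is a minimal sketch, with stand-in threshold classifiers replacing the two decision trees; all names here are illustrative assumptions.

```python
# Sketch of two-model guard-band classification: one tree trained with
# tightened T_red specification limits, one with relaxed limits; any
# disagreement sends the part to a third, guard-band class.
def guard_band_classify(tight_model, relaxed_model, measurements):
    """tight_model/relaxed_model: callables returning 'pass' or 'fail'."""
    a = tight_model(measurements)
    b = relaxed_model(measurements)
    if a == b:
        return a            # both trees agree: high-confidence pass/fail
    return 'guard-band'     # disagreement: retest or down-grade the part

# Example with stand-in classifiers on a single kept-test measurement.
tight   = lambda m: 'pass' if 1.2 <= m[0] <= 2.8 else 'fail'  # shrunken range
relaxed = lambda m: 'pass' if 0.8 <= m[0] <= 3.2 else 'fail'  # expanded range
print(guard_band_classify(tight, relaxed, [2.0]))  # 'pass'
print(guard_band_classify(tight, relaxed, [3.0]))  # 'guard-band'
```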
Data scrubbing

The collected test measurements for Tkept, though acceptable in terms of the specifications, can include some anomalies. For example, measurements from one wafer to another or from one lot to another can "drift" (that is, measurements from all parts in one wafer or lot might be shifted by the same value with respect to equivalent parts in some other wafer or lot). In addition, there could be outlier parts, which are acceptable according to the specifications but are identifiably different from the rest of the population. Several factors could contribute to these anomalies, including a part's location on a wafer, its wafer's location within a lot, its fabrication time, the test equipment used to obtain its measurements, the presence of defects, or any combination of these. These anomalies typically manifest themselves in the Tkept hyperspace as passing or failing parts in subspaces predominantly covered by parts of the opposite kind. Outlying and drifting parts can therefore lead to an erroneous derivation of Fi(Tkept), which in turn can cause a higher misprediction rate, especially for failing parts. Therefore, it's important to "scrub" the collected test measurement data to eliminate these anomalies. We propose three scrubbing steps:

■ principal component analysis (PCA),
■ outlier elimination, and
■ drift reduction.

Principal component analysis

For the data we examined, we observed that combinations of Tkept measurements can sometimes show increased correlation to yi, the pass/fail outcome of a test ti ∈ Tred. Using combinations of Tkept measurements for decision tree modeling, therefore, could lead to more accurate subspace partitions and might be better suited for deriving Fi(Tkept). We use PCA to identify combinations of dimensions (tests).9 PCA derives relevant, linear combinations of tests from Tkept using the training data to describe an orthonormal Tkept hyperspace. In other words, PCA potentially realigns the training data distribution along the derived orthonormal axes in the Tkept hyperspace. This is beneficial for decision tree modeling because typical partitions in a decision tree are orthogonal to a measurement axis in the Tkept hyperspace. When we apply decision tree partitioning to the orthonormal hyperspace derived using PCA, the hypercube boundaries are usually parallel to the population spreads, so the resulting hypercubes might contain a significantly reduced amount of white space. As with hypercube collapsing, we believe that these white spaces probably belong to failing subspaces and therefore should not be in passing hypercubes. According to this rationale, an Fi(Tkept) obtained from a PCA-based orthonormal description of the Tkept hyperspace would probably be more accurate in predicting future pass/fail outcomes.

Figure 4a shows an example of a training data distribution for a Tkept hyperspace consisting of two tests, t1 and t2. Once again, the circles represent passing parts, and the triangles represent failing parts. Figure 4a also shows a possible partitioning of the hyperspace. Passing hypercube A has a significant amount of white space that contains no training part. As a result, the decision tree model will erroneously classify as passing a future failing part (shown as a circled triangle) residing in this empty region. Figure 4b illustrates the partitioning of the same data distribution in a Tkept = {t′1, t′2} hyperspace derived using PCA. The resulting passing hypercubes are minimal and contain very little white space. With this new partitioning, the decision tree model correctly identifies the future failing part.

Figure 4. Example training data distribution with a passing hypercube containing a large amount of white space (cube A), which can lead to misprediction of future failing parts (circled triangle in cube A) (a); and elimination of white space using principal component analysis (PCA) to derive a new Tkept hyperspace (b).
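A minimal sketch of the PCA step follows, using numpy's singular value decomposition, which is one common way to compute principal components (the authors cite Wall et al. for the technique); the function names are our assumptions.

```python
# Sketch of the PCA scrubbing step: realign the training data along its
# principal (orthonormal) axes before tree partitioning.
import numpy as np

def pca_transform(X_train):
    """Return the centered training data projected onto its principal
    axes, plus a function that maps future parts into the same hyperspace."""
    mean = X_train.mean(axis=0)
    centered = X_train - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    project = lambda X: (X - mean) @ vt.T   # rows of vt are the new axes
    return project(X_train), project

X_train = np.array([[1.0, 2.1], [2.0, 4.2], [3.0, 5.9], [4.0, 8.1]])
X_rot, project = pca_transform(X_train)
# Axis-orthogonal tree partitions in X_rot now run parallel to the
# population spread, reducing white space inside passing hypercubes.
print(project(np.array([[2.5, 5.0]])))
```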
Outlier elimination

Outliers are parts in the Tkept hyperspace whose test measurements significantly deviate from the distribution but still satisfy all specifications. As mentioned earlier, the presence of outliers can corrupt the derivation of Fi(Tkept). Specifically, an outlier that passes all tests can reside in a subspace that should be classified as failing in order to minimize prediction error. What results, however, because of one or more outliers, is the creation of a passing hypercube within a failing subspace. Because outliers are inherently unlikely, the passing hypercube will lead to the misprediction of future failing parts. We classify a passing datum as an outlier on the basis of its Euclidean distance from its nearest passing neighbors. If this distance is greater than a preset threshold, we identify the part as an outlier and eliminate it from the training data.

Figure 5a shows a part distribution example with circles and triangles representing passing and failing parts, respectively. The decision tree partitioning of the distribution in Figure 5a results in spurious passing hypercubes (shown as shaded areas) in the failing subspaces because of the presence of outlying passing parts. The decision tree model derived from the part distribution shown in Figure 5a will therefore mispredict future failing parts residing in any of these failing subspaces. Eliminating these outlying passing parts removes these erroneous hypercubes from the decision tree model and results in a more accurate Fi(Tkept). Figure 5b illustrates how removal of these outlying passing parts eliminates the spurious passing hypercubes.

Figure 5. Part distribution example showing the presence of outliers leading to spurious passing hypercubes in failing subspaces (a), and elimination of outliers leading to removal of these spurious hypercubes (b).

Drift reduction

Another potential source of error in the derivation of Fi(Tkept) is wafer or lot drift, in which the part distribution's location in the Tkept hyperspace shifts from wafer to wafer or lot to lot. Consequently, different wafer or lot distributions that ideally should have overlapped to describe homogeneous (passing or failing) subspaces are skewed because of drift, resulting in erroneous overlap of passing and failing parts. Thus, drift can cause misclassification of future parts.

Figure 6a shows distributions for two example groups of test data, indicated by solid and hollow markers. Again, circles in each group represent passing parts, and triangles represent failing parts. The shaded areas show overlap of passing and failing hypercubes of trees derived from the two groups of data. Therefore, a decision tree model derived from one group will make mispredictions with respect to the other group. We minimize drift by measuring the distance between the two distributions' medians (a representation of the distribution centers) and then subtracting this value from the test measurements of each part in the distribution being predicted. This is simply a linear translation of a future distribution onto the one used to derive the decision tree. Using a median as a central measure of a part distribution reduces sensitivity to outliers. Figure 6b shows the result of drift reduction for the two distributions in Figure 6a. Specifically, we use the difference between the two groups' medians to shift the solid-marked data onto the hollow-marked data. This shift eliminates the overlap of passing and failing parts from the two distributions and thus improves the prediction accuracy of the decision tree model derived from the hollow-marked data.

Figure 6. Drift between the distributions of two example groups of test data with Tkept = {t1, t2} (a), and elimination of the misprediction error through drift reduction (b). Shaded regions in (a) show places where parts are mispredicted.
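Both remaining scrubbing steps admit short sketches. In the following, the neighbor count and distance threshold are illustrative assumptions (the paper specifies neither), and the drift correction subtracts the per-dimension difference of the two distributions' medians, which is our reading of the median-based translation described above.

```python
# Sketches of outlier elimination and drift reduction.
import numpy as np

def remove_outliers(passing, threshold, n_neighbors=1):
    """Drop passing parts whose mean Euclidean distance to their nearest
    passing neighbor(s) exceeds a preset threshold."""
    passing = np.asarray(passing, dtype=float)
    dists = np.linalg.norm(passing[:, None, :] - passing[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)                      # ignore self-distance
    nearest = np.sort(dists, axis=1)[:, :n_neighbors].mean(axis=1)
    return passing[nearest <= threshold]

def reduce_drift(reference, drifting):
    """Translate a drifting wafer/lot distribution onto the reference one
    by subtracting the difference of the two distributions' medians."""
    shift = np.median(drifting, axis=0) - np.median(reference, axis=0)
    return drifting - shift

# Examples: the isolated part near (5, 5) is scrubbed as an outlier, and
# the drifted group is translated back onto the reference group.
clean = remove_outliers([[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [5.0, 5.0]],
                        threshold=1.0)
print(len(clean))                                        # 3
print(reduce_drift(np.array([[0.0, 0.0], [1.0, 1.0]]),
                   np.array([[2.0, 3.0], [3.0, 4.0]])))  # [[0 0] [1 1]]
```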
Applying these data-scrubbing techniques to a commercial chip, however, will likely require several modifications to the standard test flow. For example, pass/fail prediction for a fabricated part is possible only after collecting the test measurements from all chips in a lot or wafer. We are currently exploring the issues related to making data scrubbing feasible in an industry test environment, and the work of Turakhia et al. could provide a viable solution.10

Test compaction experiment

We conducted a test compaction experiment using a commercially available MEMS accelerometer,11 an integrated transducer that converts acceleration to an electrical signal. The testing process for an accelerometer involves not only electrical tests but also mechanical tests (that is, tests aimed at evaluating the accelerometer's mechanical components) at room, cold, and hot temperatures.12 These mechanical tests are far more expensive than their all-electrical counterparts. Therefore, we can avoid significant cost if we can predict the outcomes of these mechanical tests using only the electrical-test measurements.

We applied our statistical test compaction methodology to test measurements for more than 70,000 accelerometers. Test engineers collected these measurements at various times over a nine-month period. We investigated the possible elimination of the mechanical tests at both cold and hot temperatures. We used 80% of the accelerometers for training, and all of them for validation. Using fivefold cross-validation, we measured the prediction accuracy after eliminating each mechanical test. For this type of validation, we derived five different training data sets from the test measurements. Any pair of training data sets differed from each other in 20% of the parts. We then used each training data set to derive a decision tree model, which we evaluated for the amount of misprediction upon elimination of a test. The average level of misprediction for the five training data sets characterized the classifier's quality for the possible elimination of the test.

We observed that the test measurement data used in this experiment included outliers and lot-to-lot drift, so we scrubbed the data to eliminate the effect of drift. In addition, we treated the training data to remove passing outliers. Finally, we used guard banding to reduce misprediction for tests whose elimination resulted in unacceptable errors. Using custom software developed at Carnegie Mellon University, we constructed a tree in just a few hours, using a data set of more than 50,000 parts. Using this tree for classification took only a few seconds for all of the tens of thousands of parts predicted.
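One common way to realize the fivefold scheme described above is sketched below: five training sets, each omitting a different 20% of the parts, with every derived model then evaluated on all parts and the errors averaged. The builder callable and names are our assumptions, not the authors' software.

```python
# Sketch of the fivefold cross-validation used to judge a test's redundancy.
import numpy as np

def fivefold_error(build_model, X, y, folds=5):
    """build_model(X_train, y_train) returns a pass/fail classifier."""
    idx = np.arange(len(X))
    errors = []
    for f in range(folds):
        train = idx[idx % folds != f]           # ~80% of parts for training
        model = build_model(X[train], y[train])
        pred = np.array([model(x) for x in X])  # validate on all parts
        errors.append(np.mean(pred != y))
    return np.mean(errors)                      # average misprediction level

# Example with a stand-in "tree builder" that thresholds one measurement.
build = lambda Xt, yt: (lambda x: x[0] < 2.5)
X = np.array([[1.0], [2.0], [3.0], [4.0], [1.5]])
y = np.array([True, True, False, False, True])
print(fivefold_error(build, X, y))  # 0.0
```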
We conducted separate test compaction experiments to analyze the possible elimination of the mechanical tests at both cold (–40°C) and hot (80°C) temperatures. We included all the electrical tests at each temperature in Tkept to derive the decision tree model for the outcome of the mechanical test at that temperature. Table 1 gives the fivefold cross-validation prediction results when we eliminated the cold mechanical test. Table 2 gives the fivefold cross-validation prediction results when we eliminated the hot mechanical test.

Table 1. Fivefold cross-validation percentages of passing and failing parts mispredicted, and their averages, when we eliminate the cold mechanical test.

Run       Failing parts mispredicted (%)   Passing parts mispredicted (%)
1         0.00                             0.06
2         1.54                             0.08
3         0.30                             0.06
4         0.00                             0.07
5         0.00                             0.07
Average   0.36                             0.07

Table 2. Fivefold cross-validation percentages of passing and failing parts mispredicted, and their averages, when we eliminate the hot mechanical test.

Run       Failing parts mispredicted (%)   Passing parts mispredicted (%)
1         9.34                             0.72
2         8.79                             0.65
3         4.40                             0.75
4         7.69                             0.71
5         4.40                             0.82
Average   6.92                             0.73

When we eliminated the cold mechanical test, the average misprediction error was only 0.36% for failing parts and 0.07% for passing parts; when we eliminated the hot mechanical test, the resulting error was 6.92% and 0.73%, respectively, for the failing and passing parts. Although the percentages of mispredictions for the mechanical test outcomes are extremely low, we can use specification guard banding to further reduce these values. In fact, we can easily achieve zero defect escape for the cold mechanical test.

Figure 7 shows receiver operating characteristic (ROC) plots for the worst-case fivefold cross-validation runs for the cold (solid line) and hot (dotted line) mechanical tests. Specifically, Figure 7 plots the percentages of correct predictions for failing parts against the percentage of parts placed in the guard band. (Note that the percentages reported along the y-axis in Figure 7 are not the same as DPM, the number of defective parts per million units shipped, but rather are the percentages of failing parts mispredicted.) These plots demonstrate that it's possible to improve the accuracy of Fi(Tkept). For example, we could probably eliminate the cold mechanical test, because few parts require guard banding to ensure that no defective parts escape.

Figure 7. Worst-case receiver operating characteristic (ROC) plots of the percentage of failing parts correctly identified versus the percentage of guard-banded parts for a commercial accelerometer when the cold (solid line) and hot (dotted line) mechanical tests are eliminated.

To illustrate the potential cost savings from using our test compaction methodology, we developed a simple model for the reduction in test cost per shipped part, ΔC:

ΔC = (CE + CM)/Y – [CE + (CDE × DE) + (CM × GB)]/(Y – ΔY)

where CE and CM are the costs of applying the electrical and mechanical tests to each manufactured part, CDE is the cost of mispredicting a defective part, and Y is the process yield. ΔY and DE are the fractions of good and failing parts, respectively, that are mispredicted because of the elimination of a mechanical test, and GB is the fraction of parts guard-banded (meaning they require application of the mechanical test). Assuming Y = 90%, CM = $5, CE = $1, and CDE = $1,000, the model simplifies to

ΔC = 6.67 – [1 + (1,000 × DE) + (5 × GB)]/(0.9 – ΔY)

Substituting DE = 0.36% and ΔY = 0.07% from Figure 7, the cost reduction from eliminating the cold mechanical test with no guard banding is $5.16 per shipped part. If, however, no defect escape is tolerable, we must guard-band 8.1% of the manufactured parts. Assuming we apply both the cold mechanical and electrical tests to these guard-banded parts, the cost reduction is still significant: $5.11 per shipped part. Either way, we eliminate more than 76% of the cold test's cost.
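The cost model can be reproduced as a worked calculation. One interpretation consistent with the paper's numbers is that DE enters as a fraction of all parts: 0.36% of failing parts at 90% yield is roughly 0.036% of all parts. That reading is our assumption; the exact rounding the authors used is not stated.

```python
# Worked calculation of the cost-reduction model, using the paper's example
# values: Y = 90%, C_M = $5, C_E = $1, C_DE = $1,000.
def delta_c(ce, cm, cde, y, de, gb, dy):
    """Reduction in test cost per shipped part: baseline flow (electrical
    plus mechanical tests) minus the compacted flow with its penalties."""
    return (ce + cm) / y - (ce + cde * de + cm * gb) / (y - dy)

# No guard banding: DE ~ 0.036% of all parts, dY = 0.07% of all parts.
print(round(delta_c(1, 5, 1000, 0.90, 0.00036, 0.0, 0.0007), 2))
# ~5.15; the paper reports $5.16 (the small gap is rounding).

# Zero defect escape by guard-banding 8.1% of parts (DE = 0, dY = 0).
print(round(delta_c(1, 5, 1000, 0.90, 0.0, 0.081, 0.0), 2))  # 5.11
```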
THUS, OUR PROPOSED METHODOLOGY can eliminate an expensive mechanical test for a commercially available accelerometer with little error. Moreover, it's possible to completely eliminate the error (for failing parts) using specification guard banding. But the same result could not be achieved for the equivalent mechanical test executed at an elevated temperature. Techniques such as specification guard banding and drift removal can reduce error, but more research is needed. More importantly, techniques are needed for incorporating this and similar methodologies into a production test flow. ■

Acknowledgments
We acknowledge the use of Freescale Semiconductor accelerometer test data in this research work, and we thank Freescale employees Teresa Maudie, Rick Neilsen, Ray Roop, and Brooks Scofield for their insightful description of the test data.

References
1. V.N. Vapnik, Statistical Learning Theory, Wiley-Interscience, New York, 1998.
2. A.K. Jain, J. Mao, and K.M. Mohiuddin, "Artificial Neural Networks: A Tutorial," Computer, vol. 29, no. 3, Mar. 1996, pp. 31-44.
3. L. Rokach and O. Maimon, "Top-Down Induction of Decision Trees Classifiers—A Survey," IEEE Trans. Systems, Man, and Cybernetics—Part C: Applications and Reviews, vol. 35, no. 4, Nov. 2005, pp. 476-487.
4. S. Biswas et al., "Specification Test Compaction for Analog Circuits and MEMS," Proc. Design, Automation and Test in Europe Conf. (DATE 05), IEEE CS Press, 2005, pp. 164-169.
5. H.G.D. Stratigopoulos and Y. Makris, "Bridging the Accuracy of Functional and Machine-Learning-Based Mixed-Signal Testing," Proc. IEEE VLSI Test Symp. (VTS 06), IEEE CS Press, 2006, pp. 395-400.
6. B. DasGupta, H.T. Siegelman, and E. Sontag, "On the Complexity of Training Neural Networks with Continuous Activation Functions," IEEE Trans. Neural Networks, vol. 6, no. 6, Nov. 1995, pp. 1490-1504.
7. J.K. Martin and D.S. Hirschberg, The Time Complexity of Decision Tree Induction, tech. report 95-27, Dept. of Information and Computer Science, Univ. of California, Irvine, 1995.
8. R. Voorakaranam et al., "Production Deployment of a Fast Transient Testing Methodology for Analog Circuits: Case Study and Results," Proc. Int'l Test Conf. (ITC 03), IEEE CS Press, 2003, pp. 1174-1181.
9. M.E. Wall, A. Rechsteiner, and L.M. Rocha, "Singular Value Decomposition and Principal Component Analysis," A Practical Approach to Microarray Data Analysis, D.P. Berrar, W. Dubitzky, and M. Granzow, eds., Kluwer Academic Publishers, 2003, pp. 91-109.
10. R.P. Turakhia et al., "Changing Test and Data Modeling Requirements for Screening Latent Defects as Statistical Outliers," IEEE Design & Test, vol. 23, no. 2, Mar.-Apr. 2006, pp. 100-109.
11. S.D. Senturia, "A Capacitive Accelerometer," Microsystem Design, Kluwer Academic Publishers, 2003, pp. 497-530.
12. T. Maudie et al., "MEMS Manufacturing Testing: An Accelerometer Case Study," Proc. Int'l Test Conf. (ITC 03), IEEE CS Press, 2003, pp. 843-849.
Sounil Biswas is a PhD candidate in the Department of Electrical and Computer Engineering at Carnegie Mellon University. His research interests include test of integrated, heterogeneous systems; statistical analysis of test data; and defect characterization. Biswas has a BTech in electrical engineering from the Indian Institute of Technology, Kanpur, and an MS in electrical and computer engineering from Carnegie Mellon University.

Ronald D. (Shawn) Blanton is a professor in the Department of Electrical and Computer Engineering at Carnegie Mellon University, where he is an associate director of the Center for Silicon System Implementation (CSSI). His research interests include test and diagnosis of integrated, heterogeneous systems. Blanton has a BS in engineering from Calvin College, an MS in electrical engineering from the University of Arizona, and a PhD in computer science and engineering from the University of Michigan, Ann Arbor. He is a member of the ACM and a senior member of the IEEE.

Direct questions and comments about this article to Shawn Blanton, 2109 Hamerschlag Hall, Dept. of Electrical and Computer Engineering, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15217; blanton@ece.cmu.edu.