2001 Systems Engineering Capstone Conference • University of Virginia

AN AUTOMATED DEFECT ANALYSIS SYSTEM (D.O.U.G.I.E.) FOR DOMINION SEMICONDUCTOR

Student team: Saiful Amin, Tim Bagnall, Jeff Binggeli, Jeff Bridgman
Faculty Advisors: Prof. Christina Mastrangelo and K. P. White, Department of Systems Engineering
Client Advisors: Rick Newcomer and Mark Spinelli, Dominion Semiconductor, Manassas, VA
E-mail: rnewcome@dominionsc.com

KEYWORDS: Semiconductor manufacturing, defect analysis, clustering, data filtration

ABSTRACT

The goal of this project was to help Dominion Semiconductor (DSC) automate its defect pattern recognition and classification process in order to increase yield. The current system used by DSC is based on experience and intuition; the system developed by the Capstone team is an analytical system for use in an automated environment. The system involves three phases: a test of randomness, data filtration, and cluster analysis. A number of algorithms were evaluated for each phase. For Phase 1, the Holgate N and Skellam-Moore spatial statistical tests demonstrated the greatest potential for testing for randomness. A "mode-seeking" algorithm proved to be the best at filtering out random defects. Of the cluster analysis algorithms evaluated in this project, the single-linkage approach was the most accurate. A fully integrated, automated system would benefit DSC by increasing defect identification efficiency, which in turn would increase die/sort yield and revenue.

INTRODUCTION

The process of developing a complete silicon wafer with a set of chips is extremely complicated. On average, Dominion completes the three-month production process for about 1,000 wafers a day, with close to 300 chips (or die) on each wafer (Spinelli 2000). Over those three months, each wafer passes through about 350 processing steps (Spinelli 2000). In a process that demands such extraordinary precision, lab environment purity is crucial: the microscopic circuitry on a chip can be destroyed by a particle 1/10,000 the diameter of a human hair (Van Zant 1997). In addition, countless chemicals and various forms of bacteria can plague wafer production throughout processing.

Die/sort yield is the percentage of functional die with respect to the total number of die initially contained on the wafer. This yield is the primary quantitative measure of the value of a silicon wafer and of the effectiveness of the production process. Dominion stores this information in its databases to calculate the yield of each new line of chips it produces. At the initial stages of production of a chip, die/sort yield generally hovers near 60% to 70% (Spinelli 2000), so at the beginning of a product line Dominion is losing roughly a third of its product. With each wafer potentially worth $20,000 to $40,000, increasing die/sort yield would allow Dominion to sell its wafers at a higher market value, which would in turn lead to increased profits.
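As a concrete illustration of this definition, the short sketch below computes die/sort yield; the numbers are hypothetical, not Dominion data.

```python
def die_sort_yield(functional_die: int, total_die: int) -> float:
    """Die/sort yield: the percentage of functional die on a wafer."""
    return 100.0 * functional_die / total_die

# Hypothetical wafer: 300 die, 195 of which pass die/sort testing.
print(die_sort_yield(195, 300))  # 65.0, in the 60%-70% range cited above
```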
Currently, manual inspection is the primary method of identifying defective die on silicon wafers. An individual at Dominion reviews one lot of picmaps, visual representations of the defective die on each wafer (Fig. 1), and singles out noticeable errors for later inspection and study.

Figure 1 – Picmap

Although this process has proven relatively successful, there are opportunities for improvement. First, manual inspection carries an inherent risk of human error, which can lead to false classification, or no classification, of defect clusters. Second, not all of the wafers with low yield are inspected, so certain recurring errors may be overlooked. This project improves on the manual process by attempting to automate the entire process of defect detection and classification. The remainder of this paper discusses the methodology and the three phases of our approach.

AUTOMATED SYSTEM DESIGN

Phase One: Screening for Randomness

The purpose of Phase 1 is to serve as a screening process. In a typical group of wafers, some wafers will exhibit obvious clusters of defects, while others will have defects that appear random in nature. While wafers with clusters and other non-random defect patterns can aid Dominion in pinpointing manufacturing problems, wafers with random defects are of little use in quality control. Ideally, then, the automated system should analyze only the wafers that do have defect clusters; only those wafers advance to Phases 2 and 3. The screening process increases the efficiency of the automated system, as the substantial computational resources involved in cluster analysis are not expended on wafers determined to contain random, unusable information.

Phase 1 consists of a test of randomness, a statistical test that determines whether or not points in an area are spatially random. Through research, several tests of randomness were identified: the Skellam-Moore, Hopkins F, Hopkins N, Holgate F, and Holgate N tests (Ripley 1981). Each test calculates a statistic based on a mathematical analysis of the spatial area. Random samples of points on the wafer are used in the calculations, so the statistic can take on a range of values for any specific wafer. The statistic is compared to a threshold value, which depends on several factors, such as the significance level and the statistical distribution involved. Whether randomness exists on a wafer is determined by whether the calculated statistic falls above or below the threshold.

After initial testing, it was discovered that none of the five tests of randomness performed in a manner that would be beneficial to Phase 1 of our automated system: no method was able to effectively distinguish between wafers with random defects and wafers with clusters of defects. However, after examining the data produced during this evaluation, it appeared that a few of the methods would perform with some accuracy if the statistical thresholds were replaced with empirical limits, that is, thresholds chosen by the user from observed acceptable limits of defects. The use of empirical limits proved advantageous, especially when applying the Skellam-Moore and Holgate N tests to wafers. These two tests of randomness are revisited in the evaluation of our automated system below.
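The exact forms of these statistics are given in Ripley (1981). As a flavor of how such sampling-based tests work, here is a minimal sketch of the closely related Hopkins statistic (one of the tests listed above), using an empirical limit in place of a distributional threshold, as the team ultimately adopted. The 0.55 limit, the coordinate scaling, and all names are illustrative assumptions, not values from the paper.

```python
import math
import random

def nearest_dist(p, pts, exclude=None):
    """Distance from point p to the nearest point in pts (optionally skipping one index)."""
    return min(math.dist(p, q) for i, q in enumerate(pts) if i != exclude)

def hopkins_statistic(defects, n_samples=60, wafer_size=1.0):
    """Hopkins-style statistic on defect (x, y) coordinates scaled to [0, wafer_size]^2.
    Roughly 0.5 when defects are spatially random; approaches 1.0 when they cluster."""
    m = min(n_samples, len(defects) - 1)
    # u: distances from m uniformly random locations to their nearest defect
    u = [nearest_dist((random.uniform(0, wafer_size), random.uniform(0, wafer_size)), defects)
         for _ in range(m)]
    # w: distances from m randomly sampled defects to their nearest neighboring defect
    w = [nearest_dist(defects[i], defects, exclude=i)
         for i in random.sample(range(len(defects)), m)]
    u2, w2 = sum(d * d for d in u), sum(d * d for d in w)
    return u2 / (u2 + w2)

def looks_random(defects, limit=0.55):
    """Empirical-limit screening: treat the wafer as 'random' when the statistic is low."""
    return hopkins_statistic(defects) < limit
```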
Phase Two: Data Filtration

After removing the wafers that consist primarily of random defects, the next phase filters irrelevant data out of the remaining wafers. The purpose of the valley-seeking technique and the mode-seeking procedure is to produce clearer picmaps. In addition to being visually clearer, the two methods focus on maintaining the relevant data so that cluster analysis can operate effectively.

Valley-seeking Algorithm: The valley-seeking algorithm, where "valley" refers to the parts of the wafer with a low frequency of defects, removes the isolated defect points found on a wafer. A non-working die X is selected from a picmap and the surrounding eight die are inspected. Each adjacent die has its status checked and is classified as either a good die (XG) or a bad die (XB), as in Figure 2 (Diday 1994). Based on the number of XB and XG die and their locations, the algorithm evaluates whether the original die X should be eliminated (note: X is always a bad die, because the algorithm inspects only dysfunctional die). For the most part, the valley-seeking technique discards the point if X has no connectivity with XB points (Diday 1994); connectivity here means one or more XB die adjacent to X.

    XG XG XG
    XG X  XG
    XB XB XB

Figure 2 – 3x3 Grid

The valley-seeking method calls for the removal of die with no connectivity, which is analogous to removing a random defect (Figure 3). This has two benefits: it produces a clearer picture and aids the clustering process by minimizing the number of isolated data points, and keeping die with at least one connection preserves the ability to detect scratches.

    XG XG XG
    XG X  XG
    XG XG XG

Figure 3 – 3x3 Grid

Mode-seeking Algorithm: The mode-seeking algorithm searches for a bad die X and then checks the relationships within the 3x3 frame around it. The location of X is preserved if and only if a good portion of the surrounding die are XB, giving X a high level of connectivity (Diday 1994), as in Figure 4. This process helps establish the legitimacy of large clusters in the data set, so that a visual inspection of the wafer can focus mainly on the major errors.

    XG XG XG
    XB X  XB
    XB XB XB

Figure 4 – 3x3 Grid

Because the primary purpose of the mode-seeking procedure is to focus on the larger, more defined clusters, the decision was made to classify die using a minimum of two connections. A reasonably high connection minimum is useful for two reasons. First, it eliminates many of the small clusters, so that clustering can tackle the larger clusters first and focus on the problems causing the greatest loss of die. Second, it simplifies the image of the picmap, making it easier to comprehend visually.
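Below is a minimal sketch of the two filters as described above, assuming the picmap is represented as a set of bad-die grid coordinates (the data structure is our assumption; the two-connection minimum follows the team's choice for mode seeking).

```python
def bad_neighbors(bad, x, y):
    """Count defective die among the eight die surrounding (x, y)."""
    return sum((x + dx, y + dy) in bad
               for dx in (-1, 0, 1) for dy in (-1, 0, 1)
               if (dx, dy) != (0, 0))

def valley_seeking(bad):
    """Remove isolated defects: bad die with no adjacent bad die (no connectivity)."""
    return {d for d in bad if bad_neighbors(bad, *d) >= 1}

def mode_seeking(bad, min_connections=2):
    """Keep only well-connected defects, isolating the larger, well-defined clusters."""
    return {d for d in bad if bad_neighbors(bad, *d) >= min_connections}

# Example: a small cluster plus one stray defect.
picmap = {(5, 5), (5, 6), (6, 5), (6, 6), (20, 3)}
print(valley_seeking(picmap))  # the stray (20, 3) is dropped
print(mode_seeking(picmap))    # only die with two or more bad neighbors survive
```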
Phase Three: Cluster Analysis

Once a wafer has been filtered, its defects are clustered using a cluster analysis algorithm. Cluster analysis is a pattern recognition technique used to separate data into clusters without prior knowledge of the composition of the clusters. The selected cluster analysis algorithm transforms the set of wafer defect points into a sequence of nested partitions using measurements of the distances between defects. Figure 5 shows a wafer that has been clustered; notice that the defects at the top of the wafer were assigned to one cluster.

Figure 5 – Clustered Wafer

There are two main types of clustering algorithms: hierarchical and non-hierarchical. In hierarchical clustering, each data point starts as a single cluster, and these clusters are merged together as the algorithm progresses. When a hierarchical method assigns a data point to a cluster, that point belongs to the cluster permanently; a merge cannot be undone even if it becomes clear at a later stage that other clusters would have merged in a better way. Non-hierarchical algorithms work in a similar fashion, but they allow clusters to change and evolve if a data point is found to fit better in another cluster. Another difference is that non-hierarchical clustering requires the number of clusters in the data to be known before starting.

The clusters found through cluster analysis will be used to classify defect patterns into signatures. For example, the cluster found in Figure 5 would be classified as a top-edge defect. Identifying a signature will allow DSC to fix the cause of the defect pattern. The clusters will also be used to identify new signatures: cluster analysis may discover smaller clusters that would go unnoticed under the current system, and if the same small cluster is found across many wafers or many lots, this could signal a new signature. The identification and classification of signatures will help DSC improve yield over the lifetime of a product.

EXPERIMENT DESIGN AND RESULTS

After the three phases were implemented in Visual Basic as an automated system named D.O.U.G.I.E., the next step was to evaluate the performance of the three phases on wafer data.

Data Exploration

Two primary sets of data were used in this project. The first came from Dominion Semiconductor's retired line of Gemini 128 Mb DRAM chips, mostly from the production period of September to December 1999. This was during the initial stages of Gemini production, which means that many of the wafers had low yield and were filled with defects. The data included attributes such as lot number, wafer number, the x- and y-coordinates of errors, a pass/fail value, and the type of failure for each die.

The second set of data was simulated with a specific collection of defect scenarios in mind. This artificial lot comprised fourteen wafers with random defects and fifteen wafers with defined defects. Of the fifteen non-random wafers, three were assigned to each of the five following well-known defect patterns: vertical scratch, diagonal scratch, center defect, outer radial defect, and edge defect. Figures 6 and 7 are examples of wafers with simulated random defects; Figures 8 and 9 are examples of wafers simulated with well-known defect clusters.

Figure 6 – Random Defects   Figure 7 – Random Defects   Figure 8 – Center Defect   Figure 9 – Radial Defect

The primary purpose of the artificial lot was to test the effectiveness of our approaches on wafers where it was known exactly how our methods should behave when functioning properly. For example, in an experiment on a wafer with a center defect (Fig. 8), Phase 1 should not screen out the wafer because the defects are not random, Phase 2 should not filter out the defects that belong to the center cluster, and Phase 3 should assign all the defects in the center to one cluster.
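The paper does not describe how the artificial wafers were built, so the following sketch shows one plausible way to generate two of the scenarios, a random-defect wafer and a center-defect wafer, purely for illustration; all parameters and names are assumptions.

```python
import math
import random

def random_defect_wafer(n_defects=50, diameter=30):
    """Wafer whose defects are scattered uniformly over a roughly circular die grid."""
    r = diameter / 2
    sites = [(x, y) for x in range(diameter) for y in range(diameter)
             if math.dist((x, y), (r, r)) <= r]
    return set(random.sample(sites, n_defects))

def center_defect_wafer(cluster_size=40, spread=3.0, diameter=30):
    """Wafer with a dense cluster of defects concentrated near the center."""
    c = diameter / 2
    defects = set()
    while len(defects) < cluster_size:
        x, y = random.gauss(c, spread), random.gauss(c, spread)
        if 0 <= x < diameter and 0 <= y < diameter:
            defects.add((int(x), int(y)))
    return defects
```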
Phase One Evaluation

The Skellam-Moore and Holgate N tests of randomness were the two methods with the potential to implement the Phase 1 screening process. Both behave differently as the number of spatial samples included in the test is varied. A sample is a randomly chosen point on the wafer, and each such point is the basis for all mathematical computations involved in the test. Through testing, it was determined that the Holgate N was effective at one level of sampling, 60 samples, while the Skellam-Moore had multiple levels of sampling that produced promising behavior; three levels (30, 60, and 90 samples) were examined in the final evaluation of the Skellam-Moore test.

Applying the Holgate N and Skellam-Moore tests to wafers can produce two outcomes of importance to the evaluation of the Phase 1 screening process. The first, "good" outcome occurs when the wafer in question has random defects and the test correctly determines that randomness exists; this matters because if few wafers with random defects are identified, the screening process becomes inefficient. The second, "bad" outcome occurs when the wafer in question has clusters of defects but the test incorrectly determines that randomness exists. This error is detrimental to Dominion's quality control, as the screening process would discard wafers with important, non-random information.

To evaluate the four methods (the Holgate N using 60 samples and the Skellam-Moore using 30, 60, and 90 samples), each was applied to the artificial test lot and to three lots of Gemini data. Each test of randomness was applied twenty times to every wafer in the four lots, and the following information was recorded and averaged over the twenty trials: the number of wafers determined to have random defects and, of those wafers, the number known to have random defects (good results) and the number known to have clusters of defects (bad results). From this data, three performance metrics were calculated: the average percentage of wafers with random defects that are screened out (good), the average percentage of wafers with defect clusters that are screened out (bad), and the ratio of these good and bad percentages.
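A sketch of this bookkeeping for a single trial is shown below, assuming the ground-truth wafer labels are available as sets of wafer IDs (the representation and names are ours; averaging over the twenty trials would be done outside this function).

```python
def screening_metrics(screened, truly_random, truly_clustered):
    """Good %, bad %, and good-to-bad ratio for one screening trial.
    `screened` is the set of wafer IDs the test flagged as random; the other
    two sets are the ground-truth labels for the lot."""
    good = 100.0 * len(screened & truly_random) / len(truly_random)
    bad = 100.0 * len(screened & truly_clustered) / len(truly_clustered)
    return good, bad, (good / bad if bad else float("inf"))
```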
Evaluation on the Artificial Lot

Since we created the wafers in the artificial test lot, it was known exactly which wafers had random defects and which had clusters, so evaluation was straightforward: Phase 1 should screen out only the wafers with random defects. However, not all of the non-random defects were strictly clusters; six wafers had scratch defects, which appear as lines instead of clusters. It was understood that the tests of randomness were designed to be sensitive only to clusters, meaning that scratches and other cluster-free patterns could be indicated as random. In the evaluation, the wafers with scratches were accounted for and the performance metrics adjusted accordingly: if a wafer with a scratch was determined to have random defects, this was not counted as an error.

Test                         % of Wafers w/        % of Wafers w/            Good-to-
                             Random Defects        Non-Random Defects        Bad Ratio
                             Screened Out (Good)   Screened Out (Bad)
Holgate N, 60 samples        32.9                  1.1                       29.91
Skellam-Moore, 30 samples    52.5                  5.6                       9.38
Skellam-Moore, 60 samples    33.2                  1.7                       19.53
Skellam-Moore, 90 samples    61.8                  5.0                       12.36

Figure 10 – Artificial Test Lot Results

The results shown above were very positive. For each of the four tests, between 32 and 62 percent of the wafers with random defects were successfully screened out, while only 1 to 6 percent of the wafers with defect clusters were incorrectly determined to be random. The Holgate N had the lowest error and the best good-to-bad ratio; the Skellam-Moore tests had higher errors but successfully screened out a larger percentage of the wafers with random defects.

Evaluation on the Gemini Lots

For the lots of actual Dominion wafer data, more was involved in knowing which wafers had random defects and which had clusters. Without the a priori knowledge we had for the wafers of our own creation, we relied on Dominion's interpretation of the wafers. Dominion is the expert at deciding whether wafers have random defects or clusters, so its classification of the wafers provided the basis for comparison with what the tests of randomness indicated. As in the experiment on the artificial test lot, which had wafers with scratches, the three Gemini lots had wafers with non-random patterns that would not be detected by the tests of randomness; again, these wafers were not counted in the misclassification errors.

Test                         % of Wafers w/        % of Wafers w/            Good-to-
                             Random Defects        Non-Random Defects        Bad Ratio
                             Screened Out (Good)   Screened Out (Bad)
Holgate N, 60 samples        24.43                 12.33                     1.98
Skellam-Moore, 30 samples    26.67                 21.57                     1.24
Skellam-Moore, 60 samples    10.03                 10.73                     0.93
Skellam-Moore, 90 samples    26.37                 18.73                     1.41

Figure 11 – Gemini Lots Results

The results were not as strong as in the experiment on the artificial test lot. Even so, the methods still appeared beneficial for the Phase 1 screening process. For example, the Holgate N successfully screened out 24% of the wafers with random defects, while the error was contained to 12% of the wafers with non-random defects; out of ten wafers with clusters, only about one would be incorrectly discarded, which is not an intolerable loss.

Phase Two and Three Evaluation

Five hierarchical clustering algorithms were evaluated in this project: single linkage, complete linkage, average linkage, the centroid method, and Ward's method. These algorithms were run on fifteen simulated wafers without data filtration for an initial evaluation of the methods. The algorithms are to be run on the same simulated data after filtering with the mode-seeking procedure, as part of the testing of the integrated system, and on real data before a final product is handed over to DSC.

A combination of D.O.U.G.I.E., the tool developed by the team, and Minitab was used to test each algorithm. The x/y coordinates of each defect on a wafer were output from D.O.U.G.I.E. and input into Minitab to perform the clustering. As Minitab starts, each defect is an individual cluster; these clusters are then merged, step by step, according to the distance criterion of the algorithm, until all of the defects are grouped into one cluster.
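The team performed the clustering in Minitab. As an illustration only, the sketch below reproduces the same procedure with SciPy's hierarchical clustering routines (our substitution, not the team's tool), including the cut at a chosen number of clusters discussed under "Stopping the Algorithm" below.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def single_linkage_clusters(defect_xy, n_clusters):
    """Single-linkage agglomerative clustering of defect coordinates.
    Merges the two closest clusters step by step (minimum pairwise distance),
    then cuts the merge history once `n_clusters` clusters remain."""
    coords = np.asarray(defect_xy, dtype=float)      # one (x, y) row per defect
    merges = linkage(coords, method="single")        # full hierarchy of merges
    return fcluster(merges, t=n_clusters, criterion="maxclust")  # a label per defect

# e.g. a wafer judged to contain one defect cluster plus ten random defects:
# labels = single_linkage_clusters(filtered_defects, n_clusters=11)
```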
Results

Figure 12 – Single-Linkage Dendrogram (observations on the horizontal axis; merge distance, from 0.00 to 0.77, on the vertical axis)

The initial testing on unfiltered data showed that single-linkage was the most accurate of the clustering algorithms tested. Single-linkage produced the fewest type I errors for each defect cluster tested; in fact, only one type I error occurred using single-linkage. At the same time, however, single-linkage produced more type II errors than the other algorithms. The other algorithms (complete-linkage, average-linkage, the centroid method, and Ward's method) produced very similar results to one another, each with an unacceptable number of type I errors and very few type II errors. At the time of writing, integrated testing using filtered data has not been completed; however, it is assumed that these results will hold for filtered data.

Stopping the Algorithm

Before D.O.U.G.I.E. could display the clustering results from Minitab, the number of clusters on a wafer had to be determined. By reviewing a wafer, it becomes apparent how many clusters should result; for example, if a wafer contains one defect cluster and ten random defects, then there should be roughly eleven clusters on the wafer. Because Minitab shows how many clusters remain after each merge decision, it provides a convenient means of stopping the algorithm once the desired number of clusters is reached. The algorithms were therefore run, for each clustering method, until the number of clusters specified for a given wafer was reached. In later versions of this system, the stopping procedure will be automated based on set criteria such as the overall yield of the wafer.

Evaluation Methods

The clustering methods were evaluated on how well they captured all the defects belonging to each cluster scenario. In this process, we recorded two kinds of error for the clustering algorithm:

Type I error: a defect belonging to the cluster that was not captured by the algorithm.
Type II error: a defect not belonging to the cluster that was captured by the algorithm.

It was decided that minimizing type I errors was more important than minimizing type II errors, because the clustering algorithm must identify as much of a defect cluster as possible to aid in the classification of that cluster.
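Using these definitions, a minimal sketch of the error tally for one cluster is shown below, assuming the found cluster and the true (simulated) cluster are available as sets of die coordinates; the representation is our assumption.

```python
def cluster_errors(found, truth):
    """Tally the two error types for one defect cluster.
    Type I: defects in the true cluster that the algorithm missed.
    Type II: defects the algorithm captured that are not in the true cluster."""
    return len(truth - found), len(found - truth)

# e.g. for a simulated center defect:
# type_one, type_two = cluster_errors(found=set(found_members), truth=set(center_defects))
```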
CONCLUSIONS

In our automated system, the three-phase approach demonstrates promising performance. In Phase 1, the Holgate N and Skellam-Moore tests were able to detect randomness on many of the wafers with random defects and, for the most part, screened them out without incorrectly discarding wafers with edge, center, or outer radial clusters. Phase 2 successfully filtered out the extraneous defects; the mode-seeking procedure was especially effective at removing most of the random defects and isolating the clusters on a wafer. Phase 3 can analyze the defects and assign them membership in a cluster; the single-linkage method consistently formed the clusters that were expected to be identified. Phase 3 also demonstrated the advantage of synthesizing all three phases, as cluster analysis was more effective when performed after Phases 1 and 2 were completed.

It is recommended that Dominion Semiconductor continue to pursue the three-phase approach of screening for randomness, data filtration, and cluster analysis. In the future, the automated system can be developed to the point where its implementation in operations would increase die/sort yield and revenue.

REFERENCES

Diday, E., et al. (1994). New Approaches in Classification and Data Analysis. Springer-Verlag, New York.

Everitt, B. S. (1993). Cluster Analysis. Arnold, New York.

Ripley, B. D. (1981). Spatial Statistics. Wiley, New York.

Spinelli, M. (2000). PowerPoint presentation to the UVA Capstone team, 22 Sept. 2000.

Van Zant, P. (1997). Microchip Fabrication, 3rd ed. McGraw-Hill, New York.

BIOGRAPHIES

Saiful Amin is a fourth-year Systems Engineering major at the University of Virginia from Springfield, VA, with a focus in Computer Information Systems. His primary role in this project was to develop and implement the filtration algorithms. Saiful has accepted an IT consulting position with Accenture (formerly known as Andersen Consulting) and will begin in August.

Tim Bagnall is a fourth-year Systems Engineering major from Springfield, VA, with a minor in Economics. Tim's primary role in the project was implementing and evaluating the cluster analysis.

Jeff Binggeli is a fourth-year Systems Engineering major at the University of Virginia and a member of the varsity cross country and track and field teams. In addition to Systems Engineering, Jeff is also working on a major in Economics.

Jeff Bridgman is a fourth-year Systems Engineering major from Herndon, VA. His main responsibility in the project was to implement and evaluate the automated screening process that filters out the wafers that are free of defect clusters. Jeff has accepted a position with Accenture and will begin in August.