Device Independent Process Control of Dielectric Chemical Mechanical Polishing

by

Taber Hardesty Smith

Bachelor of Science, Rochester Institute of Technology, May 1994
Master of Science, Massachusetts Institute of Technology, June 1996

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

September 27th, 1999

©Massachusetts Institute of Technology, 1999. All Rights Reserved.

Author: Electrical Engineering and Computer Science, September 27th, 1999
Certified by: Duane Boning, Associate Professor, Electrical Engineering and Computer Science
Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students, Electrical Engineering and Computer Science

Device Independent Process Control of Dielectric Chemical Mechanical Polishing
by Taber Hardesty Smith

Submitted to the Department of Electrical Engineering and Computer Science on September 27, 1999, in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Abstract

The use of the chemical-mechanical polishing (CMP) process in the semiconductor industry is growing rapidly, and it is a critical step in the manufacturing of integrated circuits. The CMP process is complicated by many factors, and controlling all of these factors in a single controller has been unrealistic. One of the most significant factors complicating control is the dependency of the polishing process on the pattern layout of the particular device being polished. The interactions between these patterns and the polishing behavior of a CMP tool make monitoring and controlling the process particularly difficult. Current techniques focus on the control of a few sites on a single type of layout being polished on a single tool.
We show that using only a few sites does not give insight into what is happening at sites that are not measured and controlled. Further, by restricting attention to a single device, these controllers address only pieces of a larger problem and fail to take into account the effects of different devices being polished on the same tool.

This thesis outlines a comprehensive framework for controlling the polishing of multiple devices with arbitrary layout patterns being polished on a single CMP tool. We explore the use of an advanced CMP process model in conjunction with an on-line metrology system and a simple filtering algorithm for controlling the average post-polish thickness and monitoring the global non-uniformity of multiple devices being processed. This framework provides several benefits. First, it allows measurements from any device to be used to update the tool level model that can be used with any other device being processed. Second, it allows us to measure only a few points, while very accurately controlling the average of the true thickness profile. Third, it allows us to monitor the total non-uniformity of the polishing process for each device being processed.

Experimental work shows that this approach results in very accurate control of the true average thickness of multiple devices, with a lot to lot variability of only 100 Å. In addition, we are able to very accurately predict the global non-uniformity of the polished wafers. However, the model used was found to have a minor dependence on the device type, indicating that an improved layout pattern functional model is necessary to achieve truly device independent control. We explore one possible model that might reduce these dependencies, and show this model provides a 50% improvement in fitting errors of both raised and down area thicknesses.
Despite this, we find that device dependencies still exist with the improved model, and further work is necessary to make the model and control strategy completely device independent.

Acknowledgments

I would like to thank my wife, Tina, for being the perfect complement of my technical persona, and for being the source of much joy in my life. I apologize for making her sit through many conversations bored to tears. She has been a huge supporter of my efforts and drive to continue. She has sacrificed a lot in her life to make this possible, and much of the credit goes to her. I would also like to thank my mother and father, Cheryl Hardesty and Dale Smith, for making me who I am today, and for always being so supportive of everything that I have done or ever wanted to do. My brother, Dirk, has been a role model of character and respect for me since birth. My brother Brett Smith has been a driver of my efforts in everything from soccer to studying all my life. My family thus owns much of the credit.

I don't normally believe in luck, but one phone call five years ago is hard to argue with. I would like to thank my advisor, Professor Duane Boning, for calling, and subsequently encouraging every idea I have had, keeping me straight and barely under control, and having faith in me when I was drowning in pressure. He has been an inspiration for me, who never ceases to amaze me with his ability or character. He has been an outstanding advisor, who I would not have traded for anyone else. Our journey together has been a long one, yet it seemed to go by so fast. This is what happens when you are having fun, and working with Duane has been the best.

Most of my experimental work was done in collaboration with Texas Instruments Inc., in Dallas, TX, where I spent nine months and a lot of their resources. I was very fortunate to meet a lot of outstanding people there. I would like to thank Dr. Jerry Stefani for many things, including agreeing to co-advise my thesis.
Jerry has been a great friend, mentor, and supporter throughout my Ph.D. work. He made sure we had fun in Texas, and for that I have many fond memories. He has also been a great mentor, and has taught me a tremendous amount. He is amazing at pulling people and resources together, organizing work efforts, and championing what he believes in, so the credit for much of my work goes to him as well. I would also like to thank Dr. Simon Fang. Simon is one of the funniest, most hardworking, and most intelligent people I have ever worked with. Much of the work in this thesis and other works were the result of one idea caught from the fountain of knowledge eschewing from Simon. I would also like to thank Simon for helping to manage and direct much of my work at TI in the Spring of 1998, and for staying up all night in the lab with me (I still can't figure that one out). I would also like to thank Dr. Stephanie Butler, who has supported my work from the beginning. She has taught me a lot about semiconductor manufacturing, process control, organizations, and how to look for relevant and important problems. I would like to thank Dr. Greg Shinn, for being very supportive of my efforts and my work in the CMP area at TI. I am also thankful to Leif Olsen for helping out with my work, and allowing me to constantly interrupt his work to run my experiments. I would also like to thank many other people at TI, including Chris Baum, Mark Betts, Dr. Scott Bushman, Alicia Clark, John Clark, Charles Crain, Dr. Michael Daniels, Dr. Santos Garza, Susie Gauna, Dr. Jarvis Jacobs, Dorothy McAllister, Rita McKern, Justin Scout, Dr. Robert Soper, and Dr. Michael West, who made my stay there a great experience.

Here at MIT, my colleagues in the Process Control and Statistical Metrology Groups have made my life at MIT a blast. Aaron Gower is the most selfless and honorable person I have ever met. Thanks go to him for helping to build, set up, or debug every mathematical, computer, or programming problem I had.
Also for all the good times, and for putting up with me for five years. I would like to thank Sandeep Sadashivapa for lots of good times outside the office, and wish him the best of luck in his new life outside of MIT. I would also like to thank Brian Goodlin for a lot of great discussions, and lots of good work together. Brian taught me a lot about the science of learning, in everything from modeling to guitar. I would like to thank Dave White for all the great discussions which educated me on topics related to work, but even more not related to work. From the old days, I would like to thank Minh Le, who is also a fountain of amazing ideas, for all the heated discussions on work and life, and for keeping in touch. The same for Dr. Ka Shun Wong, and for being an example of complete diligence that I pale in comparison to. I would like to thank Eric Stuckey for our work on process control together, and more so for the fun outside work. Thanks go to Han Chen, for helping out with many theoretical problems, and for putting up with our craziness in the office. I would like to thank Rajesh Divecha and Brian Stine for teaching me much of what I learned about Statistical Metrology. I would also like to thank Dr. Dennis Okumu Ouma, who deepened my understanding of these and many other areas. Tae Park, Tamba Tugbawa, Brian Lee, Charles Oji, Vikas Merhotra, Terence Gan, and Shiou Lin Sam are to thank for broadening my experience to other areas. I would also like to thank Angie Nishimoto for her excellent attention to detail and for the great fun we had thermal energy. I wish all of these fine people the best of luck throughout their lives.

I'd like to thank Professors Akintunde Akinwande and Tommi Jaakkola for agreeing to serve on my area exam committee on very short notice. I would also like to thank Prof. Jung-Hoon Chun and Prof. John Tsitsiklis for reading my thesis and serving on my committee.
This work was sponsored in part by the NSF/SRC Engineering Research Center for Environmentally Benign Semiconductor Manufacturing. We would like to thank Paz Amit, Avron Ger, and Nova Measuring Instruments Ltd. for their assistance in setting up the NovaScan on-line metrology tool and performing some of the experiments. We would also like to thank Joost Grillaert, Dr. Marc Meuris, and IMEC for valuable discussions regarding the IMEC model.

Table of Contents

Chapter 1. Introduction
1.1 An Introduction to Run by Run Process Control
1.2 An Introduction to CMP
1.2.1 Blanket Wafer Performance Metrics
1.2.2 Patterned Wafer Performance Metrics
1.3 An Introduction to CMP Process Control Issues
1.4 Summary

Chapter 2. Control of a Single Device Using On-Line Metrology
2.1 Evaluation of On-Line Metrology for CMP
2.1.1 Measurement Repeatability
2.1.2 Reliability
2.1.3 Correlation with Ex-Situ Metrology
2.2 Throughput and Cost of Ownership Improvements
2.2.1 Throughput
2.2.2 Cost of Ownership Reductions
2.3 Run by Run Control of CMP with On-Line Metrology
2.3.1 Experimental Setup
2.3.2 The Run by Run Process Control Algorithm
2.3.3 Patterned Wafer Control
2.4 Summary

Chapter 3.
Control of Multiple Devices in Dielectric CMP
3.1 Pattern Dependencies in Dielectric CMP
3.2 Modeling of Dielectric CMP
3.3 Problems With Existing Control Methods in Dielectric CMP
3.4 Current Methods for Controlling Multiple Devices in Dielectric CMP
3.5 The Multiple Device Control Problem for Dielectric CMP
3.6 A Framework for the Control of Multiple Devices in Dielectric CMP
3.6.1 A Device Independent Control Algorithm
3.6.2 Further Discussion of the Device Independent Control Algorithm
3.7 Experimental Results
3.7.1 Updating Both Planarization Length and Blanket Removal Rate
3.7.2 Updating Blanket Removal Rate Only
3.7.3 Correcting for Device Dependencies in the Blanket Rate
3.8 Summary

Chapter 4. A Dielectric CMP Model Combining Density and Step Height Dependencies
4.1 Density and Step Height Dependent Models
4.2 Analysis of the MIT Density Model
4.3 A Combined Density and Step Height Model
4.4 Variations of the Time-Density Model
4.5 Polish Time and Device Dependencies
4.6 Summary

Chapter 5. Conclusions and Future Work

References

List of Figures

Chapter 1.
Figure 1.1.
An uncontrolled drifting process.
Figure 1.2. SPC control of a drifting process using tuning with WECO rules.
Figure 1.3. EWMA control of a drifting process.
Figure 1.4. Schematic application of the CMP process in interconnect formation.
Figure 1.5. Chemical-mechanical polishing tool configuration.
Figure 1.6. Diagram of pad, slurry, and surface interactions.
Figure 1.7. Blanket wafer measurement sampling patterns.
Figure 1.8. Blanket wafer removal rate profile. The surface is an interpolation based on the measured points, which are indicated by the stars.
Figure 1.9. Blanket wafer post-polish thickness profile.
Figure 1.10. A typical die sampling plan.
Figure 1.11. Typical structures used for step height measurement.
Figure 1.12. A typical step height measurement before planarity is reached.
Figure 1.13. A typical step height measurement near planarity.
Figure 1.14. The within-die variation of a typical production wafer.
Figure 1.15. Average removal rate of blanket sheet film PETEOS wafers.
Figure 1.16. Average removal rate of patterned PETEOS wafers.
Figure 1.17. Within-wafer non-uniformity of blanket sheet film PETEOS wafers.

Chapter 2.
Figure 2.1. Polishing sequence with on-line metrology.
Figure 2.2. The on-line pattern recognition system.
Figure 2.3. Correlation plot of the on-line and ex-situ post-polish patterned wafer thicknesses.
Figure 2.4. Polishing with look-ahead wafers, rework, and ex-situ metrology.
Total time per lot with 2 wafer rework is 255 minutes; a throughput of 0.2353 lots per hour. Total time per lot with 24 wafer rework is also 255 minutes.
Figure 2.5. Polishing with look-ahead wafers, rework, and in-line metrology. Total time per lot with 2 wafer rework is 142 minutes; a throughput of 0.4225 lots per hour. Total time per lot with 24 wafer rework is 177 minutes; a throughput of 0.3390 lots per hour. These are improvements of 80% and 44%, respectively.
Figure 2.6. Percent reduction in the cost of ownership vs. percent rework for a process with look-ahead wafers.
Figure 2.7. Controlled average post-polish patterned wafer thickness over the 600 wafer experiment.
Figure 2.8. Average removal rate of patterned PETEOS wafers.
Figure 2.9. Within-wafer non-uniformity of blanket sheet film PETEOS wafers.
Figure 2.10. Average post-polish thickness of patterned wafers using pilot wafers and sheet film equivalents to control the post-polish thickness of the patterned wafers.
Figure 2.11. Controlled average post-polish patterned wafer thickness over the 600 wafer experiment using five measurement sites.

Chapter 3.
Figure 3.1. A very densely sampled thickness profile of a typical wafer, including wafer-level and die-level variation components.
Figure 3.2. Die-level thickness profile of a test device.
Figure 3.3. Sources of thickness variation in the CMP process.
Figure 3.4. Total, within-die, and within-wafer thickness variation of a typical test device as a function of polishing time.
Figure 3.5. Cross-sectional view of the oxide thickness in a patterned wafer.
Figure 3.6.
The MIT density model predictions of the removal rate of the up and down areas as a function of time for one particular density.
Figure 3.7. The MIT density model predictions of the up area removal rates and thicknesses, as a function of time, for different densities.
Figure 3.8. A cross section of the elliptical weighting function used in the density model to calculate the effective density of the features.
Figure 3.9. A high-level view of the MIT density model.
Figure 3.10. Measurement plan of a test layout pattern (Device #2).
Figure 3.11. Measured and modeled values for the post-polish thickness of the raised and down areas using the MIT density model.
Figure 3.12. Measured and modeled values (dashed lines) for the post-polish thickness of the raised areas for several polish times using the MIT density model.
Figure 3.13. Blanket wafer removal rate profile and patterned wafer removal rate profiles over the surface of a wafer predicted by the density model.
Figure 3.14. Example current practice for CMP process control using sheet film equivalents (SFEs).
Figure 3.15. Post-polish thickness profiles for two different devices. Measurements were taken over a grid similar to that in Figure 3.10.
Figure 3.16. Multiple device control using a three site average of the thickness.
Figure 3.17. The average thickness for two different devices predicted by the MIT density model.
Figure 3.18. Within-die variation (standard deviation of the post-polish thickness) shown for a design of experiments that varied the table speed and down force over a wide range for the dielectric CMP process.
Figure 3.19. Within-die variation shown over the polishing of 600 wafers. The stars are the within-die variation measured on four dies on each wafer, and the solid line is the average of the four die from eight wafers over the 600 wafer run.
Figure 3.20. Typical structures used for step height measurement.
Figure 3.21. Step height measurement for a low density feature, for three different processes, plotted against the amount removed on a blanket wafer.
Figure 3.22. Higher density structures used for step height measurements.
Figure 3.23. Step height measurement for a low density feature, for three different processes, plotted against the amount removed on a blanket wafer.
Figure 3.24. The difference in the step height measured at low and high density features, for three different processes, versus the blanket amount removed.
Figure 3.25. The within-die variation, measured at 25 locations in 10 die, for three different processes, versus the blanket amount removed.
Figure 3.26. A device independent run by run process controller for CMP.
Figure 3.27. The average planarization length over the course of 100 six wafer lots (solid line), and calculated planarization lengths for each of the four die on each of these eight wafers (dots).
Figure 3.28. Test devices being controlled with the device independent controller.
Figure 3.29. Measurement plan of Device 1. Circles are points used for control. Crosses and circles are used to determine the true average.
Figure 3.30. Measurement plan of Device 2. Circles are points used for control. Crosses and circles are used to determine the true average.
Figure 3.31. Map of the dies used for the multiple device control.
Figure 3.32.
Controlled average thickness of 63 sites on four dies measured following the experiment and the device number run.
Figure 3.33. The minimum, maximum, and range of the polished devices. The dashed lines represent the predicted values using the model, while the solid lines indicate the values determined from the 63 point measurements on four dies.
Figure 3.34. Parameters extracted from the measured data during the first control run.
Figure 3.35. Controlled average thickness of 63 sites on four dies measured following the experiment and the device number being run.
Figure 3.36. The minimum, maximum, and range of the polished devices. The dashed lines represent the predicted values using the model, while the solid lines indicate the values determined from the 63 point measurements on four dies.
Figure 3.37. Measurement plan of Device A. Circles are points used for control and crosses and circles are used to determine the true average.
Figure 3.38. Controlled average thickness of 63 sites on four dies measured following the experiment and the device number being run.
Figure 3.39. The minimum, maximum, and range of the polished devices. The dashed lines represent the predicted values using the model, while the solid lines indicate the values determined from the 63 point measurements on four dies.
Figure 3.40. Difference in the average of the measured sites and the average of the controlled sites (the "true" average).

Chapter 4.
Figure 4.1. A high-level view of the MIT density model.
Figure 4.2.
The removal rates of the raised and down areas using the IMEC step height dependent model.
Figure 4.3. Removal rates of the density and step height dependent models for both the MIT density model and the IMEC model.
Figure 4.4. Percent difference in removal predictions between the density model and the IMEC model.
Figure 4.5. Description of the pattern features in the test mask used for model comparisons.
Figure 4.6. Measured and modeled values for the post-polish thickness of the raised areas using the MIT density model (dashed line is the model fit).
Figure 4.7. Measured and modeled values for the post-polish thickness of the down areas using the MIT density model (dashed line is the model fit).
Figure 4.8. Measured and modeled values for the post-polish thickness of the raised areas using the time-density model (dashed line is the model fit).
Figure 4.9. Measured and modeled values for the post-polish thickness of the down areas using the time-density model (dashed line is the model fit).
Figure 4.10. Model fit of the step height at contact time as a function of the effective feature density.
Figure 4.11. Model fit of the contact time as a function of the effective feature density.
Figure 4.12. Measured and modeled values for the post-polish thickness of the raised areas using time-density model with contact height as a function of density.
Figure 4.13. Measured and modeled values for the post-polish thickness of the down areas using time-density model with contact height as a function of density.
Figure 4.14.
Model fit for contact step height as a function of the effective feature density.
Figure 4.15. Model errors for raised and down areas as a function of the effective feature density.
Figure 4.16. Measured and modeled values for the post-polish thickness of the raised areas using time-density model with contact height as a function of line space.
Figure 4.17. Measured and modeled values for the post-polish thickness of the down areas using time-density model with contact height as a function of line space.
Figure 4.18. Model fit for step height at contact time as a function of line space.
Figure 4.19. Measured and modeled values for the post-polish thickness of the raised areas using time-density model with contact height as a function of density for various polish times. Dashed lines are model fits and stars are experimental data points.
Figure 4.20. The planarization length as a function of the device number being run (using the experimental data from the third control run in Chapter 3).
Figure 4.21. The blanket rate as a function of the device number being run (using the experimental data from the third control run in Chapter 3).

List of Tables

Chapter 2.
Table 2.1. Average differences between the on-line and the ex-situ measurements.
Table 2.2. Variation added to on-line measurements by polishing, residual slurry, and wafer loading.
Table 2.3. Breakdown of the variation in the on-line metrology system.

Chapter 1
Introduction

The use of the chemical-mechanical polishing (CMP) process in the semiconductor industry is growing rapidly [1].
Its use in the polishing of inter-level dielectrics has provided the ability to significantly increase the number of levels of interconnect in integrated circuits (ICs). This has provided improvements not only in circuit performance, but also in product yield. In addition, it is a critical step in the manufacturing of newer generation ICs which utilize copper (Cu) and shallow trench isolation (STI) processes [2,3].

While the CMP process provides many benefits to the manufacturing of ICs, it also has many problems. This is particularly true in a production setting, where the gradual wear in the polishing pads and the simultaneous processing of several types of ICs on a single polishing tool create changes in the tool operation that are difficult to monitor and control. Because our current understanding of the process still lags behind its application in the industry, statistical process control techniques have been the only methods able to achieve and maintain quality processing.

There have been several works on controlling the CMP process [4-15]. However, these initial works have addressed only pieces of a larger problem. The CMP process is complicated by many factors, and controlling all of these factors in a single controller has been unrealistic. One of the most significant factors complicating matters is the manner in which the particular pattern of metal and other components are laid out within each IC to make the circuit. The interactions between these patterns and the polishing behavior of a CMP tool make monitoring and controlling the process particularly difficult. As a result, initial work on the process control of CMP focused on simple methods aimed at monitoring and controlling the polishing of unpatterned or "blanket" wafers [4-12]. However, as we show in Chapter 2, this is very different than controlling directly on patterned wafers.
Realizing this is critical for implementing any control scheme in a production wafer fabrication facility (fab), later works began to focus on the control of patterned wafers [13-15]. Direct control of patterned wafers using a multivariate non-linear controller which monitors the average removal rate and wafer-level uniformity was shown in [14], and patterned wafer control using a self-adjusting control algorithm to control the average post-polish thickness was shown in [15].

These techniques focus on the control of a few sites on a single type of layout being polished on a single tool. We show in Chapter 3 that this gives little insight into what is happening to the locations that are not measured and controlled, and does not ensure that the entire device (or product type) is polished correctly. In addition, by restricting attention to a single device, these controllers fail to take into account the effects of different patterns being polished on the same tool. For example, a device of one type may wear the CMP pad more than a device of another type, and this increased wear on the pad reduces the polishing rate of the other devices. These effects need to be combined into a comprehensive control strategy in order to properly control the CMP process.

The purpose of this thesis is to outline such a comprehensive framework for controlling the polishing of wafers with multiple arbitrary pattern layouts on a single polishing tool. The approach allows for the control of multiple devices with completely different layout patterns simultaneously being polished on a single CMP tool. This is achieved by combining a CMP model that predicts the post-polish thickness of an arbitrary device layout with a feedback control algorithm. We begin in this chapter by reviewing the basics of run by run process control, the CMP process, and some well-accepted issues involved with controlling the CMP process.
1.1 An Introduction to Run by Run Process Control

As semiconductor processing entered the late 1980s, control charting and statistical process control (SPC) had substantially decreased process variability and increased process capability. In SPC techniques, the process output (e.g. deposition thickness) is monitored for different types of deviations from the process target. Traditionally, once an alarm (statistically significant deviation from the process target) is signaled, the process is shut down to perform maintenance and to re-optimize the process recipe. One set of rules for such deviations is known as the Western Electric Company (WECO) rules. A subset of these rules is:

1. Last point of data is greater than three standard deviations away from the process target.
2. Two of last three data points are greater than two standard deviations away from the target in the same direction.
3. Four of last five data points are greater than one standard deviation away from the target and in the same direction.
4. Last eight data points are all above or all below the target.

Industry response to open-loop statistical process control has been overwhelming, and the use of this type of control has become heavily ingrained in the semiconductor industry. With the decrease in variability, however, new problems were beginning to arise. Many processes were showing signs of a steady drifting off target [16-20]. Such drifts were often caused by the build-up of material on the interior components of the tools. For example, the deposition rate in a metal sputtering process is highly correlated to the life of the components within the tool, particularly due to the build-up of the deposited material in the honeycomb-like collimator used to improve coverage on the surface of the wafer [20]. The resulting drift in the deposition thickness is shown in Figure 1.1. Classical SPC approaches assume the process is "in control," and not subject to such drift.
Nevertheless, these methods were often used to monitor and compensate for such problems [21-26]. However, the reduction of variation in semiconductor processing, combined with the increase in process drift, resulted in a large number of alarms. Operators and engineers began to make frequent "updates" to the process time in order to quickly bring the tool back on-line. For example, the process output was often shifted back to the target (in the sputter deposition case, this is done by adjusting the deposition time) by an amount typically equal to the sample mean of the error over the violation set (i.e. the last five data points for WECO rule #3). This led to automated SPC control, whereby a simple process model was used to automatically adjust the process inputs upon the violation of an SPC rule.

Figure 1.1: An uncontrolled drifting process.

A typical control method assumes a process model of the form

$$y_m[n] = x[n] \cdot t_p[n] \tag{1.1}$$

where $y_m[n]$ is the process output (e.g. deposition thickness), $x[n]$ is the process rate (e.g. deposition rate), and $t_p[n]$ is the process time on run $n$. Typically, the estimate of the process rate is held constant, and the process output is monitored using SPC rules such as the WECO rules above. When a violation of the rule set occurs, the estimate of the process rate is updated to the average over the violation set, i.e.

$$x[n+1] = x[n], \quad v \in \{0\} \tag{1.2}$$

$$x[n+1] = y_a[n] / t_p[n], \quad v \in \{1\} \tag{1.3}$$

$$x[n+1] = \frac{1}{3} \sum_{i=n-2}^{n} y_a[i] / t_p[i], \quad v \in \{2\} \tag{1.4}$$

$$x[n+1] = \frac{1}{5} \sum_{i=n-4}^{n} y_a[i] / t_p[i], \quad v \in \{3\} \tag{1.5}$$

$$x[n+1] = \frac{1}{8} \sum_{i=n-7}^{n} y_a[i] / t_p[i], \quad v \in \{4\} \tag{1.6}$$

where $v$ is the rule violated (e.g. zero indicates no violation, one indicates WECO rule #1 was violated, etc.) and $y_a[i]$ is the actual process output.
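The tuning rules (1.2) through (1.6) amount to averaging the observed rate over the window of runs associated with the violated rule. A minimal sketch follows, assuming the window lengths 1, 3, 5, and 8 implied by the four WECO rules; the function name `update_rate` and its argument layout are hypothetical.

```python
from typing import Sequence

# Number of most recent runs averaged for each violated rule
# (rule 0 means no violation; lengths follow the WECO rule windows).
VIOLATION_WINDOW = {0: 0, 1: 1, 2: 3, 3: 5, 4: 8}


def update_rate(x_prev: float,
                y_actual: Sequence[float],
                t_proc: Sequence[float],
                rule: int) -> float:
    """Return the rate estimate for the next run.

    `y_actual` and `t_proc` hold the measured outputs and process times,
    most recent last. With no violation the estimate is left unchanged.
    """
    window = VIOLATION_WINDOW[rule]
    if window == 0:
        return x_prev
    # Observed rate on each run in the violation set: output / time.
    rates = [y / t for y, t in zip(y_actual[-window:], t_proc[-window:])]
    return sum(rates) / len(rates)
```

If fewer runs are available than the window requires, this sketch simply averages what is there; a real controller would need an explicit policy for that case.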
As can be seen in Figure 1.2, the performance of this method in controlling a drifting process can be quite poor, in the sense that the controlled process is often outside the two standard deviation limits. In this case, the statistical limits were calculated using the root mean squared error (RMSE) of a linear least squares fit of historical data of the uncontrolled thickness.

Figure 1.2: SPC control of a drifting process using tuning with WECO rules.

On the other hand, tools like thermal deposition furnaces experience regular shifting in their outputs, and, for these tools, the SPC approach works quite well. Other processes randomly drift away from the target output, but then drift back the other way, continually wandering about the target. Current CMP tools are a good example of this [11]. These processes, like the steadily drifting processes, suffer from poor performance of automated SPC techniques.

These problems have caused a shift in control towards continual tuning methods [4-20,27-42]. One example of this type of control is the exponentially weighted moving average (EWMA) controller. This control method is a combination of the EWMA SPC statistic [21-24,26] and closed-loop feedback control. An exponentially weighted moving average of the process output at discrete-time $n$ has the form

$$x[n+1] = w \cdot y_a[n] + (1 - w) \cdot x[n] \tag{1.7}$$

where $x[n]$ is the EWMA statistic on run $n$ and $w$ is the EWMA weight, which is generally restricted to $0 < w < 1$. Higher values of $w$ result in recent measurements more strongly affecting the weighted average.
This statistic is used to monitor a process output, or a state of the process, and make small incremental changes to the process recipe in order to keep the process on target. The closed-loop feedback control method using an EWMA was developed in [27-28,4,5,16,29]. Examples of the use and study of the EWMA controller and related variations are given in [4-5,9-12,14,16-17,27-34]. The simplest version of the controller uses a model identical to that given in (1.1), and replaces the tuning rules (1.2) through (1.6) with (1.7). In other words, the process model is continually tuned. For the single-variate case, the process input (process time) is calculated as

$$t_p[n+1] = y_d[n] / x[n+1] \tag{1.8}$$

where $y_d[n]$ is the desired output value.

The EWMA controller provides good control for processes which have small variations over time. Stability for the single-input single-output (SISO) and multiple-input multiple-output cases is well understood, and the controller has been shown to be stable over a large range of model mismatch [16,12]. In addition, the EWMA controller is designed around a statistically based filter which can be tuned to a given process (i.e. filter noise in the best possible way while minimizing errors due to real changes in the process), and methods exist for determining the optimal EWMA weight [12,30,34]. These works have demonstrated that the performance of the EWMA controller in response to process shifts and drifts while minimizing the response due to noise is quite good. This is especially true for systems which have slow dynamics buried in large amounts of noise. For example, the results of an EWMA controller used to control the deposition thickness of the sputter deposition process above are shown in Figure 1.3.
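To make the interplay between the EWMA update (1.7) and the recipe calculation (1.8) concrete, the following minimal sketch simulates the single-variate loop on a noise-free process. It rests on several assumptions that are not stated in the text: the EWMA is applied to the observed rate (output divided by process time, by analogy with (1.3)), the true process obeys the model form (1.1) exactly, and the function name `ewma_run_by_run` and its arguments are hypothetical.

```python
from typing import List, Sequence


def ewma_run_by_run(true_rates: Sequence[float],
                    target: float,
                    x0: float,
                    w: float) -> List[float]:
    """Simulate EWMA run by run control and return the output of each run.

    true_rates: actual process rate on each run (unknown to the controller)
    target:     desired output value, held constant here
    x0:         initial estimate of the process rate
    w:          EWMA weight, 0 < w <= 1
    """
    x = x0
    outputs = []
    for rate in true_rates:
        t_p = target / x          # recipe from the current estimate, as in (1.8)
        y_a = rate * t_p          # noise-free process following the form of (1.1)
        outputs.append(y_a)
        observed_rate = y_a / t_p
        x = w * observed_rate + (1.0 - w) * x   # EWMA update, as in (1.7)
    return outputs
```

For example, with a true rate that is 10% below the initial estimate and `w = 0.5`, the output error in this sketch shrinks on every run, since the rate estimate moves halfway toward the observed rate each time. Larger weights react faster but would pass more measurement noise into the recipe.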
We see that the continual adjustment to the process results in tighter control of the process (e.g. there are fewer points outside the two sigma limits).

Figure 1.3: EWMA control of a drifting process.

1.2 An Introduction to CMP

This section outlines some fundamentals of the CMP process before we move on to discussing a few of the basic issues involved with controlling the CMP process in the next section. We begin by discussing some basics of the "back-end" manufacturing of integrated circuits (ICs), where CMP is most widely utilized. By back-end, we mean the process technology and steps above the layers of metal and dielectric materials that are used in the formation of the electrical interconnections between the active components of a circuit (e.g. transistors, as formed in the "front-end" processing). As shown in Figure 1.4, the interconnect is manufactured by depositing thin films of materials, and selectively removing or changing the properties of these materials in certain areas. A new "level" of thin film is deposited on top of old films and the process is repeated many times until the interconnect is complete. The goal of the CMP process is to planarize step heights caused by the deposition of thin films over existing non-planar features, so that further levels may be added onto a flat surface. This process is outlined in Figure 1.4.
After transistors are formed on the silicon substrate, a pre-metal deposition (PMD) of dielectric material is laid down, and contacts to the underlying devices are made in the PMD layer. This via formation often includes a metal CMP process, which is not specifically discussed in this thesis. Following this, the Metal 1 deposition occurs, followed by patterning and removal of the Metal 1 layer to create the Metal 1 lines. This step is followed by the deposition of an interlevel dielectric (ILD) layer. The patterned metal lines leave a non-planar surface on the ILD 1 layer. The CMP process is used to planarize this surface, so that interconnecting metal holes may be etched and filled with material (Via 1), and the Metal 2 connecting lines may be deposited and patterned. This proceeds to higher level metals until the circuit is complete. Current generations of ICs have up to seven layers of metal circuitry [43].

Figure 1.4: Schematic application of the CMP process in interconnect formation. a) Cross-sectional diagram of a pre-CMP interlevel dielectric on top of metal lines (labels: up areas, down areas, step height, ILD 1, Metal 1). b) Cross-sectional diagram following CMP. c) Cross-sectional diagram following CMP, and application of Metal 2 and ILD 2.

The dielectric CMP step in this sequence is critical to determining the performance of the device, as well as the defectivity rate of the circuit. Without the planarity achieved by CMP, several problems could occur. First, changes in the vertical height of the surface profile make optical patterning using photolithography difficult. The extremely small feature sizes require very tight focusing of the light. Any changes in the vertical height in different regions within the device will cause changes in the focusing and, hence, the sizes of the features. This lithographic depth of focus limitation results in extremely tight requirements on the planarity of the wafer surface. If these requirements are not met, it could result in large variations in the sizes of the metal lines and interconnects, and lead to a degradation in device reliability and performance. Non-planar surfaces can also cause faults in the circuit as a result of metal depositions from different levels of metal making unintended contact. This happens when the metal interconnect holes are etched over the edge of a step in a non-planar surface. The hole is then etched through the ILD to the underlying layer, and filled with metal, causing an unintended short between levels.

Therefore, the planarization of ILD layers is critical to the performance and yield of current and future generation devices. In particular, as device sizes continue to decrease and the number of metal layers continues to increase, the importance of surface planarity and CMP will also continue to grow. Future interconnect technology is moving to copper and damascene processes [2-3]. While the details of the process differ, similar demands are placed on good control of planarity in metal CMP. In addition, the use of CMP to form shallow trench isolation structures also requires extremely good control of planarity in dielectric polish.

In traditional CMP processing, as illustrated in Figure 1.5, the wafer is held in a wafer carrier and pressed face-down onto a polishing pad, which is affixed to a rotating platen. A slurry with abrasive material (e.g. silica particles of size 50-100nm) held in suspension is dripped onto the pad during polishing. The carrier and platen rotate at variable speeds, typically on the order of 30 rpm. CMP tools differ in many aspects, including the number of platens, the number of polishing carriers per head, the size of the pad in relation to the wafer size, and rotary versus linear polishing mechanisms.

Figure 1.5: Chemical-mechanical polishing tool configuration (top view and side view; labels: slurry feed, wafer carrier, platen).
As shown in Figure 1.6, the CMP process preferentially removes material from raised areas on the surface of the wafer through a combination of mechanical and chemical action. In dielectric CMP, the chemical action is thought to serve two purposes: a) to soften, or hydrolyze, the surface of the dielectric material so that the soft pad and slurry particles can abrade away the surface, and b) to keep particles from agglomerating in the slurry. Recently, alternative slurries (e.g. CeO2 particles) and slurry-release mechanisms (e.g. fixed-abrasive pads) have been reported to exhibit very different behavior [44-45]. In this work, we focus on conventional oxide polish systems. The preferential removal rate of the raised areas is thought to be due to the distribution of pressure of the CMP pad on the raised features [46-47], and in particular is related to the density and relative height of the features, as shown in Figure 1.6 [47]. Dense features tend to support less force per unit area, because the force of the pad rests on a greater amount of area. In contrast, regions with a significant amount of down (or open) area have the same force supported by less up area.

Figure 1.6: Diagram of pad, slurry, and surface interactions. a) The CMP pad in contact with features. b) The CMP pad supported by far away (on the order of mm) features.

1.2.1 Blanket Wafer Performance Metrics

The performance of the CMP process is gauged by several different metrics. Although the process is aimed at reducing the step height on wafers with "patterned" features, several metrics of the polishing of "unpatterned" blanket sheet film wafers are also typically used. In particular, the removal rate (RR) of material on blanket sheet film wafers is often used to judge how quickly a process will remove step heights on patterned wafers.
Processes with higher removal rates are generally considered better. The RR is determined by measurement of the oxide film thickness before and after polishing at each of several sites on the wafer (see Figure 1.7 for two examples). A typical removal rate profile for a blanket wafer is shown in Figure 1.8, and the resulting post-polish thickness profile is shown in Figure 1.9. The "removal rate" metric most often used is the average of the amount removed at each site, divided by the fixed polish time. Differences between polish rates at the center and edge of the wafer (e.g. "bulls-eye patterns" as seen in Figure 1.8) may arise due to wafer asymmetry (e.g. wafer flat), non-constant relative pad velocity from the edge to the center, non-uniform slurry and by-product transport under the wafer, wafer bowing due to pressure or tool design, or machine drift (with tool or pad age) of any of these parameters. As a result, the uniformity of the polishing process across the surface of the wafer is also of concern. In order for all devices on the wafer to be polished to the same amount, the within-wafer non-uniformity (WIWNU) of a polished unpatterned blanket wafer is desired to be small (typically 5% or less). The calculation of the WIWNU metric varies in the industry [48].

Figure 1.7: Blanket wafer measurement sampling patterns (49 point and 25 point sampling patterns, with a 6 mm edge exclusion).

Figure 1.8: Blanket wafer removal rate profile, plotted against x location (mm) and y location (mm). The surface is an interpolation based on the measured points, which are indicated by the stars.

Figure 1.9: Blanket wafer post-polish thickness profile, plotted against x location (mm) and y location (mm).
One common calculation used is the standard deviation of the amount removed (AR) over the sites on the wafer, divided by the average AR over the several sites, times 100. Other approaches include the standard deviation of the removal rate or post-polish thickness profiles. These two blanket wafer metrics are generally used to develop CMP processes, as well as to monitor the CMP process on a lot to lot basis. In addition, particle and scratching tests are also performed on unpatterned wafers. Particles and wafer scratching caused by CMP can create severe failures in manufactured circuits [49], and thus must be carefully monitored.

1.2.2 Patterned Wafer Performance Metrics

In order to verify the planarization of step heights within the wafer, the step heights are measured at several locations within several die (see Figure 1.10 for a typical sampling pattern). The sites chosen are usually large features which are easy to find and align on the measurement tool. Typically, measurement of step heights occurs on several bond pad structures such as that shown in Figure 1.11. The material stack in the raised area is 14000Å of Oxide (PETEOS) on top of 230Å of Silicon-Ox-Nitride on top of 6000Å of Aluminum on top of the Silicon substrate. The down areas have had the aluminum removed.

Figure 1.10: A typical die sampling plan.

Figure 1.11: Typical structures used for step height measurement.

A sample step height measurement using a KLA/Tencor P-20 profilometer taken across the structures shown in Figure 1.11 is shown in Figure 1.12, for a 19 second polish time, i.e. before planarity is reached. Figure 1.13 shows a step height measurement for a 53 second polish time, i.e. near planarity. An important value determined from the measurement is the total indicated range (TIR), i.e. the difference between the maximum and minimum points of all the steps within the scan.
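The blanket wafer metrics and the TIR described above reduce to a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the thickness values in the usage note are made up for the example, and the use of the sample (n - 1) standard deviation is an assumption, since the calculation of WIWNU varies in the industry.

```python
import statistics
from typing import Sequence, Tuple


def blanket_metrics(pre: Sequence[float],
                    post: Sequence[float],
                    polish_time: float) -> Tuple[float, float]:
    """Return (average removal rate, WIWNU in percent) for one blanket wafer.

    pre, post:   film thickness at each site before and after polishing
    polish_time: fixed polish time, in whatever time unit the rate should use
    """
    removed = [a - b for a, b in zip(pre, post)]   # amount removed (AR) per site
    mean_ar = statistics.mean(removed)
    removal_rate = mean_ar / polish_time
    # Standard deviation of AR over the sites, divided by the mean AR, times 100.
    # The sample standard deviation is an assumption of this sketch.
    wiwnu = 100.0 * statistics.stdev(removed) / mean_ar
    return removal_rate, wiwnu


def total_indicated_range(profile: Sequence[float]) -> float:
    """Difference between the maximum and minimum points of a profilometer scan."""
    return max(profile) - min(profile)
```

As a usage example with made-up numbers, four sites measured at 10000 before polishing and at 7000, 7100, 6900, and 7000 after a 60 unit polish give an average removal rate of 50 per unit time and a WIWNU of roughly 2.7%.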
Recently, the CMP community has been moving toward the measurement and analysis of within-die non-uniformity (WIDNU). The post-polish thickness of 25 sites within a die, averaged over 10 die on a typical production wafer, is shown in Figure 1.14. Here we see a large amount of variability in the post-CMP thickness within a die. A general metric for this is the standard deviation of several sites within a single die. However, a significant amount of work has gone into understanding this variation [50-57], and much of the work in this thesis will be based on these ideas. We will thus return to this issue again later.

Figure 1.12: A typical step height measurement before planarity is reached (horizontal axis: distance in microns; the TIR is marked on the scan).

Figure 1.13: A typical step height measurement near planarity (horizontal axis: distance in microns; the TIR is marked on the scan).

Figure 1.14: The within-die variation of a typical production wafer. a) Surface plot of the within-die variation, along with corresponding measurement points (axes: X (mm), Y (mm)). b) Two-dimensional plot of the within-die variation (horizontal axis: Site #).

1.3 An Introduction to CMP Process Control Issues

Several factors make controlling the CMP process particularly difficult. It is often the case that there is significant drift in the removal rate over the life of a typical CMP polishing pad [10-12]. Figure 1.15 shows the average blanket removal rate using data from a 25 point measurement similar to that shown in Figure 1.7 over the course of a 600 wafer polishing experiment. The data shown was taken from the last wafer in the lot. We can see that the blanket rate has a fairly significant drift: an 11% decrease over the lifetime of a typical pad. This is also true for the patterned removal rate.
Figure 1.16 shows the average removal rate measured at a single site on an evenly spaced grid of 22 dies of a test device, similar to a typical production wafer, plotted over the course of the same experiment. Again, the data is taken from the last wafer of the lot. Note that there is a corresponding 7% decrease in the average patterned removal rate over the pad lifetime. These decreases in removal rate have been the focus of most of the initial work in controlling the CMP process. Although the drift is small, if it is not correctly controlled it can lead to significant differences between the actual average post-polish thickness and the desired average post-polish thickness.

Figure 1.15: Average removal rate of blanket sheet film PETEOS wafers (horizontal axis: Lot #).

Figure 1.16: Average removal rate of patterned PETEOS wafers (horizontal axis: Lot #).

In controlling the CMP process, the within-wafer non-uniformity on blanket test wafers is also frequently monitored. However, the WIWNU varies very little over the life of a typical CMP pad, assuming well-designed pad conditioning practices. As shown in Figure 1.17, the WIWNU is fairly consistent over the life of a pad, and the exact value depends on the particular tool and the component quality (e.g. carrier flatness). On the other hand, the WIWNU is often the reason the tool is brought down for maintenance. In normal polishing, there is a fall-off in the removal rate near the edge of the wafer. This fall-off is usually in the 3-6mm edge-exclusion region (shown in Figure 1.7) which is not included in the WIWNU. As the polishing pad ages, this fall-off region moves in closer to the center of the wafer.

Figure 1.17: Within-wafer non-uniformity of blanket sheet film PETEOS wafers (horizontal axis: Lot #).
The outer measurement points then become significantly different from the other measurements, and the WIWNU quickly increases. Once the WIWNU exceeds a certain limit, the tool is brought down, the pad is changed, and maintenance is performed on the carrier head.

In addition to the removal rate and WIWNU, step height measurements are also taken. As mentioned above, the step-height is measured on a fixed feature on each particular device. Because these measurements are time-consuming, only a single step-height measurement on a single feature on a single die of a single wafer in a lot is typically taken. If the step-height has not been removed to less than a certain level, then the wafers are reworked, meaning they are put back on the polisher for additional polishing. However, care must be taken to avoid over-polishing. As we saw in Figure 1.14, the within-die non-uniformity can be large, and over-polishing can result in the low areas on the device being completely removed. As a result, step-height measurements are frequently used only as a spot check on the process during production, although they are used heavily in the process development stage (in order to ensure a process is achieving a certain degree of planarization).

1.4 Summary

In this chapter, we outlined the need for chemical-mechanical polishing as a means for dielectric planarization in the manufacture of integrated circuits, and described the importance of controlling the CMP process in a production setting. Process control in the semiconductor industry has progressed from basic statistical process control to run by run feedback control systems. Various metrics and goals must be achieved in the CMP process, including: removal rate, within-wafer non-uniformity, step-height, total indicated range, within-die non-uniformity, and wafer-to-wafer non-uniformity.
A number of difficulties exist in controlling the CMP process, including the drift in the polish characteristics of blanket and patterned wafer performance metrics over time, and the challenge of control given a mix of device types on the same tool. In the next chapter, we turn our attention to controlling the CMP process using the EWMA controller outlined in this chapter. We will discuss factors that are important to a production control solution, including the use of integrated metrology and a simple model update strategy. We will also begin to discuss some of the difficulties in effectively measuring, monitoring, and controlling patterned wafers in the CMP process.

Chapter 2

Control of a Single Device Using On-Line Metrology

In this chapter, we take the first steps in developing a comprehensive control strategy for the polishing of patterned wafers in CMP. The previous chapter outlined the basics of process control, CMP, and the issues involved with controlling the CMP process. This chapter aims to expand on these topics and begins to cover the details of implementing CMP run by run process control in a production setting. There are at least four major issues involved with an implementation of a run by run control system for use in a production environment: quality, cost, flexibility, and ease of use. This chapter outlines the use of an EWMA run by run control system with integrated metrology to control the average post-polish thickness of patterned wafers. An integrated, or on-line, metrology tool resides on the processing equipment and performs measurement after the wafers are processed, but before they are unloaded from the tool. The frequent measurements provided by integrated metrology, combined with proper controller tuning, result in high quality control of the post-polish thickness. In addition, the automatic measurement of the post-polish wafers and the relatively simple control algorithm provide maximum ease of use.
The simplification of processing using on-line metrology and the reduction in run to run thickness variation from improved control result in a reduced cost for the CMP process. However, the methodology provided within this control framework is limited in flexibility. We will demonstrate this in the following chapter, where we present a comprehensive control framework based on the concepts presented in this chapter.

As stated above, our purpose is to demonstrate run by run control of the average post-polish thickness of patterned wafers using the relatively simple EWMA control algorithm in conjunction with an on-line metrology system. We will demonstrate that such a system provides quality control of the average thickness of a single site over multiple dies of a single type of patterned wafers, i.e. a single type of device, with an easy to use system that reduces cost. First, we present a study of the quality and reliability of an on-line metrology tool for CMP. This is necessary to ensure the quality of our measurements before using them for control. Second, we outline the cost benefits of an on-line metrology tool, used in conjunction with a run by run controller. Finally, we demonstrate that the lot to lot drift in a polishing tool may be eliminated by using the simple (i.e. maximum ease of use) EWMA control algorithm presented in the previous chapter. We show that, when this relatively simple controller is correctly tuned and frequent measurements are enabled by on-line metrology, the controlled thickness using this simple control methodology is similar to or better than that reported in the literature, including those using more complex approaches.

2.1 Evaluation of On-Line Metrology for CMP

Before an on-line metrology tool can be used for process monitoring and control, the repeatability and reliability of the tool must be assessed.
In addition, some understanding of how measurements from the on-line tool relate to those of current ex-situ metrology tools is needed. In this section, we discuss the evaluation of a NovaScan 210 on-line metrology tool from Nova Measuring Instruments, performed on an IPEC 472 CMP polisher.

As shown in Figure 2.1, the on-line metrology tool resides on the polishing tool. Processing begins by moving the wafer from the load station to the primary polishing table. After polishing, the wafer is normally buffed on a soft felt pad with de-ionized water. After the buffing, the wafer is loaded into the on-line measurement tool before being placed in the unload station.

Figure 2.1: Polishing sequence with on-line metrology.

The measurement process for patterned wafers is shown in Figure 2.2. Standard ex-situ metrology tools first physically align the position and orientation of the wafer before performing any software alignment and measurement. However, the on-line tool does not perform any physical alignment of the wafer, but only performs a software alignment routine to determine the location and orientation of the wafer. Once the position and orientation of the wafer are established, the measurement process proceeds to each specified die. Within each die, searching begins with the alignment feature (a particular feature on each type of wafer chosen by the user to orient the software with a specific die on the wafer). Once the alignment feature is located, measurement proceeds to each site. An optional alignment of each measurement site is performed and a final adjustment to the measurement position is made before the measurement of the site is taken.

Figure 2.2: The on-line pattern recognition system. Stages: wafer load, software wafer alignment (8.1Å), die alignment recognition (4.3Å), and measurement site recognition (0.5Å), repeated for each site and for each die.
Measurement then proceeds to all remaining sites within that die. Once the measurement of the sites within the die is complete, the tool then moves to the next die and begins searching for the alignment feature of that die. The process then repeats until all dies are measured. The measurement process is a spectrophotometry process, whereby the intensity of light of varying frequency is measured to obtain a reflected spectrum. The spectrum is matched to internal model spectra within the tool. The thickness parameters of the model are varied, from which the thicknesses of the specified layers are determined.

2.1.1. Measurement Repeatability

There are four sources of variability in the measurement process: the variability due to a) the initial wafer alignment software routine, b) the die stepping and alignment, c) the site alignment, and d) the actual measurement process. These variances are in a form referred to as nested variance. The variance of the measurement process is nested within the variance due to the site alignment, which is nested within the variance due to the die stepping and alignment, which is nested within the variance due to the software alignment. This four-level nested variance structure results in a single sample site measurement having the form

$$X_{ijkl} = W_i + D_{j(i)} + S_{k(ij)} + M_{l(ijk)} \tag{2.1}$$

where

$$W_i \sim N(\mu, \sigma_W^2) \tag{2.2}$$

$$D_{j(i)} \sim N(\mu_i, \sigma_D^2) \tag{2.3}$$

$$S_{k(ij)} \sim N(\mu_{j(i)}, \sigma_S^2) \tag{2.4}$$

$$M_{l(ijk)} \sim N(0, \sigma_M^2) \tag{2.5}$$

This structure has several implications when measuring the variability of the measurement process. In fact, the structure in this general form is extremely complex, and many works have outlined methods for finding the terms in this structure [50-51, 58-63].
If, for example, we assume that $\mu$, $\mu_i$, and $\mu_{j(i)}$ are all zero, then the variance of an individual measurement will be

$$\sigma_T^2 = \sigma_W^2 + \sigma_D^2 + \sigma_S^2 + \sigma_M^2 \tag{2.6}$$

The variance of the sample average of a wafer, under these assumptions, would be

$$\sigma_A^2 = \sigma_W^2 + \frac{1}{D}\sigma_D^2 + \frac{1}{DS}\sigma_S^2 + \frac{1}{DSM}\sigma_M^2 \tag{2.7}$$

where $D$, $S$, and $M$ are the number of dies, sites, and measurements, respectively. With this in mind, we now proceed to identify the components of variance in the on-line metrology tool.

The variability of the actual measurement process, $\sigma_M^2$, was estimated by repeatedly measuring the same point on a single wafer, without any physical movement of the wafer or mechanisms (site alignment or die alignment). This eliminates all the components of variance except $M_{l(ijk)}$. This process was repeated at two locations on the wafer, each with 25 repetitions, to estimate $\sigma_M^2$, which is often referred to as the "precision" of the measurement process. The average of the sample standard deviations of these measurements was 0.5Å, and is shown in its corresponding location in Figure 2.2. By averaging these values, we are assuming that the variation of two sites at different locations is white, meaning that each site has the same $\sigma_M^2$.

The variation added by the site-to-site movement and the die-to-die movement cannot be measured individually on this tool, because it cannot be set to perform multiple site or multiple die measurements without including the software wafer alignment. The within-wafer measurement variation, i.e. that including $\sigma_D^2$, $\sigma_S^2$, and $\sigma_M^2$, was determined as follows. A single patterned wafer was placed on the measurement stage. One site on five die across the wafer was measured. The standard deviation of the five sites was calculated for 25 repetitions, without movement of the wafer on the stage. It is possible that there is some wafer variation included in this, because each repetition includes a software alignment.
However, we will neglect this because we believe most of the variation comes from variations in the orientation and position of the wafer on the measurement stage, and this was not included since the wafer did not move on the stage during the measurement. The average of these 25 five-die standard deviations was determined to be 4.3 Å, and is also shown in its corresponding location in Figure 2.2.

In our nested variance structure, each measurement we obtain is at the die level. In particular, each measurement consists of one measurement in one site in one die. Thus, only the $D_{j(i)}$, $S_{k(ij)}$, and $M_{l(ijk)}$ terms contribute to the variance of each measurement. Some assumptions on the means of these terms are necessary. Since we have only one wafer, and since there is only one site and one die in each measurement, we can assume one mean for all the measurements. Therefore, the average of five of these measurements would have a variation of

$$\sigma_A^2 = \frac{1}{5}\sigma_D^2 + \frac{1}{5}\cdot\frac{1}{1}\,\sigma_S^2 + \frac{1}{5}\cdot\frac{1}{1}\cdot\frac{1}{1}\,\sigma_M^2 = \frac{1}{5}\left(\sigma_D^2 + \sigma_S^2 + \sigma_M^2\right) \qquad (2.8)$$

In addition, since we have only one mean for all the components, these will drop out in the calculation of the standard deviations. Thus the sample average of the standard deviations given above is an estimate of the combined die, site, and measurement variation,

$$\sigma_{DT}^2 = \sigma_D^2 + \sigma_S^2 + \sigma_M^2 \qquad (2.9)$$

Note that sources of variation may be confounded here. We are assuming that the variation determined from the sample standard deviation of the die measurements is due to the metrology tool, because we are using the same wafer, which is very flat across the raised features. In fact, if there is within-wafer variation on this wafer, then the standard deviation of the five die measurements will contain this variation, which should be attributed to the wafer, not the metrology tool.
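The nested variance bookkeeping above can be checked numerically. The following is a minimal sketch, assuming the additive forms of Equations (2.6) and (2.7); the function names and the component standard deviations are hypothetical placeholders, not values measured on the tool.

```python
import math

def single_measurement_variance(sd_wafer, sd_die, sd_site, sd_meas):
    """Total variance of one measurement, Eq. (2.6): the four levels add."""
    return sd_wafer**2 + sd_die**2 + sd_site**2 + sd_meas**2

def wafer_average_variance(sd_wafer, sd_die, sd_site, sd_meas,
                           n_dies, n_sites, n_meas):
    """Variance of the wafer sample average, Eq. (2.7)."""
    return (sd_wafer**2
            + sd_die**2 / n_dies
            + sd_site**2 / (n_dies * n_sites)
            + sd_meas**2 / (n_dies * n_sites * n_meas))

# Hypothetical component standard deviations in angstroms (illustrative only).
SD_DIE, SD_SITE, SD_MEAS = 3.0, 3.0, 0.5

# Five dies, one site per die, one measurement per site, no wafer term (Eq. 2.8).
five_die_avg_var = wafer_average_variance(0.0, SD_DIE, SD_SITE, SD_MEAS, 5, 1, 1)

# Die-level variance of a single measurement with no wafer term (Eq. 2.9).
die_level_var = single_measurement_variance(0.0, SD_DIE, SD_SITE, SD_MEAS)

print(f"variance of five-die average: {five_die_avg_var:.3f} A^2")
print(f"die-level variance:           {die_level_var:.3f} A^2")
print(f"die-level std deviation:      {math.sqrt(die_level_var):.3f} A")
```

With one site per die and one measurement per site, the variance of the five-die average is exactly one fifth of the die-level variance, which is the simplification used in Equation (2.8).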
The variability including the measurement, site-to-site, die-to-die, and wafer alignment variability was measured by repeating the previous process, but rotating the orientation of the wafer between measurements. This average variability was determined to be 8.1 Å, which is also shown in Figure 2.2. Again, we have assumed that the mean of the wafers, $\mu_i$, is fixed over all wafers, because the same wafer was used for all measurements.

We found the repeatability of the measurement process, i.e. 8.1 Å, to be very good, considering that the average wafer-to-wafer variation of blanket wafer polishing is roughly 100 Å to 300 Å. The sources of variability are all less than 10 Å, and some of this may be due to the wafer itself, as mentioned above. This suggests that the repeatability of the metrology tool meets the requirements for CMP. Section 2.1.3 will discuss the variation contributed by wafer loading, small amounts of slurry, and post-polish wafer cleaning.

2.1.2. Reliability

The reliability of a metrology tool in a production environment is extremely important. In light of this, we performed two reliability tests of the on-line metrology tool. In our first experiment, one site on five dies was measured on 100 wafers with the intention of testing the alignment success rate as well as the die-level pattern recognition success rate. Our experimental results show a 100% success rate in wafer alignment. Only one in 500 sites was not found (most likely due to a bubble), corresponding to a 99.8% success rate. We performed two additional experiments which measured one site in 22 dies on 24 wafers. Again we found a 100% wafer alignment success rate. Four of 1056 sites were not found, resulting in a 99.6% success rate.

The reliability of the on-line metrology tool during the run by run control experiment we outline later was also very good. There was one failure in 96 wafer alignments, a success rate of 99%.
The site alignment success rate was 99.7% (7 failures in 2112 measurements). These site-not-found (SNF) errors are generally caused by bubbles in the water between the wafer and the measurement window. However, it was found that the pattern recognition trained for this layout had problems finding a site on the far right of the wafer, which could be due to an inability of the die-level or site-level alignment routines to compensate for inaccuracies in the stepping distance or the wafer alignment.

2.1.3. Correlation with Ex-Situ Metrology

We would also like to understand how these measurements correspond to ex-situ measurements. One site on 22 dies on two sets of patterned wafers was measured on both the NovaScan 210 on-line metrology tool and a KLA/Tencor UV1280 ex-situ metrology tool. The first set consisted of pre-clean pre-polish wafers measured on both tools. The second set were pre-clean post-polish wafers when measured on the on-line tool and post-clean post-polish wafers when measured on the ex-situ tool. Care was taken to set up the measurement parameters on both tools. These parameters include pattern recognition, optical properties of the materials being measured, die stepping distances, and site measurement locations. Both tools are spectrophotometry tools, and thus we expect similar results.

The on-line and ex-situ measurements from pre-clean pre-polish wafers are linearly correlated with a correlation coefficient of 0.98. The on-line values are, on average, 47 Å higher than the ex-situ values. The standard deviation of the errors from this linear fit (which we will refer to as the spread) is 12 Å, and the range of the spread is 48 Å. These results show that the on-line measurements correlate extremely well with ex-situ measurements. The absolute thickness values obtained from the two tools differ slightly, which may be due to variations in algorithms, optics, and calibrations.
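The comparison statistics quoted above (correlation coefficient, average offset, spread, and range of the spread) can be computed from paired readings along the following lines. This is only a sketch: the function name and the five paired thickness values are hypothetical, and the choice of n - 2 degrees of freedom for the spread is an assumption, since the text does not state which estimator was used.

```python
import math

def compare_tools(online, exsitu):
    """Summarize paired on-line vs. ex-situ thickness readings (angstroms)."""
    n = len(online)
    mean_x = sum(exsitu) / n
    mean_y = sum(online) / n
    sxx = sum((x - mean_x) ** 2 for x in exsitu)
    syy = sum((y - mean_y) ** 2 for y in online)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(exsitu, online))

    # Least-squares line: online = slope * exsitu + intercept.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (slope * x + intercept) for x, y in zip(exsitu, online)]

    return {
        "correlation": sxy / math.sqrt(sxx * syy),
        "offset": mean_y - mean_x,  # average on-line minus ex-situ
        # Spread: std deviation of the fit errors (n - 2 dof is an assumption).
        "spread": math.sqrt(sum(e * e for e in residuals) / (n - 2)),
        "spread_range": max(residuals) - min(residuals),
    }

# Hypothetical paired readings, not data from the experiment.
exsitu_a = [9800.0, 9900.0, 10000.0, 10100.0, 10200.0]
online_a = [9850.0, 9942.0, 10051.0, 10141.0, 10251.0]

stats = compare_tools(online_a, exsitu_a)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Separating the offset from the spread matters here: a constant offset between tools can be calibrated out, whereas the spread sets a floor on how closely the two tools can agree.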
Our second set of wafers were pre-clean post-polish when measured with the on-line tool and post-clean post-polish when measured with the ex-situ tool. The scatter plot for the experiment is shown in Figure 2.3. Several values lie above and below the main cluster, in addition to the few SNF errors. This phenomenon is called cycle-skipping, and is caused by a failure of the algorithm to distinguish the spectrum of the true thickness from that of a nearby thickness which has a similar spectrum. This problem was eliminated by switching to an improved algorithm later in the experiment. The success rate during the region with cycle-skipping was 83%. The success rate increased to 99.5% when the improved algorithm was used.

Figure 2.3: Correlation plot of the on-line and ex-situ post-polish patterned wafer thicknesses. (Horizontal axis: Ex-Situ Thickness (Å); annotated points: Cycle-Skips, Site-Not-Found.)

As shown in Figure 2.3, the pre-clean post-polish on-line values are, on average, 175 Å higher than the post-clean post-polish ex-situ values, and the values are linearly correlated with a correlation coefficient of 0.99. The standard deviation of the spread is 31 Å, and the range of the spread is 173 Å. In order to determine the effect of cleaning, wafers were measured on the ex-situ tool, cleaned, and remeasured. The clean resulted in an offset of 137 Å, a standard deviation of the spread of 8 Å, and a range of the spread of 28 Å. The sources of the 175 Å offset are outlined in Table 2.1. We see that we have only 9 Å of unaccounted offset, which is within the variation of the measurements.
Experiment                                                            Average
Pre-Clean Pre-Polish On-line vs. Pre-Clean Pre-Polish Ex-situ         47 Å
Pre-Clean Pre-Polish Ex-situ vs. Post-Clean Post-Polish Ex-situ       137 Å
Pre-Clean Post-Polish On-line vs. Post-Clean Post-Polish Ex-situ      175 Å
Added Cleaning Due to Surface Damage                                  (175) - (47 + 137) = -9 Å

Table 2.1: Average differences between the on-line and the ex-situ measurements.

We are now in a position to extract the increased variation due to the effects of the loading, residual slurry, and surface damage caused by the CMP process. We can calculate this as shown in Table 2.2. If we assume independence of these variations, then we can subtract (in quadrature) the cleaning variation and the pre-clean pre-polish variation from the post-polish post-clean variation to obtain the remaining variation due to the combined effects of polishing, slurry, and loading. As shown in Table 2.2, the combined variation is roughly 27 Å. If there is correlation in these components, then this number could actually be significantly lower. We combine this result with those of Section 2.1.1 in Table 2.3 to summarize our assessment of the variation in the on-line measurement process. These combine for a total variation of only 28 Å, far less than the variation of the CMP process itself.

Experiment                                                            Standard Deviation
Pre-Clean Post-Polish On-line vs. Post-Clean Post-Polish Ex-situ      31 Å
Pre-Clean Pre-Polish Ex-situ vs. Post-Clean Pre-Polish Ex-situ        8 Å
Pre-Clean Pre-Polish On-line vs. Pre-Clean Pre-Polish Ex-situ         12 Å
Added Variation Due to Polishing, Slurry, and Loading                 $\sqrt{31^2 - (8^2 + 12^2)} = 27$ Å

Table 2.2: Variation added to on-line measurements by polishing, residual slurry, and wafer loading.

Experiment                                                            Standard Deviation
On-line Measurement Repeatability                                     0.5 Å
On-line Pattern Recognition                                           4.3 Å
On-line Alignment                                                     6.8 Å
Added Variation Due to Polishing, Slurry, and Loading                 27 Å
Total Variation                                                       $\sqrt{0.5^2 + 4.3^2 + 6.8^2 + 27^2} = 28$ Å

Table 2.3: Breakdown of the variation in the on-line metrology system.
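The arithmetic behind Tables 2.1 through 2.3 can be reproduced in a few lines. The sketch below assumes, as the text does, that the variance components are independent, so standard deviations combine in quadrature; the variable names are illustrative only.

```python
import math

# Table 2.1: average offsets in angstroms.
offset_online_vs_exsitu_postpolish = 175.0
offset_online_vs_exsitu_prepolish = 47.0
offset_from_cleaning = 137.0
unaccounted_offset = offset_online_vs_exsitu_postpolish - (
    offset_online_vs_exsitu_prepolish + offset_from_cleaning)

# Table 2.2: subtract independent components in quadrature.
sd_postpolish, sd_cleaning, sd_prepolish = 31.0, 8.0, 12.0
sd_added = math.sqrt(sd_postpolish**2 - (sd_cleaning**2 + sd_prepolish**2))

# Table 2.3: add independent components in quadrature.
components = [0.5, 4.3, 6.8, 27.0]
sd_total = math.sqrt(sum(sd**2 for sd in components))

print(f"unaccounted offset: {unaccounted_offset:.0f} A")
print(f"added variation:    {sd_added:.1f} A")
print(f"total variation:    {sd_total:.1f} A")
```

The total is dominated by the 27 Å term: removing the three tool-related components entirely would lower the total by only about 1 Å.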
2.2 Throughput and Cost of Ownership Improvements

Before discussing the use of the on-line metrology tool in a control setting, this section discusses throughput and cost of ownership (COO) improvements gained by the use of on-line metrology in CMP. These issues are highly dependent on the particular process implementation at a particular site. Therefore, we will discuss several scenarios and outline the throughput and COO improvements gained in each scenario.

2.2.1. Throughput

While it is possible that on-line measurement could slow processing, this is mainly when more than five dies are measured on a robot-less CMP tool (such as the IPEC 472). Therefore, we will assume that the on-line measurement does not slow the polishing process. Much of the increase in throughput comes from a reduction in the number of cleans and ex-situ measurements. The savings calculated here assume that the polisher waits for the post-polish clean and ex-situ measurement before continuing to polish, which is unrealistic for some high volume facilities. Increases for these facilities will be largely dependent upon the number of cleaning and ex-situ measurement tools that are available relative to the number of polishers, and which set of tools is the bottleneck for the CMP process.

Our first scenario is outlined in Figure 2.4. This is a highest-quality lowest-throughput process. It consists of a 10 minute look-ahead and pilot wafer polish, a 30 minute clean, a 5 minute ex-situ measurement, a polish time calculation, a 90 minute polish, another 30 minute clean, a 30 minute ex-situ post-polish measurement, rework time calculations, a 10 minute rework, a 30 minute clean, and a 10 minute two wafer ex-situ post-polish measurement. If we measure all wafers and re-work only those necessary, we obtain a total time of 255 minutes, a throughput of 0.23 lots/hour. If we measure only two wafers (10 minutes) in the lot and re-work the entire lot (45 minutes), we again obtain a time of 255 minutes.
Utilizing on-line metrology, as shown in Figure 2.5, we eliminate the look-ahead clean cycle, the ex-situ look-ahead measurement, the first post-polish clean, and the first ex-situ post-polish measurement. If we re-work only those wafers necessary, then we obtain a total time of 142 minutes, a throughput of 0.42 lots/hour. If we only measure two wafers in the lot and re-work the entire lot, we obtain a total time of 177 minutes, for a throughput of 0.34 lots/hour. These correspond to throughput increases of 80% and 44%, respectively.

Figure 2.4: Polishing with look-ahead wafers, rework, and ex-situ metrology. Total time per lot with 2 wafer rework is 255 minutes; a throughput of 0.2353 lots per hour. Total time per lot with 24 wafer rework is also 255 minutes. (Flow shown: polish look-ahead and pilot wafers, clean, measure, calculate polish time, polish lot, clean, measure, rework lot (2/24 wafers), clean, measure.)

Figure 2.5: Polishing with look-ahead wafers, rework, and on-line metrology. Total time per lot with 2 wafer rework is 142 minutes; a throughput of 0.4225 lots per hour. Total time per lot with 24 wafer rework is 177 minutes; a throughput of 0.3390 lots per hour. These are improvements of 80% and 44%, respectively. (Flow shown: polish look-ahead and pilot wafers with on-line measurement, 12 minutes; calculate polish time; polish lot with on-line measurement, 90 minutes; rework lot (2/24 wafers), 10/45 minutes; clean, 30 minutes.)

Similar calculations can be made for other scenarios. One might have look-aheads, but no reworks. In this case, there is a 39% increase. If we have reworks, but no look-aheads, then the increases are 23% to 54%, depending on the number of wafers measured and reworked. Finally, one may have neither look-aheads nor reworks, and the increase in throughput is only 8%.
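The throughput figures for the look-ahead scenario follow directly from the total lot times. The sketch below recomputes them; the step durations for the on-line flow (12, 90, 10 or 45, and 30 minutes) are read from Figure 2.5, the 255 minute ex-situ total is taken as quoted in the text, and the helper names are illustrative.

```python
def lots_per_hour(total_minutes):
    """Throughput when one lot occupies the polisher for total_minutes."""
    return 60.0 / total_minutes

def percent_increase(new_value, old_value):
    """Relative increase of new_value over old_value, in percent."""
    return 100.0 * (new_value / old_value - 1.0)

# Ex-situ flow total, as quoted in the text (minutes per lot).
EXSITU_TOTAL = 255

# On-line flow: look-ahead polish and measure, lot polish, rework, clean.
online_two_wafer_rework = sum([12, 90, 10, 30])
online_full_lot_rework = sum([12, 90, 45, 30])

base = lots_per_hour(EXSITU_TOTAL)
for label, minutes in [("two wafer rework", online_two_wafer_rework),
                       ("full lot rework", online_full_lot_rework)]:
    rate = lots_per_hour(minutes)
    gain = percent_increase(rate, base)
    print(f"{label}: {minutes} min, {rate:.4f} lots/hour, +{gain:.0f}%")
```

Because throughput is the reciprocal of cycle time, the percentage gain depends only on the ratio of the two total times, not on how the minutes are distributed among the steps.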
Generally, low quality high throughput processes will have smaller increases in throughput, while high quality low throughput processes will have larger increases in throughput. In some cases, on-line metrology can enable highest quality processing at throughputs of medium quality processing.

2.2.2. Cost of Ownership Reductions

Reductions in cost of ownership (COO) could arise in several areas: an increase in throughput, savings in chemical and water usage, and equipment reduction for future facilities. We begin with a discussion of the throughput cases above. For our calculations, we used a standard COO model for an IPEC 472 polisher. The specific dollar amounts for each scenario are proprietary; therefore, we quote the percent reductions in COO. The reduction in COO could be substantially more or less, depending on whether CMP is a bottleneck process in the facility, or whether the CMP area operates in a very high volume mode with significant parallel processing.

When we have look-ahead wafers, the 80% increase in throughput in the two wafer rework case and the 44% throughput increase in the full lot rework case result in COO reductions of 31.6% and 21.8%, respectively. For the case with look-aheads and no rework, the 39% increase in throughput results in a reduction in COO of 17.6%. It may be that only a certain percentage of the lots are reworked. We can extract the cost savings by multiplying the savings from the rework case by the percentage of occurrence and adding this to the product of the remaining percentage and the savings from the no rework case. A plot of the COO reductions versus the percent rework is shown in Figure 2.6 for the look-ahead wafer scenario. Here we see the savings will range from 17.6% to 31.6% for the two wafer rework case and from 17.6% to 21.8% for the full lot rework case.
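The weighting described above is a linear blend between the no-rework and rework savings. A minimal sketch follows, using the percentages quoted for the look-ahead scenario; the function name and the example rework fractions are illustrative assumptions.

```python
def blended_coo_reduction(rework_fraction, reduction_with_rework,
                          reduction_without_rework):
    """COO reduction (percent) when only a fraction of lots is reworked."""
    if not 0.0 <= rework_fraction <= 1.0:
        raise ValueError("rework_fraction must lie between 0 and 1")
    return (rework_fraction * reduction_with_rework
            + (1.0 - rework_fraction) * reduction_without_rework)

# Percent reductions quoted in the text for the look-ahead wafer scenario.
NO_REWORK = 17.6
TWO_WAFER_REWORK = 31.6
FULL_LOT_REWORK = 21.8

for fraction in (0.0, 0.25, 0.5, 1.0):
    two_wafer = blended_coo_reduction(fraction, TWO_WAFER_REWORK, NO_REWORK)
    full_lot = blended_coo_reduction(fraction, FULL_LOT_REWORK, NO_REWORK)
    print(f"{fraction:>4.0%} rework: two wafer {two_wafer:.1f}%, "
          f"full lot {full_lot:.1f}%")
```

At the endpoints the blend reproduces the quoted ranges: 17.6% with no rework, and 31.6% or 21.8% when every lot is reworked.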
Figure 2.6: Percent reduction in the cost of ownership vs. percent rework for a process with look-ahead wafers. (Curves shown: Two Wafer Rework, Full Lot Rework; horizontal axis: Percent Rework.)

When there are no look-ahead wafers, there is a 12.6% to 23% reduction in COO, depending on the number of wafers reworked. With no look-aheads and no rework, there is a fixed 4.3% reduction in COO. If we have only a percentage of the lots reworked when no look-ahead wafers are run, then the savings increase linearly from 4.3% to 23%, depending on the percentage of rework.

The second area for reduction in COO arises from the reduced water, chemical, and energy usage in cleaning. We can quantify these reductions by considering the reductions in cleaning. In particular, if we do not run look-aheads or perform rework, then we have no reduction in the number of cleans. However, if we have no look-aheads but run rework, then the number of cleans is reduced by 50%. Therefore, if we have a percentage rework, p, then our reduction in COO is $p \times 50\%$ multiplied by the cost of cleaning. Note also that these savings are independent of whether the facility runs in high volume (parallel polishing), because we still have to clean and measure the wafers in order to decide on reworking them.

We can repeat these calculations for the look-ahead case. We can eliminate 50% of the cleans if we have no rework. However, Figures 2.4 and 2.5 show that the savings increase to 66.7% if we have rework. Therefore, if we have a rework rate, p, then our reduction in cleaning is $r = p \times 66.7\% + (1 - p) \times 50\%$, somewhere between 50% and 66.7%. Our reduction in COO is then r times the cost of cleaning. In addition, we see from these calculations that we would also benefit the environment substantially by reducing water, chemical, and power usage by 0% to 66.7%.

The third area for COO reduction is that of tool purchasing for future facilities.
We saw from the above calculations that the number of necessary cleaning tools could be reduced by 0% to 66.7%. Thus, the reduction in COO for the future facility could be 0% to 66.7% times the COO for each cleaning tool. In addition, one could estimate the reduction in the number of ex-situ metrology tools. In the scenarios above, we could eliminate one to three ex-situ measurements per lot, depending on the level of processing quality. The actual costs may vary, but we assume here that the on-line tool cost is roughly one fifth the cost of an ex-situ measurement tool. If we have one ex-situ tool for every four polishers, then we could reduce our COO by as much as 20%. This number may be exaggerated, because we cannot eliminate all the ex-situ tools.

These reductions in COO constitute only those tangible reductions obtained from the use of the on-line metrology tool. Further reductions in COO due to improved process monitoring and control are also possible. However, it is difficult to quantify these improvements. For example, improved CMP process monitoring and control will lead to less wafer scrap, less rework, and higher yields, all of which may have a substantial impact on COO. Lower process variability may enable new processing methodologies. For example, tighter control of post-polish oxide thickness would decrease the required amount of deposited oxide, decreasing the deposition time, which in turn would reduce chamber clean time. This would increase throughput and reduce chemical and energy usage for the deposition step.

2.3 Run by Run Control of CMP with On-Line Metrology

We now demonstrate how this on-line metrology can be used in conjunction with a run by run process control strategy to improve CMP processing. Similar work has been published [4-5,7,9-12] for controlling blanket wafer removal rate, but as we will show, this is different from controlling directly on patterned wafers.
Direct control of patterned wafers using this on-line metrology tool was shown in [13-15]. A multivariate non-linear controller which monitors the average removal rate and within-wafer uniformity was shown in [14], and a self adjusting control algorithm to control the average post-polish thickness was shown in [15].

We will demonstrate that the relatively simple EWMA controller, when used in conjunction with on-line metrology, is more than sufficient to remove the time trends (e.g. drift) of the removal rate in the CMP process. One reason for this is that the dynamics of the CMP process are not very complex, and controllers with more complex filtering methods (e.g. that shown in [15]) are not needed. In fact, all of these methods achieve quality control of the average thickness, but this is due to the increased sampling frequency gained by the on-line metrology, not to the complexity of the control algorithm.

We will also show that, when proper pad conditioning methods are utilized, the within-wafer uniformity is very stable. Therefore, it is not necessary to control the within-wafer non-uniformity of the CMP process (for the cases examined here), as suggested by [14], and doing so will only increase the complexity of the controller and increase the variability of the controlled thickness. We will show here that a much simpler controller, i.e. control using only time adjustments and the relatively simple EWMA algorithm, performs extremely well. As a result, we can achieve our needs for high quality by using this simpler approach, which also serves to increase the ease of use over more complex control algorithms.

However, the control approach taken in this section has several limitations in flexibility. In particular, it controls only a few sites of a single device. This section will point out one problem associated with this approach to control, and set the stage for the control of multiple devices that we will address in the next chapter.

2.3.1. Experimental Setup

In the 600 wafer control experiment described below, we simulated the polishing of one lot of patterned wafers by using five blanket "filler" wafers, one blanket "prime" pilot wafer (which was used to monitor the blanket removal rate and uniformity in each lot), and one patterned wafer (the last wafer in the lot was used as a lot-based monitor wafer). The experiment was performed on an IPEC 472 polisher, with a primary polish step on an IC1000/SUBA IV pad stack followed by a 30 second buff with de-ionized (DI) water on a felt pad. Measurements were taken on the on-line metrology tool. All filler wafers and pilot wafers were measured with a 16 point blanket oxide measurement recipe. The patterned wafers were short-flow wafers consisting of a level-one metal layer, followed by a Titanium-Nitride (TiN) barrier layer, a Silicon-Oxy-Nitride (SiON) anti-reflective layer, and a PETEOS inter-level dielectric (ILD) layer of a test device. The pattern recognition was trained to measure one site on 22 dies on the wafer.

2.3.2. The Run by Run Process Control Algorithm

The control algorithm uses an exponentially weighted moving average (EWMA) to monitor the average removal rate. This averages past values of the rate to obtain a filtered version of the removal rate, which is weighted to more closely approximate recent values. The EWMA estimate of the average removal rate is updated once per lot, and is used to calculate the polish time for the next lot. The average removal rate on run n is given by

$$r[n] = \frac{T_{pre}[n] - T_{post}[n]}{t[n]} \qquad (2.10)$$

where $T_{pre}[n]$ is the average thickness of the wafer prior to polishing, $T_{post}[n]$ is the average post-polish thickness, and $t[n]$ is the polish time on run n. We then compute an exponentially weighted moving average (EWMA) of the average removal rate

$$a[n] = w\,r[n] + (1 - w)\,a[n-1] \qquad (2.11)$$

where w is the EWMA weight. The controller used during this experiment utilized an EWMA weight of 0.6, and a[0] = 5200 Å/min.
The EWMA weight was determined from historical data, as in [11-12,20]. The EWMA average is used to determine the polish time on the next run

$$t[n+1] = \frac{T_{pre}[n+1] - T_{Desired}}{a[n]} \qquad (2.12)$$

where $T_{Desired}$ is the desired post-polish thickness. The desired thickness was 8000 Å.

2.3.3. Patterned Wafer Control

The patterned wafer control over the 600 wafers is shown in Figure 2.7. The control run was started with a polish time based on the preliminary estimate of the polish rate given above. This was a poor estimate of the actual rate and the controller responded rapidly. The root mean squared error (RMSE) of the controlled thickness from the desired value after a four lot break-in period was 121 Å. The RMSE of the uncontrolled (fixed polish time) case was estimated to be 245 Å. After removing the cycle-skipping, the RMSE was 97 Å for the controlled thickness, and the RMSE of the uncontrolled case during this region was estimated to be 250 Å. This level of control is equal to or better than any other reported CMP control using more complex filtering algorithms. In addition, we can see in Figure 2.8 that the patterned removal rate has very slow dynamics; large changes generally occur over several (e.g. 20) runs. The integral (summation of past errors) filtering provided by the EWMA algorithm is more than sufficient to monitor and control these dynamics. More complex filtering methods would only serve to increase the variability of the process by responding to the noise in the removal rate seen in Figure 2.8.

Figure 2.7: Controlled average post-polish patterned wafer thickness over the 600 wafer experiment. (Horizontal axis: Run #; plot annotations: 4 Lot Break-in Period, RMSE = 97 Å, RMSE = 121 Å, RMSE = 280 Å.)

Figure 2.8: Average removal rate of patterned PETEOS wafers. (Horizontal axis: Run #.)
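The run by run loop of Section 2.3.2 (removal rate, EWMA update, and polish-time calculation) can be sketched as below. The EWMA weight of 0.6, the initial estimate of 5200 Å/min, and the 8000 Å target come from the text; the incoming thickness, the starting "true" rate, and its linear drift are hypothetical values chosen only to exercise the loop, and the simulation is noise-free.

```python
def removal_rate(t_pre, t_post, polish_time):
    """Average removal rate on a run (thickness removed per minute)."""
    return (t_pre - t_post) / polish_time

def ewma_update(prev_estimate, measured_rate, weight=0.6):
    """EWMA of the removal rate; weight is applied to the newest run."""
    return weight * measured_rate + (1.0 - weight) * prev_estimate

def next_polish_time(t_pre_next, t_desired, rate_estimate):
    """Polish time (minutes) needed to reach the desired thickness."""
    return (t_pre_next - t_desired) / rate_estimate

def simulate(n_runs, t_pre=14000.0, t_desired=8000.0, a0=5200.0, weight=0.6,
             true_rate0=7000.0, drift_per_run=-10.0):
    """Noise-free simulation; t_pre, true_rate0, and drift are hypothetical."""
    estimate = a0
    post_thicknesses = []
    for n in range(n_runs):
        polish_time = next_polish_time(t_pre, t_desired, estimate)
        true_rate = true_rate0 + drift_per_run * n   # slowly drifting process
        t_post = t_pre - true_rate * polish_time     # what the tool measures
        post_thicknesses.append(t_post)
        measured = removal_rate(t_pre, t_post, polish_time)
        estimate = ewma_update(estimate, measured, weight)
    return post_thicknesses

posts = simulate(20)
for run, thickness in enumerate(posts[:5], start=1):
    print(f"run {run}: post-polish thickness {thickness:.0f} A")
print(f"run {len(posts)}: post-polish thickness {posts[-1]:.0f} A")
```

In this sketch the poor initial estimate produces a large error on the first run, and the error shrinks by a factor of (1 - w) per run until only a small lag behind the drift remains, which is consistent with the rapid response described above.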
Figure 2.9 shows the uncontrolled within-wafer variation of the amount removed during the control experiment. We see here that the within-wafer non-uniformity is extremely well-behaved, consistently around 5% (which is typical for this tool), over the course of the 600 wafer (96 lot) experiment. We can clearly see that there are no dynamics in the within-wafer non-uniformity, and controlling on this metric would only result in increasing the noise in the controlled thickness.

Figure 2.9: Within-wafer non-uniformity of blanket sheet film PETEOS wafers. (Horizontal axis: Lot #.)

We now compare this approach with that of controlling the final film thickness on patterned wafers using blanket wafer removal rates and a sheet film equivalent (SFE). This approach uses historical data to determine an SFE, which is the ratio of the patterned and blanket wafer removal rates. The operator multiplies the blanket rate from a pilot wafer by the SFE to estimate the patterned rate, and uses this to calculate the polish time. Since we had the actual patterned rates, as well as the blanket rates, we determined what the control would have been using this process. The SFE was calculated for each lot, and the control results were determined for each of these values, as well as for the average SFE. The best result for this approach is shown in Figure 2.10. The RMSE was 138 Å, 14% worse than direct patterned wafer control. The control using the average SFE resulted in an RMSE of 162 Å, a 39% decrease in performance.

We also tested the control using a subset of five of the 22 dies used during the actual experiment, and estimated the controlled thickness using only these five dies. The resulting control, shown in Figure 2.11, had an RMSE of 280 Å, an increase of 130% over the 22 die control. During the regions where there is no cycle-skipping, the RMSE was 150 Å, a 78% increase.
Even though the five die average was controlled to the 8000 Å target, the 22 die average was roughly 1000 Å lower. Thus, using this type of control, sampling a reduced number of dies can substantially reduce the quality of control. In the next chapter, we will see that this lack of control using a reduced number of dies is the result of monitoring and controlling only one site with a poor understanding of how the patterned removal rate of the particular site being measured is affected by the blanket rate and the specific pattern layout of the device being controlled.

Figure 2.10: Average post-polish thickness of patterned wafers using pilot wafers and sheet film equivalents to control the post-polish thickness of the patterned wafers. (Horizontal axis: Run #; plot annotations: 4 Lot Break-in, RMSE = 138 Å, RMSE = 156 Å, RMSE = 239 Å.)

Figure 2.11: Controlled average post-polish patterned wafer thickness over the 600 wafer experiment using five measurement sites. (Horizontal axis: Run #; plot annotations: 5 Site Average, 22 Site Average, RMSE = 149 Å, RMSE = 280 Å, RMSE = 528 Å.)

2.4 Summary

The increased sampling frequency provided by on-line metrology allows a run by run controller with a simple filtering algorithm to effectively compensate for the dynamics of the CMP removal rate. The variability of on-line CMP metrology tools is now well below CMP requirements, the reliability is very high, and the measurements from these tools correlate well with measurements from ex-situ metrology tools. In addition, the use of these tools could lead to substantial reductions in cost of ownership through increased throughput, decreased cleaning, and decreased need for more expensive ex-situ metrology tools. These benefits make the use of on-line metrology ideal for run by run process control.
Control of the post-polish average film thickness for a few sites of a single device using on-line metrology provides excellent control results with a simple EWMA controller. Because the dynamics of the CMP process are relatively slow, complex control methodologies are not necessary to control the CMP process. In addition, the within-wafer variation is very stable over extended periods of polishing, and control methods which attempt to control using this metric are likely to only add variability to the process.

However, this experiment suggests that controlling the average thickness with a small number of measurement sites or a small number of measurement dies may result in increased variability due to an increased sensitivity to measurement errors as well as control of the process to an inaccurate average value. In the next chapter, we will explain why these effects occur and outline a framework which reduces this sensitivity. This experiment also indicates that performing control using pilot wafers would cause a 39% increase in the average error of the controlled output over performing control directly on patterned wafers. The next chapter will also explain this effect and discuss why such approaches result in poor control of the CMP process.

Chapter 3
Control of Multiple Devices in Dielectric CMP

The previous chapter demonstrated that a simple filtering algorithm is effective in controlling the time dynamics of the CMP process. In particular, the average post-polish thickness measured at a few sites in a few dies of a single device was controlled to within 100 Å. While the control scenario outlined in the previous chapter is the one typically discussed in the literature, production fabrication facilities often face a much more difficult task. Fabs generally have more than one type of device running through the facility, and this has a significant impact on the CMP process.
Therefore, the control approaches used in production are often very different from those described in the previous chapter and in the literature. Control approaches used in production seem to go in one of two directions. One approach is to maintain a separate controller, e.g. like that outlined in the previous chapter, for each device being run through the CMP process. The second approach attempts to tie the control of the separate devices together using a common parameter, such as the blanket removal rate.

Each approach has its advantages and disadvantages. The first approach has the advantage that measurements from one device do not degrade the control of the other devices, because it does not use an inaccurate relationship between the polishing rates of the different devices and a common parameter. However, the second approach has the advantage that changes in the process which are common to all devices, e.g. blanket rate changes due to pad wear, can be estimated from any device being processed and passed on to the control of the other devices. For example, estimating the blanket removal rate from measurements of the patterned removal rate of one device, and using this updated blanket removal rate to estimate the time needed to polish another device, allows changes in the blanket rate which occurred during the polishing of the first device to be taken into account when determining the polish time for the second device. This is extremely important if one device has not been run in a long time, and the performance of separate controllers can be poor when some devices are not run regularly.

In this chapter, we address the critical issue of multiple device control. The previous chapter demonstrated that controlling the process using a reduced number of dies, or using estimates of the patterned removal rate based on measurements of the blanket removal rate, results in a decrease in the quality of control.
We begin this chapter by discussing the importance of pattern dependencies in CMP. Following this, we describe in detail a CMP process model that has been well received in the CMP community, because it has been shown to be relatively device independent. We then use this model to show that the problems seen with the control solution in Chapter 2, and with other solutions in the literature, are the result of poor assumptions regarding pattern dependencies. In addition, we outline additional problems that arise in the case of multiple device polishing. We then define the multiple device control problem addressed in this chapter, and demonstrate how this model can be incorporated into a CMP control system to provide a framework for a device independent control strategy for the CMP process. Finally, we present experimental results to demonstrate the effectiveness of this approach.

3.1 Pattern Dependencies in Dielectric CMP

We would like to understand the problems seen with the control approach outlined in Chapter 2. As we will see, these problems are mainly due to variations in the pattern dependencies in CMP that were neglected in the setup and design of the control strategy. This neglect is typical in many CMP monitoring and control scenarios, and this section aims to demonstrate why this leads to poor control of the CMP process. Consider the post-polish thickness profile of an entire patterned wafer shown in Figure 3.1. Here we see several effects. First, there is a smooth surface with deep valleys carved inside. The smooth surface is the result of wafer-level variation. The sharp valleys are the thicknesses of regions within each die, and are caused by the pattern dependencies in polishing. An example of such pattern dependencies for a single die is shown in Figure 3.2. Here we see the post-polish thickness profile of a test device containing areas of memory and logic.
If we uniformly sample the entire die at several locations, we will get a complete picture of the die-level non-uniformity, i.e. similar to that shown in Figure 3.2. However, sampling several sites on several die is much too time consuming in a production environment, so we are often forced to choose one, two, or maybe three sites to monitor on typically five to nine die.

Figure 3.1: A very densely sampled thickness profile of a typical wafer, including wafer-level and die-level variation components.

Figure 3.2: Die-level thickness profile of a test device.

In order to understand the importance of this die-level variation, we have shown the sources of variation of the CMP process in Figure 3.3. The wafer-to-wafer data shown on the left is the average of the controlled thicknesses measured at 22 sites across the patterned wafers in the 600 wafer experiment described in Chapter 2. The within-wafer data is the average thickness of 25 sites, plotted for 10 die across the surface of one wafer. The within-die data is the thickness of each site, averaged across 10 die, plotted for each of 25 sites. The range of the wafer-to-wafer variability is approximately 500 Å. The within-wafer variation is also approximately 500 Å. On the other hand, the within-die variation seen on the right is greater than 3000 Å, six times larger than the other two sources of variability.

Figure 3.3: Sources of thickness variation in the CMP process.

In fact, the within-die variation is the largest contributor to the global non-uniformity seen on a post-polish wafer. This can be seen in Figure 3.4.
Here the total variation was calculated as the standard deviation of 250 measurement points: 25 sites measured uniformly across each of 10 die uniformly spaced across the wafer. The within-die variation was calculated by taking the standard deviation of the 25 sites within each die, and averaging over the 10 die. The within-wafer variation was calculated by taking the standard deviation of each site over the 10 die, and averaging over the 25 sites. This was done for several different polishing times.

Figure 3.4: Total, within-die, and within-wafer thickness variation of a typical test device as a function of polishing time.

We can see that the magnitude of the within-die variation is nearly as large as the total variation. As a result, the profile of the total variation as a function of the polish time is almost entirely determined by the profile of the within-die variation. Because the within-die variation is the largest contributor to global non-uniformity, any control technique that does not take this into account is likely to have problems. Before we address the problems with the control outlined in the previous chapter, as well as potential problems with other existing techniques for controlling the CMP process, we would like to understand what causes the pattern dependencies in the polishing process. The next section describes a CMP model, which will be used throughout this work, that will explain how these pattern dependencies come about.

3.2 Modeling of Dielectric CMP

Before we dive into discussing the problems of the control used in the previous chapter and the many issues involved with the control of multiple devices, we need to build our fundamental understanding of the CMP process. One way to do this is to consider the state of the art in CMP modeling.
We will not duplicate existing works by thoroughly reviewing CMP models; such reviews can be found in [54,56]. Instead, we will focus on a few particular models, and discuss in detail one model that has received a large degree of acceptance in the CMP community. Work developed by Preston in 1927, related to the glass polishing industry, outlined the fundamental mechanical motions relating to CMP [64]. The basic premise behind this model is that the removal rate is equal to some coefficient (Preston's Coefficient) times the pressure on the surface times the relative velocity

$$\frac{dz}{dt} = -k_p\, p\, v. \qquad (3.1)$$

Preston's Coefficient is used to capture effects such as the chemical being used to break down the bonds at the wafer surface, the surface roughness or structure of the polishing pad, and the type of abrasive particles used. Several other models exist which attempt to better explain the behavior of blanket wafer polishing, but most of these are neither physically substantiated nor significantly better than Preston's equation.

The issue of patterned wafer modeling, on the other hand, is much more complex. Several models have been proposed to account for pattern effects in CMP but they have limited applicability (see [54,56] for a detailed review). Their limitations range from being based on non-representative test structures to using such a small range of process conditions and layouts that they are rendered ineffective beyond the scope of the original experimental conditions. Other limitations, especially for phenomenological models, are the complexity and difficulty of relating the model parameters to physical factors. Most of the models, with the exception of those proposed by Renteln [65], Hayashide et al. [66] and Stine et al. [53], do not apply across a whole die. The integrated characterization and modeling methodology developed at MIT, or the "MIT density model," accounts for pattern effects across the whole wafer in a systematic way [53,56].
This density model provides a first-order approximation of post-polish dielectric thicknesses for arbitrary layouts [56]. The model is based on Preston's equation, which states that the removal rate is proportional to the pressure times the velocity. The density model assumes that the force is distributed evenly among the raised features; see for example Figure 3.5. When we are polishing a blanket wafer, the entire surface of the wafer supports the force. However, when we have patterned features, the pressure is inversely proportional to the density of the raised, or up, features. Therefore, Preston's equation can be re-written as

$$\frac{dz}{dt} = -\frac{k_p\, p\, v}{\rho_{\mathrm{eff}}(x, y)} = -\frac{K}{\rho_{\mathrm{eff}}(x, y)} \qquad (3.2)$$

where $K$ is the blanket removal rate, and $\rho_{\mathrm{eff}}(x, y)$ is what is called the "effective" density of the region. We will explain effective density momentarily, but for now let us assume that it is the percentage of the area occupied by the raised regions (lines) versus the down regions.

Figure 3.5: Cross-sectional view of the oxide thickness in a patterned wafer.

The model actually assumes two polishing regimes. In the first regime, the model assumes that the polishing pad is entirely supported by the raised features, and thus there is no removal in the down areas. In the second regime, i.e. once the step height ($z_1$ in Figure 3.5) is completely removed, the model assumes the removal rates of both the raised and down areas are equal to the blanket polishing rate. The model assumes that there is an instantaneous change in the removal rate once the step height is completely removed. In other words, the model assumes there is an instant change in the density, from the initial effective density of the feature, $\rho_{\mathrm{eff}}(x, y, L)$, to a density of one.
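The two regimes just described can be illustrated with a minimal sketch. It assumes a single site with a fixed effective density; the function names and the numerical values (thicknesses in Å, rate in Å/s) are hypothetical and chosen only for illustration. The switching time is taken as the moment the raised area has lost exactly the step height.

```python
def planarization_time(step_height, blanket_rate, eff_density):
    """Time at which the raised area has lost the full step height (regime 1 ends)."""
    return eff_density * step_height / blanket_rate


def raised_thickness(t, z0, z1, blanket_rate, eff_density):
    """Raised-area oxide thickness after polishing for time t."""
    t_planar = planarization_time(z1, blanket_rate, eff_density)
    if t <= t_planar:
        # Regime 1: the pad rides on the raised features only,
        # so they polish at blanket_rate / eff_density.
        return z0 - blanket_rate * t / eff_density
    # Regime 2: the step is gone and the site polishes at the blanket rate.
    return z0 - z1 - blanket_rate * (t - t_planar)


def down_thickness(t, z0, z1, blanket_rate, eff_density):
    """Down-area oxide thickness: no removal until the step height is gone."""
    t_planar = planarization_time(z1, blanket_rate, eff_density)
    if t <= t_planar:
        return z0 - z1
    return z0 - z1 - blanket_rate * (t - t_planar)


# Hypothetical values: 17000 A initial oxide, 7000 A step, 40 A/s blanket rate.
for rho in (0.1, 0.5):
    print(rho, raised_thickness(20.0, 17000.0, 7000.0, 40.0, rho))
```

Under these assumptions a low density site planarizes early and then thins at the blanket rate, while a high density site is still in the first regime at the same polish time.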
Thus the effective density is dependent on the remaining thickness

$$\rho(x, y, z, L) = \begin{cases} \rho_{\mathrm{eff}}(x, y, L) & z > z_0 - z_1 \\ 1 & z < z_0 - z_1 \end{cases} \qquad (3.3)$$

where $z_0$ is the initial oxide thickness, $z_1$ is the as-deposited step height, and $z_0 - z_1$ is equivalent to the thickness of the down region when the step height has been removed. This results in an oxide thickness which is given by

$$z(t) = \begin{cases} z_0 - \dfrac{K t}{\rho_{\mathrm{eff}}(x, y, L)} & t < t_1 \\ z_0 - z_1 - K (t - t_1) & t > t_1 \end{cases} \qquad (3.4)$$

where

$$t_1 = \frac{\rho_{\mathrm{eff}}(x, y, L)\, z_1}{K} \qquad (3.5)$$

is the time at which the step height is removed. A profile of the removal rates of the raised and down areas for one particular effective density, as a function of time, is shown in Figure 3.6. We can clearly see the change from the first regime of polishing to the second regime. The time of planarization (where the polishing rate switches to the blanket rate) is highly dependent on the density. This can be seen in Figure 3.7, where the raised area removal rate is shown for several different effective densities, as a function of polish time. The resulting raised area thickness profile is also shown. Note that, during the first regime of polishing, the model asserts that the amount removed is proportional to the polish time and inversely proportional to the pattern density. This is the dominant effect in the polishing of patterned wafers, and results in the large differences in the thicknesses for different density features seen in Figure 3.7. We will return to this notion several times, particularly in Section 3.4, where we describe problems with existing control techniques.

Figure 3.6: The MIT density model predictions of the removal rate of the up and down areas as a function of time for one particular density.

Figure 3.7: The MIT density model predictions of the up area removal rates and thicknesses, as a function of time, for different densities.

We now turn to explaining what is meant by "effective" density. In the examples above, we referred to the effective density as the density of the features on the layout. Feature density is generally determined over a small region, e.g. 0.1 km. However, in the CMP process, the "effective" density seen by the pad, meaning the region over which the pressure is distributed, is significantly larger than this. Therefore, the effective density is determined by computing a weighted average of the feature densities within a much larger window. The size of this window is referred to as the planarization length. The weighting function used is based on the flexing properties of a surface under the pressure of a localized force

$$w(r, L) = \begin{cases} k_2 \dfrac{L}{2} \displaystyle\int_0^{\pi/2} \sqrt{1 - \frac{4 r^2}{L^2} \sin^2\theta}\; d\theta & r < a \\[2ex] k_2\, r \left[ \displaystyle\int_0^{\pi/2} \sqrt{1 - \frac{L^2}{4 r^2} \sin^2\theta}\; d\theta - \left(1 - \frac{L^2}{4 r^2}\right) \displaystyle\int_0^{\pi/2} \frac{d\theta}{\sqrt{1 - \frac{L^2}{4 r^2} \sin^2\theta}} \right] & r > a \end{cases} \qquad (3.6)$$

where

$$k_2 = \frac{4 (1 - \nu^2)\, q}{\pi E} \qquad (3.7)$$

and where $r$ is the radial distance away from the point $(x, y)$ of interest, $L$ is the planarization length, $q$ is the load, $\nu$ is the Poisson ratio, and $E$ is Young's modulus of the pad material. The value of $k_2$ is generally left out, and the weighting function is normalized to have a peak magnitude of one. A cross section of this two dimensional spatial weighting function is shown in Figure 3.8.

Figure 3.8: A cross section of the elliptical weighting function used in the density model to calculate the effective density of the features.

The modeling process is summarized in Figure 3.9. The model is centered around the density dependent removal rate for each location on the wafer.
The blanket rate, $K$, can be determined from the removal rate of a blanket wafer, measured at a location near the die of interest. This blanket rate is used for the entire die. Within the die, the effective density for each point of interest is calculated using equation (3.6). This can be done for many points within the die to generate an effective density profile of the entire die. The effective densities can then be used to generate the patterned removal rate profile via equation (3.2), and in turn, the final thickness profile of the entire device from equation (3.4).

Figure 3.9: A high-level view of the MIT density model.

This model's key strength is its ability to efficiently predict the thickness of an arbitrary layout to a first order. This benefit comes from the weighted average of the densities within the planarization length, or interaction distance. This weighted averaging is the key to the model, because the pressure on a particular feature is affected by neighboring features. The planarization capability of any particular process is captured by the planarization length parameter and the blanket rate profile. Given these parameters, the evolution of the thickness on an entire wafer may be obtained using this analytical model.

We can see the quality of this approach by testing the model fit to polishing data. This was done for data measured on the test pattern shown in Figure 3.10. The dots indicate the 60 locations measured on the die. The measured thicknesses, as well as the model fit, are shown in Figure 3.11 for both the raised features, as well as for the down areas between these features.

Figure 3.10: Measurement plan of a test layout pattern (Device #2).
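The effective density step of this flow can be illustrated with a minimal one-dimensional sketch. It is not the weighting function of equation (3.6): a simple semi-elliptical weight with a peak of one is used here as a stand-in, the layout is reduced to a row of cells with known local densities, and weights that fall outside the row are simply dropped. All names and values are hypothetical.

```python
import math


def effective_density(local_density, cell_size, planarization_length):
    """Weighted average of local densities around each cell (1-D sketch).

    The weight is a stand-in semi-ellipse that is 1 at the cell itself and
    falls to 0 at half a planarization length; it is an assumption made for
    illustration, not the elastic-deflection weighting of the density model.
    """
    half = planarization_length / 2.0
    reach = int(half // cell_size)
    n = len(local_density)
    result = []
    for i in range(n):
        weighted_sum = 0.0
        weight_total = 0.0
        for j in range(max(0, i - reach), min(n, i + reach + 1)):
            r = abs(i - j) * cell_size
            w = math.sqrt(max(0.0, 1.0 - (r / half) ** 2))
            weighted_sum += w * local_density[j]
            weight_total += w
        # Normalize by the weights actually used, so a uniform layout
        # keeps its density even near the ends of the row.
        result.append(weighted_sum / weight_total)
    return result


# Hypothetical layout: a 10% density block next to a 90% density block,
# 1 mm cells, 4 mm planarization length.
layout = [0.1] * 5 + [0.9] * 5
rho_eff = effective_density(layout, cell_size=1.0, planarization_length=4.0)
blanket_rate = 40.0  # hypothetical, in A/s
patterned_rate = [blanket_rate / rho for rho in rho_eff]  # regime 1 rate, equation (3.2)
print(rho_eff)
```

The sketch shows the qualitative effect the text describes: cells near the boundary between the two blocks take on intermediate effective densities, so their first-regime removal rates depend on their neighbors and on the chosen planarization length.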
We can see that the model does an excellent job fitting the wide range (6000 Å) of thicknesses. In both cases the fit error is less than 250 Å. Similar results have been obtained for this model when used for prediction. The results shown here are for a single die on the wafer.

Figure 3.11: Measured and modeled values for the post-polish thickness of the raised and down areas using the MIT density model.

In addition, the MIT density model can predict the spatial evolution of an arbitrary layout through time, as shown in Figure 3.12. Here the thickness profile of the test patterned wafer is plotted for four values of time. The stars are the experimental values and the lines are the predictions obtained from the MIT density model. Once again we see that the model provides an excellent fit to the polished wafer data.

Figure 3.12: Measured and modeled values (dashed lines) for the post-polish thickness of the raised areas for several polish times using the MIT density model.

3.3 Problems With Existing Dielectric CMP Control Methods

Now that we have outlined the importance of pattern dependencies in CMP, and provided insight into the cause of these dependencies through the study of the density model, we are in a position to better understand the problems with the control strategy of the previous chapter, as well as other approaches outlined in the literature. Consider the sampling plan used for control in the previous chapter.
One site was measured on 22 die across the surface of the wafer, and the average of these measurements was taken to determine the average post-polish thickness. This assumes that the patterned removal rate can be modeled as a single value, regardless of the position on the wafer or in the die. It does not contain any information about the interaction between the pattern layout density at the location within the die and the blanket polishing rate profile across the wafer. Even though there is little variation in the blanket wafer profile, as shown in Figure 3.3, equation (3.2) tells us that the polishing rate of each site is initially equal to the blanket removal rate divided by the effective density of the measurement site. For a single site, the effective density of the site exaggerates the blanket wafer profile, as shown in Figure 3.13. When we reduce the number of sites from 22 to five, we obtain fewer points on the outside of the wafer. Since the outside points are thicker, the 22 site average thickness is lower than the five site average. While this difference in the average thickness is small for a blanket wafer, it may be large for a patterned wafer. As shown in Figure 3.13, the difference will depend on the density of the site being measured.

Figure 3.13: Blanket wafer removal rate profile and patterned wafer removal rate profiles over the surface of a wafer predicted by the density model.

In addition, the division of the blanket rate by the effective density results in an increase in the noise in the patterned wafer thickness measurements (in the first polishing regime), due to blanket rate variations

$$V(PR) = \frac{V(BR)}{\left(\rho_{\mathrm{eff}}(x, y)\right)^2} \qquad (3.8)$$

where $V(PR)$ is the variation (variance) in the patterned removal rate, and $V(BR)$ is the variation, i.e. wafer-to-wafer variation, in the blanket removal rate.
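The scaling in equation (3.8) follows from dividing the blanket rate by a fixed effective density, and it can be checked numerically. The sketch below uses a handful of made-up wafer-to-wafer blanket rates; the function name and the values are assumptions for illustration only.

```python
import statistics


def variance_ratio(blanket_rates, eff_density):
    """Ratio V(PR) / V(BR) when each patterned rate is blanket rate / density."""
    patterned_rates = [rate / eff_density for rate in blanket_rates]
    return statistics.pvariance(patterned_rates) / statistics.pvariance(blanket_rates)


# Hypothetical wafer-to-wafer blanket removal rates (A/min).
blanket = [2480.0, 2510.0, 2495.0, 2530.0, 2470.0]
for rho in (1.0, 0.5, 0.2):
    # The ratio should match 1 / rho**2 up to floating point error.
    print(rho, variance_ratio(blanket, rho), 1.0 / rho ** 2)
```

In this first-regime picture, a site with 20% effective density sees the blanket rate variance multiplied by 25, which is why low density measurement sites are the noisiest ones to control on.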
Both of these factors degrade the controlled results when measuring a single site on only a few die on the wafer.

Another problem with the control approach used in the previous chapter is that the control focuses entirely on the run by run time dynamics of the CMP process. In other words, blanket wafer characteristics such as blanket wafer removal rate and uniformity are the parameters being controlled. While wafer-level performance metrics, e.g. average removal rate and wafer-level non-uniformity, are closely monitored, the device dependencies are often neglected. The result of this neglect is that device variability can be large. In particular, the quality of control is much more sensitive to the choice of the measurement location than to the wafer-to-wafer variation or the within-wafer variation (see Figure 3.3). For example, if we measure around the peak in Figure 3.2 and correctly control this point to the desired average thickness, then we have actually over-polished most of the die. On the other hand, if we pick the valley in the center and control this point, then we actually have under-polished most of the die. The amount of this over-polish or under-polish will be on the order of 1000 Å, even though the wafer-to-wafer and within-wafer variation are tightly controlled.

One approach might be to pick the peak, valley, and mid-point of this profile. In reality, most engineers do not know what the polishing profile of the device being controlled looks like ahead of time. As a result, they may randomly pick a few points to measure and control. Or the engineers may be asked to measure and control the features which are generally most difficult to planarize. This is particularly damaging, because it normally results in the extremes in Figure 3.2 being chosen for control. In addition, there are manufacturing issues that keep certain areas from being measured.
For example, the valley in the center of this device is caused by a region of low density. According to equation (3.2), a region of low density polishes very fast, leaving a very thin post-polish thickness. These features are often very small, and properly measuring them in a reliable way on a metrology tool, on-line or ex-situ, is extremely difficult. Therefore, these areas are rarely monitored in a manufacturing environment, and typical control strategies will tend to monitor and control higher density regions, which are thicker. As a result, wafers are generally being over-polished.

Traditional control approaches that monitor and control only a few sites also have no ability to estimate the range of the true within-die variation. The within-die variation of a particular device determines many things, including the performance of the circuit and the necessary amount of deposited dielectric material in order to achieve planarization without over-polishing through to the underlying substrate.

3.4 Current Methods for Controlling Multiple Devices in Dielectric CMP

We have now seen how the neglect of pattern dependencies can lead to serious problems in the control of the polishing of a single device. We now turn our attention to the control of multiple devices being polished on a single polishing tool. We will look at how multiple devices are currently controlled in a production system and outline problems with these approaches before presenting our solution to controlling the CMP process with multiple devices.

One approach to multiple device control, shown in Figure 3.14, is one in which control is performed using blanket test wafers and sheet film equivalents (SFEs). In this scenario a test wafer is run and the blanket removal rate and non-uniformity are determined.
A sheet film equivalent (previously found through experimentation) for the particular device being run is used to calculate the polish time for that device as a function of the blanket wafer removal rate. Various methods are used for this procedure. One method simply multiplies a constant factor (one for each device) times the blanket rate to obtain the patterned removal rate for that device. Another approach uses a fixed amount of time required to planarize the features on that device plus or minus the blanket removal rate times a change in time to meet the desired target thickness.

Figure 3.14: Example current practice for CMP process control using sheet film equivalents (SFEs).

Once the time for the device is determined, the lot is polished. One or more wafers in the lot are measured, and, if necessary, one or more wafers in the lot are reworked, i.e. briefly polished a second time, so that they meet the desired thickness. If the tool has been sitting idle when a new lot arrives to be polished, this procedure is normally repeated. Otherwise, if the lot is the same device, then the polish time is tweaked using the measurements from the rework stage. If the lot is a new device, the same estimate of the blanket rate is kept (or modified from measurements in the rework stage) and the polish time for this new device is calculated using the SFE for the new device.

If we are trying to use this approach to control two devices with post-polish thickness profiles like those shown in Figure 3.15, we will have several problems. First, approaches like this will suffer from all the same problems we outlined for the single device controllers in the previous section, i.e. they do not take into account the variation in the post-polish oxide thickness due to the pattern dependencies.
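The two SFE calculations described above can be written down in a few lines. This is only a sketch of how such rules are typically applied: the function names, the per-device constants, and the idea of characterizing the second method by a thickness reached at the end of a fixed planarization time are assumptions made for illustration, not a documented production recipe.

```python
def polish_time_ratio_sfe(amount_to_remove, blanket_rate, sfe_factor):
    """Method 1: patterned rate is taken as a constant factor times the blanket rate."""
    patterned_rate = sfe_factor * blanket_rate
    return amount_to_remove / patterned_rate


def polish_time_offset_sfe(planarization_time, thickness_at_planarization,
                           target_thickness, blanket_rate):
    """Method 2: fixed planarization time, plus or minus a blanket-rate correction."""
    extra_time = (thickness_at_planarization - target_thickness) / blanket_rate
    return planarization_time + extra_time


# Hypothetical values: thicknesses in A, rates in A/s, times in s.
blanket_rate = 40.0
print(polish_time_ratio_sfe(6000.0, blanket_rate, sfe_factor=1.5))
print(polish_time_offset_sfe(60.0, 10000.0, 9000.0, blanket_rate))
```

Both rules reduce each device to one or two fixed constants tied to the blanket rate, so neither can follow the density dependent change in removal rate that the density model predicts.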
As a continuing example of the problems with controlling the average post-polish thickness using only a few sites in the die, consider the differences between the three site and 63 site averages for the two devices shown in Figure 3.16. Here we see that there is a large difference between the two averages. In addition, the differences in these averages vary depending on the particular device. Therefore, if we utilize the control approach outlined above, which utilizes SFEs based on measurements from only a few sites within the die, the resulting control may appear accurate, but in actuality the true average may be very different from the average of the measured sites.

Figure 3.15: Post-polish thickness profiles for two different devices. Measurements were taken over a grid similar to that in Figure 3.10.

Figure 3.16: Multiple device control using a three site average of the thickness.

Second, the range of thicknesses within the die is different for each device. Since the control strategy using SFEs only measures a few sites on the wafer, there is no way to monitor the within-die variation for the different devices. As a result, this approach cannot dynamically limit the amount of polishing in order to ensure that the device is not over-polished to the point where the oxide is too thin or the underlying layer is exposed.

Third, the distribution of densities on each device layout determines the removal rate and the corresponding thickness profiles over time. We saw in Figure 3.7 that these profiles are highly dependent on the effective densities of the particular site or sites being monitored.
Therefore, when the measured values are used to compute the average thickness, the averages are highly dependent on the density profiles of the particular devices being run and on the particular site or sites being measured. Consider the average thicknesses of two different devices with different density distributions in their layouts shown in Figure 3.17. The resulting average thickness profiles for the different devices are very different. In addition, their behavior over time is complex. Specifically, each device average begins as a linearly decreasing function. The rate of the linear decrease depends on the densities of the points measured on each device. Devices with more low density features will have a faster decrease in the average thickness, and devices with more high density features will have a slower decrease in the average thickness. In addition, the density distribution will also determine the time at which the linear decrease will become nonlinear. This point occurs when the first feature reaches planarity. At this point, one component of the average will remain fixed at the blanket rate. After this point, features with higher and higher densities will also begin to reach planarity, and consequently have removal rates fixed at the blanket removal rate. Only when all the features are planarized will the average removal rate also become fixed at the blanket removal rate, causing the average thickness to return to a linearly decreasing function.

Figure 3.17: The average thickness for two different devices predicted by the MIT density model.

The result is that using SFEs to represent these complex time dependencies of the CMP process is inappropriate. The first approach to using SFEs, where a simple ratio between the blanket removal rate and the patterned removal rate is used, does not account for any change in the slope of these lines.
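The time behavior of the device average described above can be reproduced with a short sketch based on the two polishing regimes of the density model. Each measured site is represented only by its effective density, and the two density lists, the step height, and the blanket rate are hypothetical values chosen for illustration.

```python
def average_removal_rate(t, densities, step_height, blanket_rate):
    """Average raised-area removal rate over a set of sites at polish time t."""
    rates = []
    for rho in densities:
        t_planar = rho * step_height / blanket_rate  # time this site reaches planarity
        if t < t_planar:
            rates.append(blanket_rate / rho)  # first regime
        else:
            rates.append(blanket_rate)        # second regime
    return sum(rates) / len(rates)


# Hypothetical devices: one dominated by low densities, one by high densities.
device_low = [0.2, 0.3, 0.5]
device_high = [0.5, 0.7, 0.9]
step_height = 7000.0   # A
blanket_rate = 40.0    # A/s

for t in (0.0, 40.0, 100.0, 200.0):
    print(t,
          average_removal_rate(t, device_low, step_height, blanket_rate),
          average_removal_rate(t, device_high, step_height, blanket_rate))
```

Under these assumptions the average rate starts at the blanket rate times the mean of the inverse densities, drops in steps as each site planarizes, and settles at the blanket rate, so the slope of the average thickness is constant only before the first site and after the last site reach planarity.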
The second method, where the polish time is the sum of a fixed "planarization" time and an additional time based on the blanket removal rate that controls the final thickness, is also incorrect. It assumes that all the densities have reached planarity, which is generally not true. This is particularly bad to assume if the exact thickness profile is not known or if the target removal amount is just at the point when the step heights become planarized.

These factors combine to result in the average thickness for the different devices being either poorly controlled or controlled to an incorrect thickness. While the blanket wafer performance metrics are closely monitored and controlled, the problems in the control of patterned wafers persist. In the next section, we present a control framework that addresses these problems for both single and multiple device control.

3.5 The Multiple Device Control Problem for Dielectric CMP

The previous sections demonstrated that controlling pattern dependencies in dielectric CMP, particularly in the case of multiple device control, is a difficult problem. The pattern dependencies are the largest source of variation in dielectric CMP, and current control techniques do not address this. Each device may have a different average removal rate, and typical practice generally requires using test wafers and device dependent removal rate predictions calculated using inaccurate relationships to the blanket rate to determine the polish time after the tool has been idle or after a device has changed. These control approaches do not take into account where on the device they are measuring, nor can they estimate the range of the thickness variation across the wafer or within the die in order to ensure that the underlying layer is not exposed. As a result, these strategies decrease performance, waste wafers and chemicals, decrease throughput, and increase complexity.
We will formalize the multiple device control problem in this section, and present a solution to this problem in the next section. There are many metrics over which we could perform control. For example, we could control the average removal rate, the average post-polish thickness, the within-die variation, the within-wafer variation, the step height at certain locations, and others. With each added component, the complexity of the control strategy increases. Recall that the within-wafer variability does not vary significantly over the life of a typical CMP pad. Other works have shown that the within-die variability is largely dependent on the process conditions [56]. We ran a design of experiments varying table speed and down force. The surface profile of the within-die variation is shown in Figure 3.18. Here we see that increasing table speed and decreasing down force will reduce the within-die variation, but that no optimum is achieved within the range of the tool settings, i.e. there is no "sweet" spot. Therefore, a control strategy aimed at trying to maintain an optimal within-die variation by adjusting the tool settings would be inappropriate.

Figure 3.18: Within die variation (standard deviation of the post-polish thickness) shown for a design of experiments that varied the table speed (RPM) and down force (psi) over a wide range for the dielectric CMP process.

In addition, we ran several test wafers during the 600 wafer experiment described in Chapter 2. The within-die variation as a function of the life of the CMP pad is shown in Figure 3.19. Here we see that there is little indication of long term dynamics in the within-die variation. This also suggests that a control technique aimed at controlling the within-die variation would only increase the wafer-to-wafer variability in the process by responding to noise in the measurement of the metric. These issues suggest that an initial control strategy should focus on monitoring these sources of variation, rather than controlling them.

Figure 3.19: Within-die variation shown over the polishing of 600 wafers. The stars are the within-die variation measured on four dies on each wafer, and the solid line is the average of the four die from eight wafers over the 600 wafer run.

Many approaches to dielectric CMP control also monitor the step height reduction. However, now that we understand the effect of pattern dependencies in CMP, we are in a position to better understand step height removal. As shown in Figure 3.20, the step height was measured across bond pad structures for the test device that was used for control in Chapter 2, and whose surface profile is shown in Figure 3.2. The step height measured at this location is shown in Figure 3.21. The wafers were polished at polish times such that an equivalent amount of material was removed in each of the three processes, which had different polishing rates. Therefore, the values are plotted against the equivalent amount of dielectric material removed from the polishing of the blanket wafers at similar polishing times. This allows a fair comparison of the step heights removed for the different processes. Here we see that, in all three processes, the step height of this feature quickly decreases. Note that this measurement was taken on the edge of an array of features, and that there is a large open area around this measurement location. This implies that the effective density of this location is low, because the large down area decreases the density where the step height measurement was taken. Now consider a step height measurement taken on the same wafers in a location of higher density, as shown in Figure 3.22.

Figure 3.20: Typical structures used for step height measurement.
Figure 3.21: Step height measurement for a low density feature, for three different processes, plotted against the amount removed on a blanket wafer (Å).

Figure 3.22: Higher density structures used for step height measurements.

Figure 3.23: Step height measurement for a high density feature, for three different processes, plotted against the amount removed on a blanket wafer (Å).

The step heights are plotted, as in Figure 3.21, against the amount of blanket removal in Figure 3.23. In this case, the results are completely opposite. In the previous case, Process A had the fastest step removal, and Process C the slowest. Here we see that Process C has the fastest step height removal, and Process A the slowest. The reason is that step height is dependent on where you measure, i.e. it is dependent on the device layout pattern.

These step height profiles can be explained using the density model. The step height is essentially determined by the removal of the raised features over which the step height is measured. Because the results in Figure 3.21 are from a region of low density, the step heights are removed very quickly. Once the features are nearly planar, the removal rate of the raised areas drops to the blanket removal rate and the removal rate of the down areas increases to the blanket rate, drastically slowing the removal of the step height. The difference in the three processes comes from the "effective density" of each process. The process with the larger planarization length has more of the surrounding low density open area averaged in, causing it to have a lower effective density and therefore polish faster. On the other hand, the higher density step height measurements have the opposite effect.
A larger planarization length, Process C, averages in more of the lower density features far away, and thus removes the step height faster.

In reality, if one is interested in planarization, i.e. removal of the step height, across the entire device, the difference between the step height in the low and high density features should be monitored, as shown in Figure 3.24. Here we see that, similar to the within-die variation profile (e.g. see Figure 3.4), there is a rise in the difference of the step heights of the high and low density features followed by a decrease. We can see a direct comparison of the difference in step height shown in Figure 3.24 and the within-die variation for these same wafers shown in Figure 3.25. We see that the step height is highly correlated to the within-die variation. We also showed that the within-die variation does not have an optimal location within the range of the tool settings and does not vary significantly over the life of the CMP pad, and should not be controlled. Therefore, in terms of controlling the CMP process, step height measurements would serve mainly as a redundant measurement of within-die variation. In addition, these measurements are generally time-consuming and require significant manual effort to level the traces.

Figure 3.24: The difference in the step height measured at low and high density features, for three different processes, versus the blanket amount removed (Å).

Figure 3.25: The within-die variation, measured at 25 locations in 10 die, for three different processes, versus the blanket amount removed (Å).
Therefore, controlling on step height measurements would only serve to slow down the control process, whereas performing optical thickness measurements using on-line metrology actually provides a speed up in processing (as was shown in Chapter 2).

We could also focus our efforts on controlling the average polishing rate of the process. However, doing so generally requires us to change the process settings, and as we saw in Figure 3.17, this will have an adverse effect on the within-die variation. Since within-die variation is the largest source of variability in the CMP process, we believe this would not be the best approach.

Recall that our goal in the CMP process is to planarize the surface of the wafer, meaning to remove all the steps in the thickness and to make the overall surface of the wafer as flat as possible, without excessively thinning or breaking through the dielectric material. Since we cannot improve on the within-die variation with the process settings, our focus here will be on controlling the average thickness, and monitoring the total non-uniformity, i.e. the combined within-wafer and within-die variation. In essence, we are stuck with a surface such as that outlined in Figure 3.1, and we are trying to control the level of this surface while monitoring the total indicated range of the surface. This information can be used to ensure that the low points on the wafer are above the required minimum thickness or to estimate such things as the delay in the interconnect circuitry.

Let us now formalize a goal for the control of the CMP process. Let us assume that we would like to control the average of the thickness at several sites in several dies over the surface of the wafer. Note that we may desire that a large number of such points be included in the average, and that these points are not necessarily limited to the points that will be measured.
Let us assume that the thicknesses of these points are given by

$$ y^{l}_{s,d}[n], \tag{3.9} $$

where $l$ is the layout number of the device being run, $s$ is the number of the site within die number $d$, and $n$ is the wafer number. We would like to control the average of all sites within all dies within each device layout $l$,

$$ \bar{y}^{l}[n] = \frac{1}{N_d(l)\,N_s(l)} \sum_{d=1}^{N_d(l)} \sum_{s=1}^{N_s(l)} y^{l}_{s,d}[n], \tag{3.10} $$

to the target average thickness, $T_l$, for that layout over all runs with wafers of type $l$. The mean square error from the target is thus

$$ \mathrm{MSE}_l = \frac{1}{N_n(l)} \sum_{n} \left( \bar{y}^{l}[n] - T_l \right)^2, \tag{3.11} $$

where the sum is taken over the $N_n(l)$ runs with wafers of type $l$. We would like to obtain the minimum of these errors

$$ \mathrm{MSE}_{opt} = \min \sum_{l=1}^{N_l} \mathrm{MSE}_l \tag{3.12} $$

over all devices. In addition, we would like to monitor the total non-uniformity of the process

$$ NU^{l}[n] = \frac{1}{(N_d(l) - 1)\,(N_s(l) - 1)} \sum_{d=1}^{N_d(l)} \sum_{s=1}^{N_s(l)} \left( y^{l}_{s,d}[n] - \bar{y}^{l}[n] \right)^2. \tag{3.13} $$

We will stop the process for re-optimization if the non-uniformity is too large, or

$$ NU^{l}[n] > UL_{NU} \tag{3.14} $$

for any device layout $l$. We are required to achieve this goal with very few measurement sites within only a few die of each wafer. Also, the number of measurement sites for each type of layout may be different. Thus, the controller has at its disposal

$$ y^{l}_{s_m,d_m}[n], \tag{3.15} $$

where $s_m$ is the site number in die number $d_m$ of the measurement from the device layout $l$ on run $n$.

In summary, we would like to control the average thickness of the profile shown in Figure 3.1, while monitoring the total range of the profile. In addition, we would like to do this for multiple devices running sequentially on a single tool, using only a small number of measurement sites within a small number of dies from each device. Note that the form of the problem given above does not make any assumptions about the form of the measurements. In particular, the thicknesses of one device may be correlated to the thicknesses of another device, but the form of the problem neither states this, nor suggests a form for any such relationship.
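The quantities in equations (3.10), (3.11), (3.13) and (3.14) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: thicknesses are held in a nested list indexed as `y[d][s]`, every die has the same number of sites, the non-uniformity is taken as a sum of squared deviations with the $(N_d - 1)(N_s - 1)$ normalization, and all function names and numbers are hypothetical.

```python
def average_thickness(y):
    """Average over all sites in all dies, cf. eq. (3.10). y[d][s] in Angstroms."""
    n_d, n_s = len(y), len(y[0])
    return sum(sum(die) for die in y) / (n_d * n_s)


def non_uniformity(y):
    """Total non-uniformity, cf. eq. (3.13), normalized by (N_d - 1)(N_s - 1)."""
    n_d, n_s = len(y), len(y[0])
    y_bar = average_thickness(y)
    squared_dev = sum((v - y_bar) ** 2 for die in y for v in die)
    return squared_dev / ((n_d - 1) * (n_s - 1))


def mse_from_target(run_averages, target):
    """Mean square error of the per-run averages from the target, cf. eq. (3.11)."""
    return sum((a - target) ** 2 for a in run_averages) / len(run_averages)


def needs_reoptimization(nu, upper_limit):
    """Stop rule of eq. (3.14): True when the non-uniformity exceeds its limit."""
    return nu > upper_limit


if __name__ == "__main__":
    wafer = [[8000.0, 8100.0], [7900.0, 8000.0]]  # hypothetical: 2 dies x 2 sites
    print(average_thickness(wafer), non_uniformity(wafer))
```

The stop rule is kept separate from the metric so that the same non-uniformity function can be applied either to measured thicknesses or to model predictions.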
Whether and how such a relationship is used will depend entirely on the solution to the control problem. The following section provides a framework which addresses this problem, and provides a strategy for controlling the CMP process which addresses many of the problems outlined in this and previous chapters.

3.6 A Framework for the Control of Multiple Devices in Dielectric CMP

A device independent CMP control framework is outlined in Figure 3.26. The framework is centered around the density model described in Section 3.2. In summary, the approach is to generate a virtual post-polish thickness profile of the entire wafer surface using the density model, i.e. using estimates of the blanket removal rate and the planarization length of the CMP process, for various polish times. The polish time that provides a virtual wafer whose average is as close as possible to the desired average thickness for the particular device layout being run is used to polish the actual, i.e. real, wafer. Once the actual wafer is polished, a small number of points are measured in a small number of die on the wafer. The parameters of the density model, i.e. the blanket rate and planarization length, are then varied until the thickness predictions of the density model best fit the few measured thicknesses. These new values for the blanket rate and planarization length are then fed back into the model, i.e. they are averaged with past values, and the new parameters are used to generate the polish time for the next wafer to be polished. The details of the actual implementation of this control method and variants of it will be discussed later.

Figure 3.26: A device independent run by run process controller for CMP. (Block diagram elements: Generate Time, Polish & Measure Wafer, Density Model, Update, Device Files.)

This strategy provides several benefits over traditional CMP process control methods. These benefits are the result of the separation made by the control framework between what is controlled and what is measured.
In this strategy, the measurement locations are not assumed to be what is controlled. The model of the CMP process is used to de-couple the interaction between the effects of the pattern layout and the effects of the polishing process. The parameters of the model are used in conjunction with the layout file to predict a much more detailed picture of the actual wafer, without actually having to measure the entire wafer. The average of this "virtual" wafer is what is actually controlled, not a few measurement points that may have little to do with the true average of the wafer. Thus, the average film thickness of the entire wafer, including the within-wafer and within-die variation, is controlled with only a few measurements. Also, because the virtual wafer is created from the model parameters, we can estimate the within-die and global non-uniformity from the virtual wafer.

Another benefit of this approach is that the device layouts are used to remove the layout dependencies in the measured data to extract the properties of the process, independent of the device being polished. These properties of the process are tracked using the parameters of the density model, i.e. the blanket removal rate and the planarization length. Because the same parameters can be used for any device, devices may be interchanged at any time without running a blanket wafer or a test wafer of that device. In addition, it allows a device to be accurately controlled, even if it has not been recently run.

3.6.1 A Device Independent Control Algorithm

We now describe the control algorithm outlined above in a more formal manner, explicitly describing the steps in the control algorithm. In order to predict the virtual profile of the post-polish thickness of the entire wafer, a large grid of discrete sites on several dies must be set up, and the density model used to predict the post-polish thicknesses at these points.
In order to do this, the polish time and estimates of the blanket rate and planarization length for each of the dies to be predicted must be input into the density model. Specifically, we have estimates of the thicknesses given by

$$ \hat{y}^{l}_{s,d}[n] = \begin{cases} z_0[n] - \dfrac{K_d[n]\, t[n]}{\rho_{eff}(x(s), y(s), L_d[n], D(l))}, & t[n] < t_1 \\[2ex] z_0[n] - z_1[n] - K_d[n]\,(t[n] - t_1), & t[n] > t_1 \end{cases} \tag{3.16} $$

where

$$ t_1 = \frac{\rho_{eff}(x(s), y(s), L_d[n], D(l))\; z_1[n]}{K_d[n]}, \tag{3.17} $$

$K_d[n]$ and $L_d[n]$ are the blanket rate and planarization length for die number $d$ on run $n$, $t[n]$ is the polish time used on run $n$, $x(s)$ and $y(s)$ are the x and y locations of site number $s$, and $z_0[n]$ and $z_1[n]$ are the initial oxide thickness and initial step height for the wafer on run $n$. The effective density is calculated as

$$ \rho_{eff}(x(s), y(s), L_d[n], D(l)) = \sum_{x \in X} \sum_{y \in Y} \rho_0(x, y, D(l)) \cdot w(r, L_d[n]) \tag{3.18} $$

where

$$ r = \sqrt{(x - x(s))^2 + (y - y(s))^2} \tag{3.19} $$

and where $X$ and $Y$ are the set of all points within a finite pre-specified distance from the point of interest, $\{x(s), y(s)\}$, in the device layout $D(l)$ for layout number $l$, $\rho_0(x, y, D(l))$ is the feature density of each of these points, and $w(r, L_d[n])$ is the elliptic weighting function described by equation (3.6). The polish time for run $n$ can be found as

$$ t[n] = \arg\min_{t} \left( \frac{1}{N_d(l)\,N_s(l)} \sum_{d=1}^{N_d(l)} \sum_{s=1}^{N_s(l)} \hat{y}^{l}_{s,d}[n] - T_l \right)^2 \tag{3.20} $$

where, as stated in the definition of our control problem in the previous section, $T_l$ is the target average thickness for the device layout number $l$. Once the optimal polish time for the incoming device is determined, we can predict the average post-polish thickness of the given wafer as

$$ \hat{\bar{y}}^{l}[n] = \frac{1}{N_d(l)\,N_s(l)} \sum_{d=1}^{N_d(l)} \sum_{s=1}^{N_s(l)} \hat{y}^{l}_{s,d}[n] \tag{3.21} $$

where the predicted thicknesses are given by (3.16) using a polish time given by (3.20). In addition, we can predict the global non-uniformity (variance) of the entire wafer using

$$ \widehat{NU}^{l}[n] = \frac{1}{(N_d(l) - 1)\,(N_s(l) - 1)} \sum_{d=1}^{N_d(l)} \sum_{s=1}^{N_s(l)} \left( \hat{y}^{l}_{s,d}[n] - \hat{\bar{y}}^{l}[n] \right)^2. \tag{3.22} $$

After polishing, measurements of the post-polish thicknesses are taken at a few locations on a few dies across the wafer; let these measurements be given by

$$ y^{l}_{s_m,d_m}[n], \tag{3.23} $$

where $l$ is the layout number of the device being run, $s_m$ is the number of the measured site within die number $d_m$, and $n$ is the wafer number. Note that this set of points may be a subset of, or a completely different set of locations than, the large set of points specified by $s$ and $d$ in the virtual post-polish thickness wafer profile outlined above. These measurements are then used to extract a new blanket rate and planarization length for each measured die by finding the parameters that minimize the squared-error fit between the measured data points and the model estimates for those points:

$$ \left\{ \tilde{K}_d[n], \tilde{L}_d[n] \right\} = \arg\min_{K_d,\, L_d} \frac{1}{N_{d_m}(l)\,N_{s_m}(l)} \sum_{d_m=1}^{N_{d_m}(l)} \sum_{s_m=1}^{N_{s_m}(l)} \left( y^{l}_{s_m,d_m}[n] - \hat{y}^{l}_{s_m,d_m}[n] \right)^2 \tag{3.24} $$

where the estimated thicknesses

$$ \hat{y}^{l}_{s_m,d_m}[n] \tag{3.25} $$

are computed in the same fashion as the points in the virtual post-polish thickness profile outlined in (3.16), but with the set of measured sites $s_m$ in the measured die $d_m$. These new values, $\tilde{K}_d[n]$ and $\tilde{L}_d[n]$, are used to update the estimates of the blanket rate and planarization length for each die. This is done by performing an exponentially weighted moving average (EWMA) of the blanket rate parameter for each die

$$ K_d[n+1] = w_K \cdot \tilde{K}_d[n] + (1 - w_K) \cdot K_d[n], \tag{3.26} $$

and the planarization length parameter for each die

$$ L_d[n+1] = w_L \cdot \tilde{L}_d[n] + (1 - w_L) \cdot L_d[n], \tag{3.27} $$

where $w_K$ and $w_L$ are the EWMA weights for the moving average of the blanket rate and planarization length of each die, respectively. In theory, these could actually be different weights for each die. However, we will assume that the dynamics of the blanket rate and planarization length are the same for all dies on the wafer. This may turn out to be insufficient if some die appear to change more than others.
For example, the die nearest the flat or notch of the wafer may have such a property. On the next run, the controller uses the device layout of whatever type of device is being run with these updated estimates of the blanket rate and planarization length for each die to create new virtual post-polish thickness wafer profiles in order to optimize the polish time for that device, and the cycle repeats.

When new values of these model parameters are extracted from the measured values in equation (3.24), only a few measurements are necessary. Because the model explicitly relates each measurement to its effective density determined using the model and the device layout, each measurement is compared with a prediction from the model, and not to some arbitrary target that does not relate to polishing of that particular site. In varying the parameters of the model, the predicted profile is aligned with the measurement sites, allowing the model parameters to be updated independent of the particular device being run.

We can see from equations (3.16) through (3.19) that the value of the thickness at any point is determined by a few key parameters: the blanket removal rate of the die, the planarization length of the die, the device layout, and the polish time. Given the blanket removal rate and planarization length, we can use the model to generate a very densely sampled profile of the device being run. This densely sampled profile allows us to generate a polish time for the next wafer to be polished, i.e. equation (3.20), that controls the true average thickness, i.e. equation (3.21), to as close to the target value as possible. In addition, this thickness profile prediction and the controlled average thickness are independent of the locations that we actually measure, i.e. equation (3.23). The densely sampled predicted thickness profile also allows us to estimate the total, or global, non-uniformity, i.e. equation (3.22).
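The prediction, polish-time selection, and extraction steps of equations (3.16), (3.17), (3.20) and (3.24) can be sketched in Python as follows. This is a minimal illustration, not the controller used in the experiments. It assumes a single die with one blanket rate, takes the effective densities as given inputs rather than computing them from the layout via (3.18), fits only the blanket rate, and uses bisection and a grid scan as stand-in optimizers. All function names, parameter names, and numerical values are hypothetical.

```python
def predict_thickness(t, k, rho, z0, z1):
    """Post-polish thickness at one site, following eqs. (3.16)-(3.17).

    t: polish time, k: blanket rate, rho: effective density (0 < rho <= 1),
    z0: initial oxide thickness, z1: initial step height.
    """
    t1 = rho * z1 / k              # time at which the local step is removed
    if t < t1:
        return z0 - k * t / rho    # raised features polish at k / rho
    return z0 - z1 - k * (t - t1)  # planar surface polishes at the blanket rate


def predicted_average(t, k, densities, z0, z1):
    """Average of the virtual profile over the given sites, cf. eq. (3.21)."""
    return sum(predict_thickness(t, k, r, z0, z1) for r in densities) / len(densities)


def polish_time(target, k, densities, z0, z1, t_max=1000.0, tol=1e-6):
    """Polish time whose predicted average is closest to target, cf. eq. (3.20).

    The predicted average decreases monotonically with t, so bisection on
    [0, t_max] suffices (assumes the target is reachable within t_max).
    """
    lo, hi = 0.0, t_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if predicted_average(mid, k, densities, z0, z1) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def extract_rate(t, measured, densities, z0, z1, k_lo=1.0, k_hi=100.0, steps=9900):
    """Blanket rate giving the least-squares fit to the measured sites, cf. eq. (3.24).

    A plain grid scan over [k_lo, k_hi]; the planarization length is held fixed
    here because the effective densities are treated as known inputs.
    """
    best_k, best_err = k_lo, float("inf")
    for i in range(steps + 1):
        k = k_lo + (k_hi - k_lo) * i / steps
        err = sum((m - predict_thickness(t, k, r, z0, z1)) ** 2
                  for m, r in zip(measured, densities))
        if err < best_err:
            best_k, best_err = k, err
    return best_k


if __name__ == "__main__":
    dense_sites = [0.2, 0.35, 0.5, 0.65, 0.8]  # hypothetical effective densities
    t = polish_time(8000.0, 30.0, dense_sites, z0=15000.0, z1=7000.0)
    print(round(t, 1), round(predicted_average(t, 30.0, dense_sites, 15000.0, 7000.0), 1))
```

Note that the polish time is chosen from the densely sampled site list, while `extract_rate` would be called with only the few measured sites; the two site lists need not overlap.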
After polishing, measurements of the post-polish thicknesses are taken at only a few sites on a few dies across the wafer, i.e. equation (3.23). The model is then used to extract a new blanket rate and planarization length for each of the measured die, i.e. equation (3.24). This is done by varying the parameters of the model, the blanket rate and planarization length, in order to best fit the measured data. These new values are used to update the estimates of the blanket rate and planarization length for each die via equations (3.26) and (3.27).

The use of the pattern layout in the model de-couples the effects of the device layout from the effects of the process, i.e. the blanket rate and planarization length. It is this independence from the device layouts that allows us to carry over the state of the process from wafer to wafer, regardless of the type of device being polished. Specifically, by simply changing the device layout and using these parameters in the model, we can predict the post-polish thickness profile of any new device and interchange devices at any time.

3.6.2 Further Discussion of the Device Independent Control Algorithm

While the algorithm in the previous section is fairly general, in that it defines methods for tracking both the blanket rate and planarization length for all die being measured, we would like to address whether or not this is really necessary. In particular, it may be the case that we wish to track these parameters for only one die, or for an average over the measured die, in order to keep the strategy and updating simpler. It may also be the case that we wish to track only the blanket rate and use a constant planarization length, because the re-calculation of the planarization length in the extraction step is time consuming.

Recall that the blanket wafer removal rate varies over the surface of the wafer (e.g. see Figure 1.8). Thus, we will definitely need to estimate the blanket rate for each die that we wish to monitor.
In actuality, monitoring the actual position rather than simply the radial distance may be somewhat unnecessary, since these radial non-uniformities are often due to imperfections in the wafer carrier and the wafer has a tendency to slip underneath the carrier during polishing. However, the blanket rate near the wafer flat or notch may be different than elsewhere on the wafer due to the asymmetry. Therefore, we monitor a separate blanket rate for each die in this work.

In order to determine if we need to monitor and update the planarization length for each die, or even monitor and update this parameter at all, several characterization test masks were run during the 600 wafer control experiment outlined in Chapter 2. From these characterization masks, the planarization length was extracted for four die on eight wafers spread across the 600 wafer experiment. The results are shown in Figure 3.27. Here we can see that the average of these values is fairly constant over the life of the pad. However, notice that die one appears to be continually increasing, while dies three and four appear to be continually decreasing. This suggests that, over the long term, the planarization length of each die will also need to be updated. Therefore, the first implementation of the controller updates both the blanket rate and planarization length. We now turn to presenting our experimental implementations of this type of controller.

Figure 3.27: The average planarization length over the course of 100 six wafer lots (solid line), and calculated planarization lengths for each of the four die on each of these eight wafers (dots).

3.7 Experimental Results

This control framework was tested on an IPEC 472 polisher at Texas Instruments, Inc.
The two devices shown in Figure 3.28 were polished alternately every other run, with the objective being to control the true average thickness of these devices to the same target value of 8000 Å. For these particular devices, this is a challenging task for two reasons. One is that both devices have large regions of very high density and large regions of very low density, which results in a large range of thicknesses that are difficult for any model to accurately predict. The second is that they have different average values, making the average removal rates highly device dependent. The devices were polished alternately so as to provide the most difficult situation for control. The alternating of the devices causes the largest change in the polishing rate on each run. If there are any device dependencies in the modeling or control strategy, they will appear as systematic errors that alternate every other run with the device.

Figure 3.28: Test devices (Device 1 and Device 2) being controlled with the device independent controller.

3.7.1 Updating Both Planarization Length and Blanket Removal Rate

In our first control experiment, all measurements were performed on an ex-situ KLA/Tencor UV1280. We would like to test our claim that a reduced number of measurement points could be used to control the true thickness. We used a reduced number of sites, 12 sites on four dies, to update the model during the control experiment. In order to determine the polish time and estimate the global non-uniformity, we estimated the wafer post-polish thickness using a more densely sampled plan, 63 sites on four dies. After the control experiment was completed, we remeasured the wafers using this more densely sampled plan, and the true average and the global non-uniformity of the wafers were calculated using these measurements.
These were used to determine how well the average thickness was controlled, since the points that are measured during the control run may be very different than the true average that is being controlled. These measurements were also used to determine how accurate the controller's prediction of the global non-uniformity during the control experiment actually was. The sampling plans within each die for the two devices are shown in Figures 3.29 and 3.30. The points indicated as filled circles were the points used to update the model during the control experiment, while the points indicated by crosses were additional points that were measured following the control experiment in order to calculate the true mean and global non-uniformity. The sampled dies are shown in Figure 3.31.

Figure 3.29: Measurement plan of Device 1. Circles are points used for control. Crosses and circles are used to determine the true average.

Figure 3.30: Measurement plan of Device 2. Circles are points used for control. Crosses and circles are used to determine the true average.

Figure 3.31: Map of the dies used for the multiple device control.

In this first experiment, the EWMA estimates of the planarization length and blanket removal rate were monitored for each of the four die. This resulted in the average thickness being controlled to within 400 Å of the target, as shown in Figure 3.32. This error is fairly good for multiple device control, keeping in mind that the average lot to lot variability of blanket wafer polishing is on the order of 200 Å. As indicated earlier, this represents a particularly difficult case. The total non-uniformity of each of the devices being run is shown in Figure 3.33, including the estimated total non-uniformity predicted during the control experiment using the few measurement points and the actual values determined from 63 point measurements taken on the four die following the control experiment.
Here we can see that the predictions during the experiment using the few measured points provide an excellent monitor of the post-polish thickness.

Figure 3.32: Controlled average thickness of 63 sites on four dies measured following the experiment and the device number run, plotted against run number.

Figure 3.33: The minimum, maximum, and range of the polished devices, plotted against run number. The dashed lines represent the predicted values using the model, while the solid lines indicate the values determined from the 63 point measurements on four dies.

Although the average thickness was controlled to an acceptable level, there were several problems with the control run. First, since the measurements for control were performed off-line, each control run took one and a half hours (30 minutes to polish the wafers, 30 minutes to clean them, and 30 minutes to measure them). Therefore, only a small number of runs were performed. Second, the polishing pad was new and still in the break-in period. As a result, the blanket rate was rapidly decreasing toward a steady state. The extracted blanket rates shown in Figure 3.34 effectively demonstrate this. This made controlling the process difficult, since the EWMA weights were not optimized for these conditions. Third, the output appears to oscillate with the device. The dependency is also seen in Figure 3.34, where we see oscillations in the extracted planarization lengths (from the update step). This is a problem, as the model assumes these are device independent [53,56]. This suggests that either the model is inaccurate, or planarization length is not device independent. At the time these experiments were performed, we were unable to analyze the model or these effects in detail.
This analysis was done later, and is presented in Chapter 4. At the time, we decided to assume that the planarization length was indeed a function of the device, and to continue our experiments based on this assumption.

Figure 3.34: Parameters extracted from the measured data during the first control run, plotted against run number.

3.7.2 Updating Blanket Removal Rate Only

The logical approach to overcoming the device dependencies of the planarization length, based on our new assumption, is to fix the planarization lengths as a function of the device. It is possible that there are some time dependencies of the planarization length, as suggested by Figure 3.27. Since the change in the planarization length, if any, is very slow over a very long period of operation, these changes will not come into play in our relatively short experiment. The focus here is to demonstrate that improved control can be obtained by fixing the planarization length and updating only the blanket rates of each die (which are known to change with time). In order to do this, we generated the optimal planarization lengths for each of the devices using all the points from the previous experiment. A new version of the controller, which uses these fixed planarization lengths for each device, updates only the blanket rates based on the measured data.

In addition to this modification of the controller, we switched from using an ex-situ metrology tool to an on-line metrology tool. This provided a significant speed-up in terms of the control, and the throughput went from one wafer every one and a half hours to roughly one wafer every 15 minutes. Figure 3.35 shows that this resulted in control of the 63 point average to within 200 Å of the target, significantly better than control typically achieved when running multiple devices.
In addition to the excellent control, we were able to provide accurate estimation of the total non-uniformity during the control run. Figure 3.36 shows these predictions plotted against the actual value determined from the 63 measurements made on each of four dies following the experiment. Again we see that the predictions are highly accurate, and that the total non-uniformity of each of the two devices is accurately predicted.

Figure 3.35: Controlled average thickness of 63 sites on four dies measured following the experiment and the device number being run, plotted against run number.

Figure 3.36: The minimum, maximum, and range of the polished devices, plotted against run number. The dashed lines represent the predicted values using the model, while the solid lines indicate the values determined from the 63 point measurements on four dies.

Unfortunately, as before, there still appears to be a strong dependence in the controlled average thickness on the alternating devices. This is further seen by the extracted blanket rates shown in Figure 3.37. These oscillating extracted rates indicate that the "blanket" rate actually fluctuates with the device, the values used for the fixed planarization lengths during the experiment are slightly incorrect, or the model is just plain inaccurate and needs to be corrected. Recall that in our model, the blanket rate is a parameter that is based on the removal rate of the blanket sheet film of material. We could assume that this is actually a function of the device; however, this violates the assumptions of the model. On the other hand, it is possible that the planarization lengths are in fact slightly incorrect, since we only had a few wafers from which to determine these values.
The values for the planarization length were 3500 µm for Device 1 and 4500 µm for Device 2. After this second experiment, the planarization lengths were extracted for all four dies of the 15 wafers run. The re-optimized planarization lengths were 3450 µm and 4615 µm for Devices 1 and 2, respectively. These represent only small percent differences in the planarization length (about 1.4% and 2.6%), to which the predicted profiles are known to be relatively insensitive. Therefore, this is most likely not the cause of the errors. Thus, these errors are most likely due to inaccuracies in the assumptions of the model. As a result, work is needed to improve the model in order for this control technique to be most effective. Chapter 4 will explore one possible improvement in the CMP model that may help correct these problems.

Figure 3.37: Measurement plan of Device A. Circles are points used for control and crosses and circles are used to determine the true average.

3.7.3 Correcting for Device Dependencies in the Blanket Rate

Although further improvements in the model are necessary in order to make this control solution truly device independent, we still wished to determine if this type of controller could be made to work even better in spite of the inaccuracies in the model. A second variation assumes that the error can be further corrected using an adjustment to the blanket rate, along with the fixed planarization lengths, for each device:

$$K_d[n] = K_1[n] + \Delta_d[n] \qquad (3.28)$$

where

$$\Delta_d[n+1] = w_K \left(\hat{K}_d[n] - K_1[n]\right) + (1 - w_K)\,\Delta_d[n]. \qquad (3.29)$$

A device dependent expression for the blanket rate, $K_d[n]$, is considered which is the sum of a blanket rate due to the device, $K_1[n]$, and an error term, $\Delta_d[n]$. The error term is updated using the exponentially weighted moving average in equation (3.29). Equations (3.28) and (3.29) replace equation (3.26) in the original form of the controller.
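As an illustration only, the update in equations (3.28) and (3.29) can be sketched in Python. The function names, the weight, and all numerical values below are hypothetical and are not taken from the control experiments; the sketch assumes a base blanket rate, a rate extracted from the latest run of a given device, and one stored adjustment per device.

```python
def update_adjustment(delta_prev, k_extracted, k_base, w_k):
    """One EWMA step in the form of equation (3.29).

    delta_prev  -- previous adjustment for this device
    k_extracted -- blanket rate extracted from the latest run of this device
    k_base      -- base blanket rate that the adjustment is added to
    w_k         -- EWMA weight between 0 and 1
    """
    return w_k * (k_extracted - k_base) + (1.0 - w_k) * delta_prev


def device_blanket_rate(k_base, delta):
    """Device dependent blanket rate in the form of equation (3.28)."""
    return k_base + delta


# Hypothetical example: a device whose extracted rate sits 100 units
# above the base rate on three consecutive runs.
k_base = 3000.0
w_k = 0.5
delta = 0.0
for k_hat in (3100.0, 3100.0, 3100.0):
    delta = update_adjustment(delta, k_hat, k_base, w_k)

rate_for_next_run = device_blanket_rate(k_base, delta)
print(delta, rate_for_next_run)
```

With a weight of 0.5 the adjustment closes half of the remaining gap on each run, so a persistent device offset is absorbed over a few runs while single-run noise is damped.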
The adjustments due to the device dependencies were determined from the experimental data from the previous control run. As shown in Figure 3.38, this results in control of the average post-polish thickness for multiple devices to within 100 Å, and removes nearly all device dependencies in the controlled thicknesses. This level of control is far better than that typically achieved using conventional control techniques for switching multiple devices, and is at the quality level necessary for next generation devices.

Figure 3.38: Controlled average thickness of 63 sites on four dies measured following the experiment and the device number being run.

Figure 3.39: The minimum, maximum, and range of the polished devices. The dashed lines represent the predicted values using the model, while the solid lines indicate the values determined from the 63 point measurements on four dies.

In addition to this, the estimated and actual total non-uniformity are shown in Figure 3.39. Once again, we see excellent prediction of the total non-uniformity across the wafer. This demonstrates that many of the test wafers and pilot runs used to control and monitor the CMP process could be eliminated, because the total non-uniformity can be extracted from the few points measured on each run.

One final point we would like to stress concerns the difference between the measured values and the complete profile of the device being controlled. Because we controlled the true average of a densely sampled profile of the wafer while measuring only a few select points, we would like to demonstrate that this average is quite different from the average of the measured points.
Figure 3.40 shows the resulting control of the 252 point averages (63 points on four dies) along with the 24 point (6 sites measured on four dies) average measured for Device 1, and the 48 point (12 sites on four dies) average measured for Device 2. Here we see that these average values are very different from the controlled 252 point averages.

Figure 3.40: Difference in the average of the measured sites and the average of the controlled sites (the "true" average). Curves: measured and true averages for Device A and Device B, plotted against run number.

3.8 Summary

We have presented a device independent controller that simplifies processing and significantly improves lot to lot processing quality with multiple devices. We demonstrated that effective control in this scenario can be obtained using the MIT density model, which correlates the value being measured with the true profile of the controlled thickness. This provides several benefits. First, it allows measurements from any device to be used to update the tool level model, which can then be used with any other device being processed. Second, it allows us to measure only a few points, while very accurately controlling the average of the true thickness profile. Third, it allows us to monitor the total non-uniformity of the polishing process for each device being processed. This eliminates the need for many test wafers aimed at determining uniformity.

Our first experiment demonstrated that this type of control results in a lot to lot variability (400 Å) which is fairly good relative to existing techniques for controlling multiple devices. Unexpectedly, the planarization length of the density model was found to be a function of the device. This is unfortunate because we no longer have a truly device independent model. This indicated that an improved model is necessary to achieve truly device independent control.
In our second experiment, we fixed the planarization lengths for the different devices so that we could improve the control with the existing model. This resulted in significantly better control (200 Å). The best results (100 Å) were obtained with a device dependent model update strategy, which included adjustments to the blanket removal rates. In all cases, we were able to very accurately predict the total non-uniformity of the polished wafers. In the next chapter, we will explore one possible improvement to the model, in the hope that its use will remove these dependencies.

Chapter 4
A Dielectric CMP Model Combining Density and Step Height Dependencies

In this chapter, we turn to trying to understand the device dependencies in the model used for control. While the model used in Chapter 3 provided a first order prediction of the post-polish dielectric film thickness, it was suggested that this model is insufficient for providing truly device independent and highly accurate control of the average thickness. This tight control of the average post-polish film thickness will become critical as specifications on device properties continue to tighten rapidly and device sizes continue to shrink. In order to achieve this tight control, a controller such as that outlined in Chapter 3 will play an important role. Therefore, it is important that we develop a model which can more accurately predict the entire profile of a device, independent of the device being run. In this chapter, we will explore the use of a model similar to that used in Chapter 3, but with the addition of a step height dependence.

Several works have proposed models for the chemical-mechanical polishing of interlevel dielectrics, each of which provides its own benefits. We will not review all the models in this thesis; other works have provided a complete review [54,56].
Our approach is to consider those models which could be combined with the MIT density model, which has the ability to efficiently predict the post-polish thickness profile for the entire die of an arbitrary layout [53]. This is critically important if we are to utilize a model for performing the device independent process control outlined in Chapter 3, for determining dummy fill [56], or for estimating circuit performance [57]. However, the density model provides thickness predictions to only a first order, and falls short when predicting low density features.

Burke proposed in [58] that the step height decreases exponentially with time. Tseng et al. proposed that the removal rates of raised and down areas converge exponentially to the removal rate of an unpatterned dielectric sheet film (the blanket removal rate) as polish time increases [46]. However, both of these models lack a clear connection to density. In addition, the model in [46] assumes the pad is always in contact with both the raised and down areas, and suggests that the step height dependence of the removal rate is determined by the distribution of pressure between the raised and down areas. Grillaert et al., from IMEC, provided experimental data in [47] which demonstrates that these claims are true only after a certain step height is reached. The IMEC model suggests that before this "transition" step height is reached, the removal rate of the raised areas is characterized by the blanket rate divided by the density [47]. After the transition step height is reached, the removal rate profile follows the exponential model outlined in [46]. The IMEC model also suggests that the transition step height is dependent on feature density, but it is unclear how these transition step heights can be determined a priori for arbitrary layouts. Therefore, it is not clear that this technique would work well on typical patterned wafers, where the layout of the features is complex and their densities are not easily calculated.
In this chapter, we will expand the MIT density model to include the step height dependencies of the removal rate suggested by the IMEC model. The purpose is to provide predictions for arbitrary layouts, as the density model does, while improving the fit for low density features. We will focus on comparisons of variations of this model with the density model, and the effects these comparisons have on our understanding of the mechanisms in dielectric CMP. Section 4.1 briefly reviews the density model and step height dependent models. An analysis of the density model fit to experimental data is given in Section 4.2. Section 4.3 outlines a combined density and step height dependent model, and makes comparisons of this model to the density model. Section 4.4 presents two variations of this time-density model which simplify the model form and solution. We return to our original question in Section 4.5, and discuss whether the device dependencies are removed by the combined density and step height dependent model.

4.1 Density and Step Height Dependent Models

As we saw in Chapter 3, the MIT density model provides a first-order approximation of the post-polish dielectric thicknesses for arbitrary layouts [53]. As shown in Figure 4.1, this model assumes the polishing rate of a raised area is initially equal to the blanket rate divided by an effective density. The effective density is determined by computing a weighted average of the feature densities within the planarization length window. During this first regime, the model assumes there is no removal in the down areas. Once the step height is assumed to be completely removed, the model assumes that the removal rates of both the raised and down areas equal the blanket rate. This model's key strength is its ability to efficiently predict the thickness of an arbitrary layout to a first order. This benefit comes from the weighted average of the densities within a window whose size is set by the planarization length.
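The windowed averaging can be sketched as follows. This is a minimal illustration under stated assumptions, not the weighting actually used in the MIT density model: it assumes the layout has already been reduced to a grid of local feature densities, uses a uniform (unweighted) square window clipped at the grid edge, and expresses the window half-width in grid cells. All names and numbers are hypothetical.

```python
def effective_density(density, row, col, half_window):
    """Average the local densities in a square window centred on (row, col).

    density     -- list of rows, each a list of local feature densities (0..1)
    half_window -- window half-width in grid cells (stands in for the
                   planarization length in this sketch)
    """
    n_rows, n_cols = len(density), len(density[0])
    rows = range(max(0, row - half_window), min(n_rows, row + half_window + 1))
    cols = range(max(0, col - half_window), min(n_cols, col + half_window + 1))
    values = [density[r][c] for r in rows for c in cols]
    return sum(values) / len(values)


def initial_raised_rate(blanket_rate, rho_eff):
    """First-regime raised-area rate: blanket rate divided by effective density."""
    return blanket_rate / rho_eff


# Hypothetical 3x3 layout: one dense cell surrounded by sparse cells.
layout = [
    [0.2, 0.2, 0.2],
    [0.2, 1.0, 0.2],
    [0.2, 0.2, 0.2],
]
rho_local = effective_density(layout, 1, 1, half_window=0)
rho_wide = effective_density(layout, 1, 1, half_window=1)
print(rho_local, rho_wide, initial_raised_rate(3000.0, rho_wide))
```

A wider window pulls the effective density of the dense cell toward that of its sparse neighbours, which raises its predicted initial removal rate.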
This averaging is necessary because the pressure on a particular feature is affected by neighboring features.

Figure 4.1: A high-level view of the MIT density model. Panel labels: layout file; effective density profile $\rho_{eff}(x, y, L)$; blanket removal rate; patterned removal rate profile.

The model proposed by IMEC shows that the removal rate of the raised areas, and thus the step height reduction, is not linear [47]. As shown in Figure 4.2, they suggest there is an initial linear regime, where the raised area removal rate is equal to the blanket rate divided by the feature density. The removal rate of the down area during this period is zero. After the pad contacts the down area, this first regime is followed by a period where the removal rate of the raised area exponentially decreases to the blanket rate. During this second regime, the removal rate of the down area exponentially increases from zero to the blanket rate. Typical plots, as well as the expressions for the removal rates in the raised and down areas as a function of polish time, are shown in Figure 4.2. Here $K$ is the blanket removal rate, $h_0$ is the initial step height, $\rho$ is the feature density, $\tau$ is the exponential time constant, $t_p$ is the polish time, $t_c$ is the time of contact with the down area, and $h_1 = h_0 - K t_c / \rho$ is the transition or contact step height.

Figure 4.2: The removal rates of the raised and down areas using the IMEC step height dependent model. Horizontal axis: polish time (seconds). Annotated rates: $K/\rho$ for the raised area before contact; $K + (1-\rho)(h_1/\tau)e^{-(t-t_c)/\tau}$ for the raised area and $K - \rho\,(h_1/\tau)e^{-(t-t_c)/\tau}$ for the down area after contact.

The work in [47] proposes that the step height at which the pad contacts the down areas is a function of the feature density; i.e., the higher the density, the smaller the contact step height. We will explore this in more detail later on.

Before we continue, we would like to consider the differences between these models, highlighted in Figure 4.3. Here the density model predictions are placed over the IMEC model predictions.
We see that the time at which the density model switches to the blanket removal rate is later than the time at which the IMEC model transitions to the exponential removal rate. The IMEC model suggests that removal of the down area begins before the time suggested by the density model. The pressure distribution is assumed to change before the step height is completely removed, and the load is shared with the down area. This creates a large difference in the removal rate predictions just after the exponential regime begins.

Figure 4.3: Removal rates of the density and step height dependent models for both the MIT density model and the IMEC model.

We have plotted the percent differences in the amount removed determined from each model in Figure 4.4. Here we see that the predictions are fairly similar at the beginning and end, but there are large differences in the middle. In particular, note that the positive percent difference in the raised area indicates that the MIT model predicts more removal on the raised area, while the negative percent difference in the down areas indicates that the MIT model predicts less removal in the down areas.

Figure 4.4: Percent difference in removal predictions between the density model and the IMEC model.

4.2 Analysis of the MIT Density Model

In Chapter 3, we demonstrated that the MIT density model had device dependencies in the model. In addition, it was shown that using adjustments to the blanket rate was one way to remove these dependencies in the controlled thickness. We now consider some experimental data in order to show that the MIT density model needs to incorporate step height dependent removal rates. Later we will determine if such a revised model can remove these device dependencies for use in process control.
Wafers were patterned with an MIT CMP test mask, deposited with a 16800 Å oxide layer, and polished using an IC1000/SUBA IV pad stack with a standard process on a rotary polish tool at Texas Instruments, Inc. The wafers had 20 mm by 20 mm dies, patterned out to the edge. As shown in Figure 4.5, each die contained five rows and five columns of 4 mm blocks with lines of varying pitch and density.

Figure 4.5: Description of the pattern features in the test mask used for model comparisons.

The post-polish dielectric thickness data, as well as the density model predictions, are shown in Figures 4.6 and 4.7 for the raised and down areas, respectively. In Figure 4.6 the low density regions correspond to the lower thickness values, i.e. points A, B, and D are low density and points C and E are medium-low. We see that the predictions of the removal in the raised areas are fairly accurate for the high and medium density regions, yet fairly poor for the medium-low density regions. The predictions are accurate in the high density regions because the removal rate of these features is low, and the step height is still too large for the pad to touch the down area. Therefore, the high and medium density features are at the beginning of the profiles shown in Figures 4.3 and 4.4. Note that the removal in the low and medium-low density regions is overestimated. We can see in Figure 4.7 that at these same locations, the down area removal is under-predicted. This is the same characteristic as suggested by the difference between the density model and the IMEC model in Figure 4.4.
Therefore, the failure of the model to accurately predict the removal in locations B, C, E, and F is most likely because the pad has touched the down area before the time predicted by the density model. This causes an increase in the removal of the down area and a decrease in the removal of the raised area. On the other hand, the removal in locations A and D is overestimated in both the raised and down areas. The inaccuracy of the density model in these locations is not explained by the early contact of the IMEC model. We will return to this issue later.

Figure 4.6: Measured and modeled values for the post-polish thickness of the raised areas using the MIT density model (dashed line is the model fit). RMSE = 220 Å.

Figure 4.7: Measured and modeled values for the post-polish thickness of the down areas using the MIT density model (dashed line is the model fit). RMSE = 244 Å.

4.3 A Combined Density and Step Height Model

We will now incorporate the step height dependent model into the density model to capture the benefit of modeling arbitrary layouts, while improving performance with step height dependencies. We begin by integrating the expressions in Figure 4.2 to obtain the amount removed in the raised areas

$$\mathrm{AR} = \begin{cases} t_p K / \rho, & t_p < t_c \\ t_c K / \rho + K (t_p - t_c) + (1 - \rho)\, h_1 \left(1 - e^{-(t_p - t_c)/\tau}\right), & t_p > t_c \end{cases} \qquad (4.1)$$

and the amount removed in the down areas

$$\mathrm{AR}_d = \begin{cases} 0, & t_p < t_c \\ K (t_p - t_c) - \rho\, h_1 \left(1 - e^{-(t_p - t_c)/\tau}\right), & t_p > t_c \end{cases} \qquad (4.2)$$

We then assume that the feature density, $\rho$, can be replaced by an effective density, as in the density model. We are then left with the challenge of using the effective densities and these equations to explain the experimental data (pre- and post-polish measurements for raised and down areas) from an arbitrary layout.
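The two-regime expressions for the amount removed can be sketched in Python as a check on their behavior. This is an illustrative sketch rather than the fitting code used in this work: the parameter values are hypothetical, and the contact step height is taken as the initial step height minus the first-regime step reduction, $h_0 - K t_c / \rho$, which is an assumption that follows from the first-regime rates.

```python
import math


def contact_step_height(h0, K, rho, t_c):
    """Step height remaining when the pad first touches the down area.

    Assumes the step shrinks at K / rho until contact (first regime).
    """
    return h0 - K * t_c / rho


def amount_removed(t_p, K, rho, tau, t_c, h0):
    """Return (raised, down) amounts removed after polishing for t_p."""
    if t_p <= t_c:
        # First regime: only the raised area polishes, at K / rho.
        return K * t_p / rho, 0.0
    h1 = contact_step_height(h0, K, rho, t_c)
    decay = 1.0 - math.exp(-(t_p - t_c) / tau)
    linear = K * (t_p - t_c)
    raised = K * t_c / rho + linear + (1.0 - rho) * h1 * decay
    down = linear - rho * h1 * decay
    return raised, down


def step_height(t_p, K, rho, tau, t_c, h0):
    """Remaining step height: initial step minus net raised-over-down removal."""
    raised, down = amount_removed(t_p, K, rho, tau, t_c, h0)
    return h0 - (raised - down)


# Hypothetical parameters (rate per second, heights in the same length unit).
params = dict(K=50.0, rho=0.5, tau=40.0, t_c=20.0, h0=5000.0)
for t in (10.0, 20.0, 60.0):
    print(t, amount_removed(t, **params), step_height(t, **params))
```

After contact the remaining step height in this sketch decays as $h_1 e^{-(t_p - t_c)/\tau}$, which matches the exponential step height reduction described for the second regime.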
We will outline three methods for doing this. In each of these methods, we need to find $K$, $\tau$, the $t_c$ for each measurement site, and the effective density of each measurement site. As in the density model, we assume the effective density is determined by calculating the average density within a window, and that the window size is determined by a single planarization length parameter.

The first method for determining these parameters picks a planarization length, and calculates the effective density for each measurement site. Using the measurements and effective densities, we perform a multivariate constrained optimization to find $K$ and $\tau$, as well as a contact time $t_c$ for each measurement site. This process is repeated for different planarization lengths until the parameters which provide the best fit (i.e. minimum mean squared error) of the model to the experimental data are found. The following constraints are necessary to maintain positive removal rates in both the raised and down areas. t 0, tct! pho K pho p0,ht (4.3) , and K(t -- tc) < p for (Vtj (tc < t < tp)) (, - e -t-,) .(4.4)

Using the experimental CMP data that we used for the density model above, this time-density model was fit to the data. The results are shown in Figures 4.8 and 4.9. The raised area fit of the time-density model is a 50% improvement over the original density model. This improvement is largely in the low density regions, A through F. The down area fit of the time-density model is also 50% better than the original density model. Here we see a significant improvement in the low density region B, in the medium-low density regions C and E, and in the medium density region F. The earlier removal of the down area material, relative to the density model, significantly improves the predictions in these regions. Unfortunately, the predictions in the low density regions A and D still have significant error.
As we stated in our analysis of the density model, we did not expect the time-density model to correct these locations. It is possible that these errors are caused by poor measurement data. However, these profiles are highly reproducible on other data sets, and similar errors result. Thus, it is more likely that these effects are real. Figures 4.8 and 4.9 indicate that the time-density model is unable to predict enough removal in the raised areas of the low density regions without overestimating the removal in the down areas. It is possible that the macro-structure of the pad is having an additional effect not captured by the simple "pad contact" model. This suggests that more work is necessary to understand these effects.

Figure 4.8: Measured and modeled values for the post-polish thickness of the raised areas using the time-density model (dashed line is the model fit). RMSE = 110 Å.

Figure 4.9: Measured and modeled values for the post-polish thickness of the down areas using the time-density model (dashed line is the model fit). RMSE = 120 Å.

4.4 Variations of the Time-Density Model

The previous method has a few problems. First, the large number of parameters (three plus the number of measurement sites) makes determining the model computationally intensive. Second, having a variable contact time for every site may cause over-fitting of the data. Thus we may be able to fit the data, but not be able to predict the thicknesses on other data sets. Third, these variable contact times make it difficult to predict post-polish thicknesses for arbitrary layouts. Finally, the optimal contact times result in contact step heights that have a functional dependence on density which conflicts with the findings of [47].
Figure 4.11 shows the fitted contact times determined from the optimization of the time-density model, plotted against effective density. The contact times above 40% density are plotted at the time of polish, meaning these features have not yet contacted the down area. These fitted contact times lead to the contact step heights shown in Figure 4.10. Again, the contact heights beyond 40% are determined by the maximum value of the polish time, and are not meaningful. Results in [47] indicate that the contact step height increases monotonically with decreasing density. However, the contact heights below 40% do not agree with these results. These results, combined with the fact that 63 parameters were used to fit 120 data points, suggest that we are most likely over-fitting.

Figure 4.10: Model fit of the step height at contact time as a function of the effective feature density. Annotation: contact heights at the higher densities are derived from the maximum times.

Figure 4.11: Model fit of the contact time as a function of the effective feature density. Annotation: contact times at the higher densities are at their maximum (the polish time).

In response to these problems, we have devised two variations of this method. These utilize contact step heights which have a functional dependence on either the effective density or the feature line space. The first variation utilizes the functional dependence $h_1 = k_1 + k_2 e^{-\rho/\delta}$, where $h_1$ is the contact step height, and $k_1$, $k_2$, and $\delta$ are variable constants. This relationship restricts the contact step height to exponentially decrease with increasing density. This reduces the number of parameters in the previous method from 3+N to six. In addition, the relationship of contact step height on density can be re-used for model prediction on other arbitrary devices.
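Reading the first variation as an exponential decay of contact step height with effective density, $h_1 = k_1 + k_2 e^{-\rho/\delta}$, a minimal Python sketch is given below. The constants are hypothetical placeholders chosen only to show the shape of the relationship; they are not the fitted values from this work.

```python
import math


def contact_step_height(rho_eff, k1, k2, delta):
    """Contact step height as a decaying exponential of effective density."""
    return k1 + k2 * math.exp(-rho_eff / delta)


# Hypothetical constants: floor k1, amplitude k2, density scale delta.
K1, K2, DELTA = 500.0, 4000.0, 0.2

# Evaluate at effective densities 0.1, 0.2, ..., 0.9.
densities = [i / 10.0 for i in range(1, 10)]
heights = [contact_step_height(rho, K1, K2, DELTA) for rho in densities]
for rho, h1 in zip(densities, heights):
    print(rho, round(h1, 1))
```

Because the same three constants serve every site, the per-site contact times of the first method are replaced by one shared curve, which is what allows the relationship to be reused on a layout that was not part of the fit.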
The results of the model fit for this variation are shown in Figures 4.12 and 4.13. Here we can see that this model also works quite well. There is only a slight decrease in the quality of fit in the raised areas, even though far fewer parameters are used. This suggests that there is indeed a strong correlation of the contact height to density. The optimal contact step height dependence on density using this functional form is shown in Figure 4.14. The functional dependence of the contact step height in this case is very different from that determined with the variable contact times above. However, this dependence on density agrees with that suggested in [47]. This suggests that the fit from the previous method was most likely over-fitting.

Figure 4.12: Measured and modeled values for the post-polish thickness of the raised areas using time-density model with contact height as a function of density. RMSE = 137 Å.

Figure 4.13: Measured and modeled values for the post-polish thickness of the down areas using time-density model with contact height as a function of density. RMSE = 115 Å.

Figure 4.15 shows the model fit errors for both the raised and down areas. We can see from this figure that there appear to be larger errors around the 50% density region. The last 15 data points in Figures 4.12 and 4.13 are all 50% density lines with pitch varying from 25 to 250 µm. These errors indicate that there may be a functional dependence of the contact step height on pitch or line space. Therefore, the second variation utilizes the functional dependence $h_1 = k_1 + k_2 l + k_3 l^2 + k_4 l^4$, where $l$ is the feature line space, and $k_1$, $k_2$, $k_3$, and $k_4$ are variable constants. This reduces the number of parameters in the original method from 3+N to seven.
Again, this relationship of contact step height on line space could be used for model prediction on arbitrary devices. The results of the model fit for this variation are shown in Figures 4.16 and 4.17. Here we see that the line space dependence does improve the fit of the raised areas. Unfortunately, the dependence on line space shown in Figure 4.18 suggests that the contact step height has a minimum which is not at a line space of zero. Intuitively, we would expect that the pad would be more able to reach down into larger line spaces, giving larger contact heights as line space increases. Therefore, the variation in contact height with line space shown in Figure 4.18 should be treated cautiously, and may be due to a confounding with density or other effects. Other test masks may be necessary to separate these effects.

Figure 4.14: Model fit for contact step height as a function of the effective feature density.

Figure 4.15: Model errors for raised and down areas as a function of the effective feature density.

Figure 4.16: Measured and modeled values for the post-polish thickness of the raised areas using time-density model with contact height as a function of line space. RMSE = 121 Å.

Figure 4.17: Measured and modeled values for the post-polish thickness of the down areas using time-density model with contact height as a function of line space. RMSE = 111 Å.
Figure 4.18: Model fit for step height at contact time as a function of line space (µm).

4.5 Polish Time and Device Dependencies

Now that we have developed a new model, we would like to determine whether we can use it for the control of multiple devices. In order to do this, we need to answer two questions. First, is this model able to fit data over a range of polishing times? Second, does this model remove the device dependencies seen earlier in the control experiments?

Recall that we plotted the evolution of the MIT density model fits for various polish times (along with the respective experimental data) in Figure 3.12. The new model was run using similar data, and the resulting predictions for the raised areas are shown in Figure 4.19. Upon close comparison of these fits and those in Figure 3.12, we find this model fits significantly better in the low density regions surrounded by higher densities, particularly regions A and D.

Figure 4.19: Measured and modeled values for the post-polish thickness of the raised areas using time-density model with contact height as a function of density for various polish times. Dashed lines are model fits and stars are experimental data points.

Recall that there were two problems with the device dependencies from our control experiments: the extracted planarization lengths were different for the different devices, and, once the lengths were fixed, there were device dependencies in the extracted blanket rates. Therefore, we would like to test these dependencies using this model, with the same data collected from the polishing of the wafers during the control experiments. Using the data from wafers polished during the final control experiment, we extracted the planarization lengths using the time-density model. The planarization lengths are plotted against the
The planarization lengths are plotted against the 144 Al UllI I E Th3500C) 0 0 N 3000- Ce 25001 0 s 0.5 1.5 1 2 2.5 3 Device # Figure 4.20: The planarization length as a function of the device number being run (using the experimental data from the third control run in Chapter 3). device in Figure 4.20. Here we see that there is still a device dependency in the planarization length. Therefore, the time-density model is also most likely not a device independent model. It is possible that this model could still be used with the device dependent planarization lengths, similar to that done in Chapter 3. In order to test this idea, we again fixed the planarization lengths at the above values and extracted the remaining model parameters from the experimental data from the third control experiment in Chapter 3. These parameters included the blanket rates, as well as the parameters for the contact height functional dependence on density. The extracted blanket rates are shown in Figure 4.21, as a function of the device being run. We see that, although much less, there is still a minor device dependency. The other parameters show similar dependencies. This suggests that improved control may be obtained using this controller, but more improvements in the CMP model are necessary to remove all the dependencies. 145 AMII III I 3500- Ad r 3000 - 2500 0.5 1 1.5 2 2.5 Device # Figure 4.21: The blanket rate as a function of the device number being run (using the experimental data from the third control run in Chapter 3) 4.6 Summary The density model was shown to be insufficient to completely characterize the removal in medium to low density features in dielectric polishing. Differences in the density model and the step height dependent model explain errors in the fitting of the density model, and a combined model was shown to provide up to a 50% improvement in fitting errors of both raised and down area thicknesses. 
Variations of this model significantly reduce the number of model parameters and provide the ability to predict post-polish thicknesses for arbitrary layouts. Although this model provides improved predictions for multiple polish times over the evolution of the polishing process, device dependencies still exist in the improved model.

Future work is necessary to understand the contact height dependencies on density, line space, and pitch. In addition, further improvements are needed to remove the device dependencies in the model. Recent work suggests that long range interactions across several devices create a pad flexing limit, which may possibly remove such device dependencies in the model.

Chapter 5

Conclusions and Future Work

Chemical-mechanical polishing has become a critical process in the manufacture of integrated circuits. However, due to the many complexities of CMP, controlling the process in a production setting is challenging. Goals on various metrics must be achieved, including: removal rate, within-wafer non-uniformity, step-height, total indicated range, within-die non-uniformity, and wafer-to-wafer non-uniformity. In addition, a number of difficulties exist in controlling the CMP process, including the drift in the polish characteristics of blanket and patterned wafer performance metrics over time, and the challenge of control given a mix of device types on the same tool. Major issues involved with an implementation of a run by run control system for use in a production environment include: quality, cost, flexibility, and ease of use.

Chapter 2 demonstrated that the frequent measurements provided by integrated metrology, combined with proper controller tuning, result in high quality control of the average post-polish thickness measured at a few locations within several die of a single device. The automatic measurement of the post-polish wafers and the relatively simple control algorithm provide a high degree of ease of use.
On-line metrology was demonstrated to be an effective means for providing measurement feedback to the controller. The variability of the on-line CMP metrology tool is well below CMP requirements, and measurements from this tool correlate well with measurements from ex-situ metrology tools. A 600 wafer experiment verified this and demonstrated that the on-line metrology tool also has good reliability. The simplification of processing using on-line metrology and the reduction in thickness variation from improved control result in a reduced cost for the CMP process. We showed that increases in throughput of up to 80% led to reductions in cost of ownership of up to 32%. This approach also benefits the environment by reducing water, chemical, and power usage by 0% to 66.7%. Finally, we demonstrated that simple EWMA filtering of the average removal rate is sufficient for controlling lot to lot trends in the CMP process. This type of filtering resulted in control of the average post-polish film thickness for a single device with an average error of less than 100 Å, suggesting that complex filtering techniques are not necessary to control the lot to lot variability in CMP. Such complex filtering approaches would only serve to decrease the ease of use and increase the controller's sensitivity to random process noise.

Although the quality of control presented in Chapter 2 was very high, there are several problems with this approach. First, using a reduced number of die for control leads to a shift relative to the average thickness measured over a larger number of die. In addition, estimates based on this experiment suggest that using traditional approaches to control the average thickness with a small number of measurements may also increase variability due to an increased sensitivity to measurement errors.
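As a minimal sketch of the EWMA filtering of the average removal rate summarized above, the following Python example updates a removal-rate estimate after each run and uses it to choose the next polish time. The function names, the weight of 0.3, and all numeric values are illustrative assumptions rather than settings from the experiments; thicknesses are taken to be in Å and rates in Å/s.

```python
def ewma_update(rate_estimate, measured_rate, weight=0.3):
    """Blend the newest measured removal rate into the running estimate."""
    return weight * measured_rate + (1.0 - weight) * rate_estimate


def polish_time(pre_thickness, target_thickness, rate_estimate):
    """Time needed to remove the excess film at the estimated rate."""
    return (pre_thickness - target_thickness) / rate_estimate


def run_controller(true_rates, pre_thickness, target_thickness,
                   initial_rate, weight=0.3):
    """Simulate run by run control and return the post-polish thickness per run.

    true_rates holds the (hypothetical) actual removal rate of the tool on
    each run; the controller only sees it through the measured thickness.
    """
    estimate = initial_rate
    post_thicknesses = []
    for true_rate in true_rates:
        t = polish_time(pre_thickness, target_thickness, estimate)
        post = pre_thickness - true_rate * t          # what the tool really did
        post_thicknesses.append(post)
        measured_rate = (pre_thickness - post) / t    # from post-polish metrology
        estimate = ewma_update(estimate, measured_rate, weight)
    return post_thicknesses
```

A smaller weight makes the estimate less sensitive to random noise in a single measurement, at the cost of a slower response to a genuine shift in the removal rate.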
Finally, estimates show that performing control using pilot wafers and sheet film equivalents would result in a 39% increase in the average error of the controlled average thickness, compared with performing control directly on patterned wafers. These results indicate that the spatial sampling techniques used in this type of control may produce misleading results.

In Chapter 3, we outlined the magnitude of the sources of variation in CMP, including within-wafer, within-die, and lot-to-lot variability. Although blanket wafer metrics are most often those used to monitor and control the CMP process, they are second order effects in terms of the variability of the CMP process. The result is that traditional techniques, like those outlined in Chapter 2, are inappropriate for dealing with the many complex issues involved with controlling the CMP process.

The problems with using traditional techniques to control the average post-polish thickness were explained by the MIT density model. The model shows how variations in pattern densities lead to large variations in the post-polish thickness profile. These variations due to pattern densities are why both the value and variability of the controlled average thickness using traditional control techniques depend on the measurement locations. In addition, the model demonstrates that the average thicknesses have non-linear dependencies on pattern layouts, and thus approaches that use linear relationships between the blanket removal rate and the patterned removal rate (i.e. controllers using sheet film equivalents) generally result in a decreased quality of control. Finally, we described how different devices often have large differences in their pattern layouts, which result in large differences in their post-polish thickness profiles.
These device dependencies are the source of much confusion in the monitoring and control of the CMP process, and lead to several problems, including: controlling to the incorrect average thickness, not taking device dependencies into account when updating the tool model, and not monitoring or even having a rough idea of the global planarity resulting from the CMP process.

We demonstrated that within-die non-uniformity is optimized at a particular process setting, and, since this is the largest source of variability in the oxide CMP processes examined here, this process setting should remain fixed. As a result, approaches which attempt to control the average removal rate or the within-wafer non-uniformity by changing the process settings should not be used. In addition, we demonstrated that step-height is largely a redundant measure of within-die variation, which we suggested should not be controlled.

We outlined that an ideal controller would control the average post-polish thickness profile of multiple devices while monitoring the global non-uniformities of the different devices being polished, using only a few measurements on the wafer. We then presented a device independent controller that integrates the MIT density model into the control strategy. This controller correlates the value being measured with an estimate of the thickness at its corresponding point in the layout based on the density model. The controller uses differences between the measured thickness and this estimate to update device independent process parameters, i.e. blanket removal rate and planarization length. The controller uses these device independent parameters and the pattern layout for the device being polished to predict a profile of the post-polish thickness across the entire wafer on a very densely sampled grid. The polishing time used to generate this surface is varied until the average thickness of this profile is controlled to the desired average thickness.
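The time selection step described above can be sketched as a one-dimensional search. The example below is a minimal illustration, not the controller used in the experiments. It assumes a commonly used closed-form version of the pattern density model, in which a raised area with effective density rho polishes at rate K/rho until its initial step height z1 is removed and at the blanket rate K afterwards; this equation is not stated in this chapter and is an assumption here. It also assumes the effective densities have already been computed from the layout and the planarization length. All function and parameter names are hypothetical.

```python
def raised_thickness(t, rho, blanket_rate, z0, z1):
    """Raised-area thickness after polishing for time t (assumed density model).

    rho: effective pattern density at the site (0 < rho <= 1)
    z0:  initial raised-area thickness, z1: initial step height
    """
    removed = blanket_rate * t
    if removed < rho * z1:                    # step not yet removed: rate K / rho
        return z0 - removed / rho
    return z0 - z1 - removed + rho * z1       # planarized: blanket rate K


def predicted_profile(t, densities, blanket_rate, z0, z1):
    """Predicted thickness at every site of the sampled grid."""
    return [raised_thickness(t, rho, blanket_rate, z0, z1) for rho in densities]


def mean_thickness(t, densities, blanket_rate, z0, z1):
    profile = predicted_profile(t, densities, blanket_rate, z0, z1)
    return sum(profile) / len(profile)


def solve_polish_time(target, densities, blanket_rate, z0, z1,
                      t_max=1000.0, tol=1e-6):
    """Bisection on polish time; the mean thickness decreases steadily with t."""
    lo, hi = 0.0, t_max
    if not (mean_thickness(hi, densities, blanket_rate, z0, z1) <= target
            <= mean_thickness(lo, densities, blanket_rate, z0, z1)):
        raise ValueError("target thickness is not reachable within [0, t_max]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_thickness(mid, densities, blanket_rate, z0, z1) > target:
            lo = mid                          # still too thick: polish longer
        else:
            hi = mid
    return 0.5 * (lo + hi)


def thickness_range(t, densities, blanket_rate, z0, z1):
    """Max minus min of the predicted profile, a simple non-uniformity measure."""
    profile = predicted_profile(t, densities, blanket_rate, z0, z1)
    return max(profile) - min(profile)
```

Because the same predicted profile yields both the average and the range, the global non-uniformity for a device can be estimated at the chosen polish time without additional measurements, which mirrors the separation between what is measured and what is controlled.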
In addition, this surface is used to predict global non-uniformity of the post-polish thickness profile.

By making a separation between what is measured and what is controlled, the control framework provides several benefits. First, it allows measurements from any device to be used to update the tool level model that can be used with any other device being processed. Second, it allows us to measure only a few points, while very accurately controlling the average of the true thickness profile. Third, it allows us to monitor the total non-uniformity of the polishing process for each device being processed. This eliminates the need for a large number of test wafers aimed at determining uniformity.

Our first experiment demonstrated that this type of control results in a lot to lot variability of 400 Å. This is believed to be fairly good relative to existing techniques for controlling multiple devices. Unexpectedly, the planarization length of the MIT density model was found to be a function of the device. This is unfortunate because it means the parameter is not truly device independent, and it suggests that an improved model is necessary to achieve truly device independent control. In our second experiment, we fixed the planarization lengths for the different devices so that we could improve the control with the existing model. This resulted in an average error in the controlled thickness of 200 Å, which is believed to be significantly better than existing methods. Our third experiment demonstrated control of the average post-polish thickness with an average error of only 100 Å, using a device dependent model update strategy. This strategy included adjustments to the blanket removal rates based on the particular device being run. In all cases, we were able to accurately predict the total non-uniformity of the polished wafers.
While the framework outlined in Chapter 3 provided the ability to accurately control the average thickness of multiple devices and monitor the global non-uniformity with only a few measurements, some remaining device dependencies must be removed in order to provide the greatest flexibility and highest quality. These device dependencies appear to be related to the quality of the model used for control.

Chapter 4 explored one possible model extension in order to remove these dependencies. It was shown that the density model for dielectric CMP is not sufficient to completely characterize the removal in medium to low density features. We demonstrated that differences between the density model and the step height dependent model proposed by IMEC suggest a combined model would improve fitting and possibly remove the device dependencies in the controller. We demonstrated that a combined step-density model provides up to a 50% improvement in fitting errors of both raised and down area thicknesses, and improves fitting errors at multiple polishing times over the evolution of the polishing process. Variations of this model significantly reduce the number of model parameters and allow predictions of the post-polish thicknesses for arbitrary layouts. Despite this effort at improving the model, it was shown that device dependencies still exist with the improved model, and further modeling work is necessary to remove the device dependencies in the controller, and thus provide the maximum quality control with the greatest ease of use and flexibility.

In conclusion, we outlined how current techniques that fail to take into account the device characteristics as a whole provide ineffective control strategies. These problems can be overcome by incorporating an advanced process model into a controller that separates what is measured from what is controlled. We have provided a framework for device independent control of dielectric chemical-mechanical polishing.
This provides accurate control of the true average thickness and effectively monitors the global uniformity of multiple devices being processed on a single tool. In addition, the approach requires only a small number of measurements in order to achieve this.

Further work is still necessary to make the model, and thus the control strategy, completely device independent. Understanding the model dependencies on density, line space, and pitch is critical to removing these device dependencies. Recent work suggests that long range interactions across the device create a pad flexing limit, which may possibly remove such device dependencies in the model.