Device Independent Process Control of Dielectric
Chemical Mechanical Polishing
by
Taber Hardesty Smith
Bachelor of Science, Rochester Institute of Technology, May 1994
Master of Science, Massachusetts Institute of Technology, June 1996
Submitted to the Department of
Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
September 27th, 1999
©Massachusetts Institute of Technology, 1999. All Rights Reserved.
Author ....................................................................
Electrical Engineering and Computer Science
September 27th, 1999
Certified by ..............
Duane Boning
Associate Professor
Electrical Engineering and Computer Science
Accepted by ..............
Arthur C. Smith
Chairman, Department Committee on Graduate Students
Electrical Engineering and Computer Science
Device Independent Process Control of Dielectric
Chemical Mechanical Polishing
by
Taber Hardesty Smith
Submitted to the Department of Electrical Engineering
and Computer Science on September 27, 1999, in partial fulfillment
of the requirements for the degree of Doctor of Philosophy
Abstract
The use of the chemical-mechanical polishing (CMP) process in the semiconductor
industry is growing rapidly, and it is a critical step in the manufacturing of integrated circuits. The CMP process is complicated by many factors, and controlling all of these factors in a single controller has been unrealistic. One of the most significant factors
complicating control is the dependency of the polishing process on the pattern layout of
the particular device being polished. The interactions between these patterns and the polishing behavior of a CMP tool make monitoring and controlling the process particularly
difficult. Current techniques focus on the control of a few sites on a single type of layout
being polished on a single tool. We show that using only a few sites does not give insight
into what is happening at sites that are not measured and controlled. Further, by restricting
attention to a single device, these controllers address only pieces of a larger problem and
fail to take into account the effects of different devices being polished on the same tool.
This thesis outlines a comprehensive framework for controlling the polishing of multiple devices with arbitrary layout patterns being polished on a single CMP tool. We explore
the use of an advanced CMP process model in conjunction with an on-line metrology system and a simple filtering algorithm for controlling the average post-polish thickness and
monitoring the global non-uniformity of multiple devices being processed. This framework provides several benefits. First, it allows measurements from any device to be used
to update the tool level model that can be used with any other device being processed. Second, it allows us to measure only a few points, while very accurately controlling the average of the true thickness profile. Third, it allows us to monitor the total non-uniformity of
the polishing process for each device being processed. Experimental work shows that this
approach results in very accurate control of the true average thickness of multiple devices,
with a lot-to-lot variability of only 100 Å. In addition, we are able to very accurately predict
the global non-uniformity of the polished wafers. However, the model used was found to
have a minor dependence on the device type, indicating that an improved layout pattern
functional model is necessary to achieve truly device independent control. We explore one
possible model that might reduce these dependencies, and show this model provides a
50% improvement in fitting errors of both raised and down area thicknesses. Despite this,
we find that device dependencies still exist with the improved model, and further work is
necessary to make the model and control strategy completely device independent.
Acknowledgments
I would like to thank my wife, Tina, for being the perfect complement of my technical
persona, and for being the source of much joy in my life. I apologize for making her sit
through many conversations bored to tears. She has been a huge supporter of my efforts
and drive to continue. She has sacrificed a lot in her life to make this possible, and much of
the credit goes to her.
I would also like to thank my mother and father, Cheryl Hardesty and Dale Smith, for
making me who I am today, and for always being so supportive of everything that I have
done or ever wanted to do. My brother, Dirk, has been a role model of character and
respect for me since birth. My brother Brett Smith has been a driver of my efforts in everything from soccer to studying all my life. My family thus owns much of the credit.
I don't normally believe in luck, but one phone call five years ago is hard to argue
with. I would like to thank my advisor, Professor Duane Boning, for calling, and subsequently encouraging every idea I have had, keeping me straight and barely under control,
and having faith in me when I was drowning in pressure. He has been an inspiration for
me, who never ceases to amaze me with his ability or character. He has been an outstanding advisor, who I would not have traded for anyone else. Our journey together has been a
long one, yet it seemed to go by so fast. This is what happens when you are having fun,
and working with Duane has been the best.
Most of my experimental work was done in collaboration with Texas Instruments Inc.,
in Dallas, TX, where I spent nine months and a lot of their resources. I was very fortunate
to meet a lot of outstanding people there. I would like to thank Dr. Jerry Stefani for many
things, including agreeing to co-advise my thesis. Jerry has been a great friend, mentor,
and supporter throughout my Ph.D. work. He made sure we had fun in Texas, and for that I
have many fond memories. He has also been a great mentor, and has taught me a tremendous amount. He is amazing at pulling people and resources together, organizing work
efforts, and championing what he believes in, so the credit for much of my work goes to
him as well. I would also like to thank Dr. Simon Fang. Simon is one of the funniest, most hardworking, and most intelligent people I have ever worked with. Much of the work in this thesis
and in other works was the result of one idea caught from the fountain of knowledge
flowing from Simon. I would also like to thank Simon for helping to manage and direct
much of my work at TI in the Spring of 1998, and for staying up all night in the lab with
me (I still can't figure that one out). I would also like to thank Dr. Stephanie Butler, who
has supported my work from the beginning. She has taught me a lot about semiconductor
manufacturing, process control, organizations, and how to look for relevant and important
problems. I would like to thank Dr. Greg Shinn, for being very supportive of my efforts
and my work in the CMP area at TI. I am also thankful to Leif Olsen for helping out with
my work, and allowing me to constantly interrupt his work to run my experiments. I would
also like to thank many other people at TI, including Chris Baum, Mark Betts, Dr. Scott
Bushman, Alicia Clark, John Clark, Charles Crain, Dr. Michael Daniels, Dr. Santos Garza,
Susie Gauna, Dr. Jarvis Jacobs, Dorothy McAllister, Rita McKern, Justin Scout, Dr. Robert Soper, and Dr. Michael West, who made my stay there a great experience.
Here at MIT, my colleagues in the Process Control and Statistical Metrology Groups
have made my life at MIT a blast. Aaron Gower is the most selfless and honorable person
I have ever met. Thanks go to him for helping to build, set up, or debug every mathematical,
computer, or programming problem I had. Also for all the good times, and for putting
up with me for five years. I would like to thank Sandeep Sadashivapa for lots of good
times outside the office, and wish him the best of luck in his new life outside of MIT. I
would also like to thank Brian Goodlin for a lot of great discussions, and lots of good work
together. Brian taught me a lot about the science of learning, in everything from modeling
to guitar. I would like to thank Dave White for all the great discussions which educated me
on topics related to work, but even more not related to work. From the old days, I would
like to thank Minh Le, who is also a fountain of amazing ideas, for all the heated discussions on work and life, and for keeping in touch. The same for Dr. Ka Shun Wong, and for
being an example of complete diligence that I pale in comparison to. I would like to thank
Eric Stuckey for our work on process control together, and more so for the fun outside
work. Thanks go to Han Chen, for helping out with many theoretical problems, and for
putting up with our craziness in the office. I would like to thank Rajesh Divecha and Brian
Stine for teaching me much of what I learned about Statistical Metrology. I would also like
to thank Dr. Dennis Okumu Ouma, who deepened my understanding of these and many
other areas. Tae Park, Tamba Tugbawa, Brian Lee, Charles Oji, Vikas Merhotra, Terence
Gan, and Shiou Lin Sam are to thank for broadening my experience to other areas. I would
also like to thank Angie Nishimoto for her excellent attention to detail and for the great
fun we had thermal energy. I wish all of these fine people the best of luck throughout their
lives.
I'd like to thank Professors Akintunde Akinwande and Tommi Jaakkola for agreeing to
serve on my area exam committee on very short notice. I would also like to thank Prof.
Jung-Hoon Chun and Prof. John Tsitsiklis for reading my thesis and serving on my committee.
This work was sponsored in part by the NSF/SRC Engineering Research Center for
Environmentally Benign Semiconductor Manufacturing. We would like to thank Paz
Amit, Avron Ger, and Nova Measuring Instruments Ltd. for their assistance in setting up
the NovaScan on-line metrology tool and performing some of the experiments. We would
also like to thank Joost Grillaert, Dr. Marc Meuris, and IMEC for valuable discussions
regarding the IMEC model.
Table of Contents
Chapter 1. Introduction .......... 17
1.1 An Introduction to Run by Run Process Control .......... 19
1.2 An Introduction to CMP .......... 24
1.2.1 Blanket Wafer Performance Metrics .......... 29
1.2.2 Patterned Wafer Performance Metrics .......... 31
1.3 An Introduction to CMP Process Control Issues .......... 34
1.4 Summary .......... 37
Chapter 2. Control of a Single Device Using On-Line Metrology .......... 41
2.1 Evaluation of On-Line Metrology for CMP .......... 42
2.1.1 Measurement Repeatability .......... 45
2.1.2 Reliability .......... 48
2.1.3 Correlation with Ex-Situ Metrology .......... 49
2.2 Throughput and Cost of Ownership Improvements .......... 53
2.2.1 Throughput .......... 54
2.2.2 Cost of Ownership Reductions .......... 56
2.3 Run by Run Control of CMP with On-Line Metrology .......... 59
2.3.1 Experimental Setup .......... 60
2.3.2 The Run by Run Process Control Algorithm .......... 61
2.3.3 Patterned Wafer Control .......... 62
2.4 Summary .......... 66
Chapter 3. Control of Multiple Devices in Dielectric CMP .......... 69
3.1 Pattern Dependencies in Dielectric CMP .......... 71
3.2 Modeling of Dielectric CMP .......... 74
3.3 Problems With Existing Control Methods in Dielectric CMP .......... 83
3.4 Current Methods for Controlling Multiple Devices in Dielectric CMP .......... 87
3.5 The Multiple Device Control Problem for Dielectric CMP .......... 92
3.6 A Framework for the Control of Multiple Devices in Dielectric CMP .......... 101
3.6.1 A Device Independent Control Algorithm .......... 103
3.6.2 Further Discussion of the Device Independent Control Algorithm .......... 108
3.7 Experimental Results .......... 110
3.7.1 Updating Both Planarization Length and Blanket Removal Rate .......... 111
3.7.2 Updating Blanket Removal Rate Only .......... 116
3.7.3 Correcting for Device Dependencies in the Blanket Rate .......... 119
3.8 Summary .......... 122
Chapter 4. A Dielectric CMP Model Combining Density and Step Height Dependencies .......... 125
4.1 Density and Step Height Dependent Models .......... 127
4.2 Analysis of the MIT Density Model .......... 131
4.3 A Combined Density and Step Height Model .......... 134
4.4 Variations of the Time-Density Model .......... 137
4.5 Polish Time and Device Dependencies .......... 143
4.6 Summary .......... 146
Chapter 5. Conclusions and Future Work .......... 149
References .......... 157
List of Figures
Chapter 1.
Figure 1.1. An uncontrolled drifting process. .......... 20
Figure 1.2. SPC control of a drifting process using tuning with WECO rules. .......... 22
Figure 1.3. EWMA control of a drifting process. .......... 24
Figure 1.4. Schematic application of the CMP process in interconnect formation. .......... 26
Figure 1.5. Chemical-mechanical polishing tool configuration. .......... 27
Figure 1.6. Diagram of pad, slurry, and surface interactions. .......... 28
Figure 1.7. Blanket wafer measurement sampling patterns. .......... 29
Figure 1.8. Blanket wafer removal rate profile. The surface is an interpolation based on the measured points, which are indicated by the stars. .......... 30
Figure 1.9. Blanket wafer post-polish thickness profile. .......... 30
Figure 1.10. A typical die sampling plan. .......... 31
Figure 1.11. Typical structures used for step height measurement. .......... 32
Figure 1.12. A typical step height measurement before planarity is reached. .......... 32
Figure 1.13. A typical step height measurement near planarity. .......... 33
Figure 1.14. The within-die variation of a typical production wafer. .......... 34
Figure 1.15. Average removal rate of blanket sheet film PETEOS wafers. .......... 35
Figure 1.16. Average removal rate of patterned PETEOS wafers. .......... 35
Figure 1.17. Within-wafer non-uniformity of blanket sheet film PETEOS wafers. .......... 36
Chapter 2.
Figure 2.1. Polishing sequence with on-line metrology. .......... 43
Figure 2.2. The on-line pattern recognition system. .......... 44
Figure 2.3. Correlation plot of the on-line and ex-situ post-polish patterned wafer thicknesses. .......... 51
Figure 2.4. Polishing with look-ahead wafers, rework, and ex-situ metrology. Total time per lot with 2 wafer rework is 255 minutes; a throughput of 0.2353 lots per hour. Total time per lot with 24 wafer rework is also 255 minutes. .......... 55
Figure 2.5. Polishing with look-ahead wafers, rework, and in-line metrology. Total time per lot with 2 wafer rework is 142 minutes; a throughput of 0.4225 lots per hour. Total time per lot with 24 wafer rework is 177 minutes; a throughput of 0.3390 lots per hour. These are improvements of 80% and 44%, respectively. .......... 55
Figure 2.6. Percent reduction in the cost of ownership vs. percent rework for a process with look-ahead wafers. .......... 57
Figure 2.7. Controlled average post-polish patterned wafer thickness over the 600 wafer experiment. .......... 62
Figure 2.8. Average removal rate of patterned PETEOS wafers. .......... 63
Figure 2.9. Within-wafer non-uniformity of blanket sheet film PETEOS wafers. .......... 64
Figure 2.10. Average post-polish thickness of patterned wafers using pilot wafers and sheet film equivalents to control the post-polish thickness of the patterned wafers. .......... 65
Figure 2.11. Controlled average post-polish patterned wafer thickness over the 600 wafer experiment using five measurement sites. .......... 65
Chapter 3.
Figure 3.1. A very densely sampled thickness profile of a typical wafer, including wafer-level and die-level variation components. .......... 71
Figure 3.2. Die-level thickness profile of a test device. .......... 72
Figure 3.3. Sources of thickness variation in the CMP process. .......... 72
Figure 3.4. Total, within-die, and within-wafer thickness variation of a typical test device as a function of polishing time. .......... 73
Figure 3.5. Cross-sectional view of the oxide thickness in a patterned wafer. .......... 76
Figure 3.6. The MIT density model predictions of the removal rate of the up and down areas as a function of time for one particular density. .......... 78
Figure 3.7. The MIT density model predictions of the up area removal rates and thicknesses, as a function of time, for different densities. .......... 78
Figure 3.8. A cross section of the elliptical weighting function used in the density model to calculate the effective density of the features. .......... 80
Figure 3.9. A high-level view of the MIT density model. .......... 80
Figure 3.10. Measurement plan of a test layout pattern (Device #2). .......... 81
Figure 3.11. Measured and modeled values for the post-polish thickness of the raised and down areas using the MIT density model. .......... 82
Figure 3.12. Measured and modeled values (dashed lines) for the post-polish thickness of the raised areas for several polish times using the MIT density model. .......... 83
Figure 3.13. Blanket wafer removal rate profile and patterned wafer removal rate profiles over the surface of a wafer predicted by the density model. .......... 84
Figure 3.14. Example current practice for CMP process control using sheet film equivalents (SFEs). .......... 87
Figure 3.15. Post-polish thickness profiles for two different devices. Measurements were taken over a grid similar to that in Figure 3.10. .......... 89
Figure 3.16. Multiple device control using a three site average of the thickness. .......... 89
Figure 3.17. The average thickness for two different devices predicted by the MIT density model. .......... 91
Figure 3.18. Within die variation (standard deviation of the post-polish thickness) shown for a design of experiments that varied the table speed and down force over a wide range for the dielectric CMP process. .......... 93
Figure 3.19. Within-die variation shown over the polishing of 600 wafers. The stars are the within-die variation measured on four dies on each wafer, and the solid line is the average of the four die from eight wafers over the 600 wafer run. .......... 94
Figure 3.20. Typical structures used for step height measurement. .......... 94
Figure 3.21. Step height measurement for a low density feature, for three different processes, plotted against the amount removed on a blanket wafer. .......... 95
Figure 3.22. Higher density structures used for step height measurements. .......... 96
Figure 3.23. Step height measurement for a low density feature, for three different processes, plotted against the amount removed on a blanket wafer. .......... 96
Figure 3.24. The difference in the step height measured at low and high density features, for three different processes, versus the blanket amount removed. .......... 98
Figure 3.25. The within-die variation, measured at 25 locations in 10 die, for three different processes, versus the blanket amount removed. .......... 98
Figure 3.26. A device independent run by run process controller for CMP. .......... 102
Figure 3.27. The average planarization length over the course of 100 six wafer lots (solid line), and calculated planarization lengths for each of the four die on each of these eight wafers (dots). .......... 109
Figure 3.28. Test devices being controlled with the device independent controller. .......... 110
Figure 3.29. Measurement plan of Device 1. Circles are points used for control. Crosses and circles are used to determine the true average. .......... 112
Figure 3.30. Measurement plan of Device 2. Circles are points used for control. Crosses and circles are used to determine the true average. .......... 112
Figure 3.31. Map of the dies used for the multiple device control. .......... 113
Figure 3.32. Controlled average thickness of 63 sites on four dies measured following the experiment and the device number run. .......... 114
Figure 3.33. The minimum, maximum, and range of the polished devices. The dashed lines represent the predicted values using the model, while the solid lines indicate the values determined from the 63 point measurements on four dies. .......... 114
Figure 3.34. Parameters extracted from the measured data during the first control run. .......... 115
Figure 3.35. Controlled average thickness of 63 sites on four dies measured following the experiment and the device number being run. .......... 117
Figure 3.36. The minimum, maximum, and range of the polished devices. The dashed lines represent the predicted values using the model, while the solid lines indicate the values determined from the 63 point measurements on four dies. .......... 117
Figure 3.37. Measurement plan of Device A. Circles are points used for control and crosses and circles are used to determine the true average. .......... 118
Figure 3.38. Controlled average thickness of 63 sites on four dies measured following the experiment and the device number being run. .......... 120
Figure 3.39. The minimum, maximum, and range of the polished devices. The dashed lines represent the predicted values using the model, while the solid lines indicate the values determined from the 63 point measurements on four dies. .......... 120
Figure 3.40. Difference in the average of the measured sites and the average of the controlled sites (the "true" average). .......... 121
Chapter 4.
Figure 4.1. A high-level view of the MIT density model. .......... 128
Figure 4.2. The removal rates of the raised and down areas using the IMEC step height dependent model. .......... 129
Figure 4.3. Removal rates of the density and step height dependent models for both the MIT density model and the IMEC model. .......... 130
Figure 4.4. Percent difference in removal predictions between the density model and the IMEC model. .......... 131
Figure 4.5. Description of the pattern features in the test mask used for model comparisons. .......... 132
Figure 4.6. Measured and modeled values for the post-polish thickness of the raised areas using the MIT density model (dashed line is the model fit). .......... 133
Figure 4.7. Measured and modeled values for the post-polish thickness of the down areas using the MIT density model (dashed line is the model fit). .......... 133
Figure 4.8. Measured and modeled values for the post-polish thickness of the raised areas using the time-density model (dashed line is the model fit). .......... 136
Figure 4.9. Measured and modeled values for the post-polish thickness of the down areas using the time-density model (dashed line is the model fit). .......... 136
Figure 4.10. Model fit of the step height at contact time as a function of the effective feature density. .......... 138
Figure 4.11. Model fit of the contact time as a function of the effective feature density. .......... 138
Figure 4.12. Measured and modeled values for the post-polish thickness of the raised areas using time-density model with contact height as a function of density. .......... 140
Figure 4.13. Measured and modeled values for the post-polish thickness of the down areas using time-density model with contact height as a function of density. .......... 140
Figure 4.14. Model fit for contact step height as a function of the effective feature density. .......... 141
Figure 4.15. Model errors for raised and down areas as a function of the effective feature density. .......... 141
Figure 4.16. Measured and modeled values for the post-polish thickness of the raised areas using time-density model with contact height as a function of line space. .......... 142
Figure 4.17. Measured and modeled values for the post-polish thickness of the down areas using time-density model with contact height as a function of line space. .......... 142
Figure 4.18. Model fit for step height at contact time as a function of line space. .......... 143
Figure 4.19. Measured and modeled values for the post-polish thickness of the raised areas using time-density model with contact height as a function of density for various polish times. Dashed lines are model fits and stars are experimental data points. .......... 144
Figure 4.20. The planarization length as a function of the device number being run (using the experimental data from the third control run in Chapter 3). .......... 145
Figure 4.21. The blanket rate as a function of the device number being run (using the experimental data from the third control run in Chapter 3). .......... 146
Chapter 5.
List of Tables
Chapter 1.
Chapter 2.
Table 2.1. Average differences between the on-line and the ex-situ measurements. .......... 52
Table 2.2. Variation added to on-line measurements by polishing, residual slurry, and wafer loading. .......... 53
Table 2.3. Breakdown of the variation in the on-line metrology system. .......... 53
Chapter 3.
Chapter 4.
Chapter 5.
Chapter 1
Introduction
The use of the chemical-mechanical polishing (CMP) process in the semiconductor
industry is growing rapidly [1]. Its use in the polishing of inter-level dielectrics has provided the ability to significantly increase the number of levels of interconnect in integrated
circuits (ICs). This has provided improvements not only in circuit performance, but also in
product yield. In addition, it is a critical step in the manufacturing of newer generation ICs
which utilize copper (Cu) and shallow trench isolation (STI) processes [2,3].
While the CMP process provides many benefits to the manufacturing of ICs, it also has
many problems. This is particularly true in a production setting, where the gradual wear in
the polishing pads and the simultaneous processing of several types of ICs on a single polishing tool create changes in the tool operation that are difficult to monitor and control.
Because our current understanding of the process still lags behind its application in the
industry, statistical process control techniques have been the only methods able to achieve
and maintain quality processing. There have been several works on controlling the CMP
process [4-15]. However, these initial works have addressed only pieces of a larger problem. The CMP process is complicated by many factors, and controlling all of these factors
in a single controller has been unrealistic. One of the most significant factors complicating
matters is the manner in which the particular pattern of metal and other components is
laid out within each IC to make the circuit. The interactions between these patterns and the
polishing behavior of a CMP tool make monitoring and controlling the process particularly
difficult. As a result, initial work on the process control of CMP focused on simple
methods aimed at monitoring and controlling the polishing of unpatterned or "blanket"
wafers [4-12]. However, as we show in Chapter 2, this is very different from controlling
directly on patterned wafers. Realizing that this is critical for implementing any control
scheme in a production wafer fabrication facility (fab), later works began to focus on the
control of patterned wafers [13-15]. Direct control of patterned wafers using a multivariate
non-linear controller which monitors the average removal rate and wafer-level uniformity
was shown in [14], and patterned wafer control using a self-adjusting control algorithm to
control the average post-polish thickness was shown in [15]. These techniques focus on
the control of a few sites on a single type of layout being polished on a single tool. We
show in Chapter 3 that this gives little insight into what is happening to the locations that
are not measured and controlled, and does not ensure that the entire device (or product
type) is polished correctly. In addition, by restricting attention to a single device, these
controllers fail to take into account the effects of different patterns being polished on the same tool. For example, a device of one type may wear the CMP pad more than a device
of another type, and this increased wear on the pad reduces the polishing rate of the other
devices. These effects need to be combined into a comprehensive control strategy in order
to properly control the CMP process.
The purpose of this thesis is to outline such a comprehensive framework for controlling the polishing of wafers with multiple arbitrary pattern layouts on a single polishing
tool. The approach allows for the control of multiple devices with completely different
layout patterns simultaneously being polished on a single CMP tool. This is achieved by
combining a CMP model that predicts the post-polish thickness of an arbitrary device layout
with a feedback control algorithm. We begin in this chapter by reviewing the basics of
run by run process control, the CMP process, and some well-accepted issues involved with
controlling the CMP process.
1.1 An Introduction to Run by Run Process Control
As semiconductor processing entered the late 1980s, control charting and statistical
process control (SPC) had substantially decreased process variability and increased process capability. In SPC techniques, the process output (e.g. deposition thickness) is monitored for different types of deviations from the process target. Traditionally, once an alarm
(statistically significant deviation from the process target) is signaled, the process is shut
down to perform maintenance and to re-optimize the process recipe. One set of rules for
such deviations is known as the Western Electric Company (WECO) rules. A subset of
these rules is:
1. Last point of data is greater than three standard deviations away from the
process target.
2. Two of last three data points are greater than two standard deviations away from
the target in the same direction.
3. Four of last five data points are greater than one standard deviation away from
the target and in the same direction.
4. Last eight data points are all above or all below the target.
Industry response to open-loop statistical process control has been overwhelming and the
use of this type of control has become heavily ingrained in the semiconductor industry.
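The four rules listed above can be sketched as a small check on the most recent process outputs. This is only an illustrative sketch: the function name `weco_violation` and its arguments are hypothetical, and it assumes the process standard deviation is known and that rules are tested in the order listed, returning the first one violated.

```python
from typing import List


def weco_violation(outputs: List[float], target: float, sigma: float) -> int:
    """Return the number (1-4) of the first violated rule, or 0 if none."""
    # Express each output as a deviation from target in units of sigma.
    dev = [(y - target) / sigma for y in outputs]

    def same_side_count(window: List[float], limit: float) -> int:
        # Largest count of points beyond +limit or beyond -limit.
        above = sum(1 for d in window if d > limit)
        below = sum(1 for d in window if d < -limit)
        return max(above, below)

    # Rule 1: last point more than three standard deviations from target.
    if len(dev) >= 1 and abs(dev[-1]) > 3:
        return 1
    # Rule 2: two of the last three points beyond two sigma, same direction.
    if len(dev) >= 3 and same_side_count(dev[-3:], 2) >= 2:
        return 2
    # Rule 3: four of the last five points beyond one sigma, same direction.
    if len(dev) >= 5 and same_side_count(dev[-5:], 1) >= 4:
        return 3
    # Rule 4: last eight points all above or all below the target.
    if len(dev) >= 8 and same_side_count(dev[-8:], 0) == 8:
        return 4
    return 0
```

In an open-loop SPC setting, a nonzero return value would correspond to an alarm.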
With the decrease in variability, however, new problems were beginning to arise. Many
processes were showing signs of a steady drifting off target [16-20]. Such drifts were often
caused by the build-up of material on the interior components of the tools. For example,
the deposition rate in a metal sputtering process is highly correlated to the life of the components within the tool, particularly due to the build-up of the deposited material in the
honeycomb-like collimator used to improve coverage on the surface of the wafer [20]. The
resulting drift in the deposition thickness is shown in Figure 1.1. Classical SPC
approaches assume the process is "in control," and not subject to such drift. Nevertheless,
these methods were often used to monitor and compensate for such problems [21-26].
However, the reduction of variation in semiconductor processing, combined with the
increase in process drift, resulted in a large number of alarms. Operators and engineers
began to make frequent "updates" to the process time in order to quickly bring the tool
back on-line. For example, the process output was often shifted back to the target (in the
sputter deposition case, this is done by adjusting the deposition time) by an amount typically equal to the sample mean of the error over the violation set (i.e. the last five data
points for WECO rule #3).
Figure 1.1: An uncontrolled drifting process. (Horizontal axis: Run #.)
This led to automated SPC control, whereby a simple process model was used to automatically adjust the process inputs upon the violation of an SPC
rule. A typical control method assumes a process model is of the form:
$$y_m[n] = x[n] \cdot t_p[n] \qquad (1.1)$$
where $y_m[n]$ is the process output (e.g. deposition thickness), $x[n]$ is the process rate (e.g.
deposition rate), and $t_p[n]$ is the process time on run $n$. Typically, the estimate of the process rate is held constant, and the process output is monitored using SPC rules such as the
WECO rules above. When a violation of the rule set occurs, the estimate of the process
rate is updated to the average over the violation set, i.e.
$$x[n+1] = x[n], \quad v \in \{0\} \qquad (1.2)$$
$$x[n+1] = y_p[n]/t_p[n], \quad v \in \{1\} \qquad (1.3)$$
$$x[n+1] = \frac{1}{3} \sum_{i=n-2}^{n} y_p[i]/t_p[i], \quad v \in \{2\} \qquad (1.4)$$
$$x[n+1] = \frac{1}{5} \sum_{i=n-4}^{n} y_p[i]/t_p[i], \quad v \in \{3\} \qquad (1.5)$$
$$x[n+1] = \frac{1}{8} \sum_{i=n-7}^{n} y_p[i]/t_p[i], \quad v \in \{4\} \qquad (1.6)$$
where $v$ is the rule violated (e.g. zero indicates no violation, one indicates WECO rule #1
was violated, etc.) and $y_p[i]$ is the actual process output. As can be seen in Figure 1.2, the
performance of this method in controlling a drifting process can be quite poor, in the sense
that the controlled process is often outside the two standard deviation limits. In this case,
the statistical limits were calculated using the root mean squared error (RMSE) of a linear
least squares fit of historical data of the uncontrolled thickness.
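The rule-triggered update in (1.2) through (1.6) can be sketched as follows. This is a minimal illustration rather than the implementation used for Figure 1.2: the names `VIOLATION_WINDOW` and `update_rate` are hypothetical, and the window lengths (1, 3, 5, and 8 runs) are taken from the violation sets of the four WECO rules listed earlier.

```python
# Number of most recent runs in the violation set of each WECO rule.
VIOLATION_WINDOW = {1: 1, 2: 3, 3: 5, 4: 8}


def update_rate(x_current, y_hist, t_hist, rule):
    """Return the rate estimate for the next run.

    x_current: current estimate of the process rate
    y_hist:    measured process outputs, oldest first
    t_hist:    process times used on those runs, oldest first
    rule:      violated rule number (0 means no violation)
    """
    if rule == 0:
        # No alarm: hold the rate estimate constant.
        return x_current
    k = VIOLATION_WINDOW[rule]
    # Observed rate on each run in the violation set.
    rates = [y / t for y, t in zip(y_hist[-k:], t_hist[-k:])]
    # New estimate is the average observed rate over the violation set.
    return sum(rates) / len(rates)
```

Because the estimate only changes when an alarm fires, the process can sit off target for several runs between updates, which is consistent with the poor tracking of a steady drift described above.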
Figure 1.2: SPC control of a drifting process using tuning with WECO rules. (Horizontal axis: Run #; the target and the 2σ and 3σ limits are marked.)
On the other hand, tools like thermal deposition furnaces experience regular shifting in
their outputs, and, for these tools, the SPC approach works quite well. Other processes
randomly drift away from the target output, but then drift back the other way, continually
wandering about the target. Current CMP tools are a good example of this [11]. These processes, like the steadily drifting processes, suffer from poor performance of automated
SPC techniques.
These problems have caused a shift in control towards continual tuning methods [4-20,27-42]. One example of this type of control is the exponentially weighted moving average (EWMA) controller. This control method is a combination of the EWMA SPC statistic
[21-24,26] and closed-loop feedback control. An exponentially weighted moving average
of the process output at discrete-time n has the form
$$x[n+1] = w \cdot y_p[n] + (1 - w) \cdot x[n], \qquad (1.7)$$
where $x[n]$ is the EWMA statistic on run $n$ and $w$ is the EWMA weight, which is generally restricted to $0 < w < 1$. Higher values of $w$ result in recent measurements more
strongly affecting the weighted average. This statistic is used to monitor a process output,
or a state of the process, and make small incremental changes to the process recipe in
order to keep the process on target. The closed-loop feedback control method using an
EWMA was developed in [27-28,4,5,16,29]. Examples of the use and study of the EWMA
controller and related variations are given in [4-5,9-12,14,16-17,27-34]. The simplest version of the controller uses a model identical to that given in (1.1), and replaces the tuning
rules (1.2) through (1.6) with (1.7). In other words, the process model is continually tuned.
For the single-variate case, the process input (process time) is calculated as
$$t_p[n+1] = y_d[n] / x[n+1] \qquad (1.8)$$
where $y_d[n]$ is the desired output value.
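A minimal simulation of this single-variate controller is sketched below. It is an illustration under stated assumptions, not the controller used for Figure 1.3: the function name `ewma_run_by_run`, the linear drift, and all numerical values are hypothetical, measurement noise is omitted, and the EWMA update of (1.7) is applied to the observed rate (measured output divided by process time), since the model state is a rate.

```python
def ewma_run_by_run(true_rates, target, w, x0):
    """Simulate EWMA run by run control of a rate-times-time process.

    true_rates: actual process rate on each run (hypothetical input data)
    target:     desired output, held constant here
    w:          EWMA weight, 0 < w < 1
    x0:         initial estimate of the process rate
    """
    x = x0
    outputs = []
    for rate in true_rates:
        t = target / x                 # process time from (1.8)
        y = rate * t                   # actual output on this run
        outputs.append(y)
        observed_rate = y / t          # rate implied by the measurement
        x = w * observed_rate + (1 - w) * x   # update in the form of (1.7)
    return outputs


# Hypothetical steadily drifting rate: starts at 100 and loses 0.5 per run.
drifting = [100.0 - 0.5 * n for n in range(30)]
controlled = ewma_run_by_run(drifting, target=6000.0, w=0.5, x0=100.0)
# Open-loop comparison: process time fixed at its initial value.
uncontrolled = [rate * (6000.0 / 100.0) for rate in drifting]
```

With a steady drift, the EWMA estimate lags the true rate, so a small offset from the target remains, but it is much smaller than the open-loop error that accumulates when the process time is never adjusted.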
The EWMA controller provides good control for processes which have small variations over time. Stability for the single-input single-output (SISO) and multiple-input multiple-output cases is well understood, and the controller has been shown to be stable over
a large range of model mismatch [16,12]. In addition, the EWMA controller is designed
around a statistically based filter which can be tuned to a given process (i.e. filter noise in
the best possible way while minimizing errors due to real changes in the process), and
methods exist for determining the optimal EWMA weight [12,30,34]. These works have
demonstrated that the performance of the EWMA controller in response to process shifts
and drifts while minimizing the response due to noise is quite good. This is especially true
for systems which have slow dynamics buried in large amounts of noise. For example, the
results of an EWMA controller used to control the deposition thickness of the sputter deposition process above are shown in Figure 1.3. We see the continual adjustment to the process results in tighter control of the process (e.g. there are fewer points outside the two
sigma limits).
Figure 1.3: EWMA control of a drifting process. (Horizontal axis: Run #; the 2σ and 3σ limits are marked.)
1.2 An Introduction to CMP
This section outlines some fundamentals of the CMP process before we move on to
discussing a few of the basic issues involved with controlling the CMP process in the next
section. We begin by discussing some basics of the "back-end" manufacturing of integrated circuits (ICs), where CMP is most widely utilized. By back-end, we mean the process technology and steps above the layers of metal and dielectric materials that are used
in the formation of the electrical interconnections between the active components of a circuit (e.g. transistors, as formed in the "front-end" processing). As shown in Figure 1.4, the
interconnect is manufactured by depositing thin films of materials, and selectively removing or changing the properties of these materials in certain areas. A new "level" of thin
film is deposited on top of old films and the process is repeated many times until the interconnect is complete. The goal of the CMP process is to planarize step heights caused by
the deposition of thin films over existing non-planar features, so that further levels may be
added onto a flat surface. This process is outlined in Figure 1.4.
After transistors are formed on the silicon substrate, a pre-metal deposition (PMD) of
dielectric material is laid down, and contacts to the underlying devices are made in the
PMD layer. This via formation often includes a metal CMP process, which is not specifically discussed in this thesis. Following this, the Metal 1 deposition occurs, followed by
patterning and removal of the Metal 1 layer to create the Metal 1 lines. This step is followed by the deposition of an interlevel dielectric (ILD) layer. The patterned metal lines
leave a non-planar surface on the ILD 1 layer. The CMP process is used to planarize this
surface, so that interconnecting metal holes may be etched and filled with material (Via 1),
and the Metal 2 connecting lines may be deposited and patterned. This proceeds to higher
level metals until the circuit is complete. Current generations of ICs have up to seven layers of metal circuitry [43].
The dielectric CMP step in this sequence is critical to determining the performance of
the device, as well as the defectivity rate of the circuit. Without the planarity achieved by
CMP, several problems could occur. First, changes in the vertical height of the surface profile make optical patterning using photolithography difficult. The extremely small feature
sizes require very tight focusing of the light. Any changes in the vertical height in different
regions within the device will cause changes in the focusing and, hence, the sizes of the
features. This lithographic depth of focus limitation results in extremely tight requirements on the planarity of the wafer surface.
Figure 1.4: Schematic application of the CMP process in interconnect formation. (Panels: a) cross-sectional diagram of a pre-CMP interlevel dielectric on top of metal lines, with up areas, down areas, and step height marked; b) cross-sectional diagram following CMP; c) cross-sectional diagram following CMP, and application of Metal 2 and ILD 2.)
If these requirements are not met, it could result in large variations in the sizes of the metal lines and interconnects, and lead to a degradation in device reliability and performance. Non-planar surfaces can also cause faults in
the circuit as a result of metal depositions from different levels of metal making unintended contact. This happens when the metal interconnect holes are etched over the edge
of a step in a non-planar surface. The hole is then etched through the ILD to the underlying
layer, and filled with metal, causing an unintended short between levels. Therefore, the
planarization of ILD layers is critical to the performance and yield of current and future
generation devices. In particular, as device sizes continue to decrease and the number of
metal layers continues to increase, the importance of surface planarity and CMP will also
continue to grow. Future interconnect technology is moving to copper and damascene processes [2-3]. While the details of the process differ, similar demands are placed on good
control of planarity in metal CMP. In addition, the use of CMP to form shallow trench isolation structures also requires extremely good control of planarity in dielectric polish.
In traditional CMP processing, as illustrated in Figure 1.5, the wafer is held in a wafer
carrier and pressed face-down onto a polishing pad, which is affixed to a rotating platen. A
slurry with abrasive material (e.g. silica particles of size 50-100nm) held in suspension is
dripped onto the pad during polishing. The carrier and platen rotate at variable speeds, typically on the order of 30 rpm. CMP tools differ in many aspects, including the number of
platens, the number of polishing carriers per head, the size of the pad in relation to the
wafer size, and rotary versus linear polishing mechanisms.
Figure 1.5: Chemical-mechanical polishing tool configuration. (Top view and side view, showing the slurry feed, wafer, carrier, and platen.)
As shown in Figure 1.6, the CMP process preferentially removes material from raised
areas on the surface of the wafer through a combination of mechanical and chemical
action. In dielectric CMP, the chemical action is thought to serve two purposes: a) to
soften, or hydrolyze, the surface of the dielectric material so that the soft pad and slurry
particles can abrade away the surface, and b) to keep particles from agglomerating in the
slurry. Recently, alternative slurries (e.g. CeO2 particles) and slurry-release mechanisms
(e.g. fixed-abrasive pads) have been reported to exhibit very different behavior [44-45]. In
this work, we focus on conventional oxide polish systems. The preferential removal rate of
the raised areas is thought to be due to the distribution of pressure of the CMP pad on the
raised features [46-47], and in particular is related to the density and relative height
of the features, as shown in Figure 1.6 [47]. Dense features tend to support less force per
unit area, because the force of the pad rests on a greater amount of area. In contrast, regions
with a significant amount of down (or open) area, have the same force supported by less up
area.
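The density argument above can be illustrated with a simple load-sharing calculation. This is a deliberately simplified sketch, assuming the down areas carry none of the load so that the nominal pad pressure is shared only by the up area; the function name `effective_pressure` and the numbers used with it are hypothetical.

```python
def effective_pressure(nominal_pressure, pattern_density):
    """Pressure carried by the raised features in a region.

    nominal_pressure: applied force divided by total region area
    pattern_density:  fraction of the region that is up area (0 < density <= 1)
    """
    if not 0.0 < pattern_density <= 1.0:
        raise ValueError("pattern density must be in (0, 1]")
    # The same total force is supported by a smaller area when density is low.
    return nominal_pressure / pattern_density
```

Under this assumption, a region with 25% up area sees four times the pressure on its raised features compared with a fully dense region at the same nominal pressure.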
Figure 1.6: Diagram of pad, slurry, and surface interactions. (Panels: a) the CMP pad in contact with features; b) the CMP pad supported by far away (on the order of mm) features.)
1.2.1 Blanket Wafer Performance Metrics
The performance of the CMP process is gauged by several different metrics. Although
the process is aimed at reducing the step height on wafers with "patterned" features, several metrics of the polishing of "unpatterned" blanket sheet film wafers are also typically
used. In particular, the removal rate (RR) of material on blanket sheet film wafers is often
used to judge how quickly a process will remove step heights on patterned wafers. Processes with higher removal rates are generally considered better. The RR is determined by
measurement of the oxide film thickness before and after polishing at each of several sites
on the wafer (see Figure 1.7 for two examples). A typical removal rate profile for a blanket
wafer is shown in Figure 1.8, and the resulting post-polish thickness profile is shown in
Figure 1.9. The "removal rate" metric most often used is the average of the amount
removed at each site, divided by the fixed polish time. Differences between polish rates at
the center and edge of the wafer (e.g. "bulls-eye patterns" as seen in Figure 1.8) may arise
due to wafer asymmetry (e.g. wafer flat), non-constant relative pad velocity from the edge
to the center, non-uniform slurry and by-product transport under the wafer, wafer bowing
due to pressure or tool design, or machine drift (with tool or pad age) of any of these
parameters.
Figure 1.7: Blanket wafer measurement sampling patterns. (Panels: 49 point sampling pattern; 25 point sampling pattern; the edge exclusion region is marked.)
Figure 1.8: Blanket wafer removal rate profile. The surface is an interpolation based on the measured points, which are indicated by the stars. (Axes: x location (mm), y location (mm).)
Figure 1.9: Blanket wafer post-polish thickness profile. (Axes: x location (mm), y location (mm).)
As a result, the uniformity of the polishing process across the surface of the wafer is also of concern. In order for all devices on the wafer to be polished to the same
amount, the within-wafer non-uniformity (WIWNU) of a polished unpatterned blanket
wafer is desired to be small (typically 5% or less). The calculation of the WIWNU metric
varies in the industry [48]. One common calculation used is the standard deviation of the
amount removed (AR) over the sites on the wafer, divided by the average AR over the several sites, times 100. Other approaches include the standard deviation of the removal rate
or post-polish thickness profiles. These two blanket wafer metrics are generally used to
develop CMP processes, as well as to monitor the CMP process on a lot to lot basis. In
addition, particle and scratching tests are also performed on unpatterned wafers. Particles
and wafer scratching caused by CMP can create severe failures in manufactured circuits
[49], and thus must be carefully monitored.
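The two blanket wafer metrics described above can be computed from pre- and post-polish thickness measurements as in the following sketch. The function name `blanket_metrics` is hypothetical, and the sample (n-1) standard deviation is an assumption, since the WIWNU calculation varies in the industry.

```python
from statistics import mean, stdev


def blanket_metrics(pre_thickness, post_thickness, polish_time):
    """Return (removal rate, WIWNU in percent) for one blanket wafer.

    pre_thickness, post_thickness: film thickness at each measured site
    polish_time: fixed polish time, in the time unit wanted for the rate
    """
    # Amount removed (AR) at each site.
    removed = [pre - post for pre, post in zip(pre_thickness, post_thickness)]
    avg_removed = mean(removed)
    # Removal rate: average amount removed divided by the polish time.
    removal_rate = avg_removed / polish_time
    # WIWNU: standard deviation of AR over the average AR, times 100.
    wiwnu_percent = 100.0 * stdev(removed) / avg_removed
    return removal_rate, wiwnu_percent
```

For example, calling `blanket_metrics` with the 25 or 49 site thicknesses of one wafer returns both metrics for that wafer.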
1.2.2 Patterned Wafer Performance Metrics
In order to verify the planarization of step heights within the wafer, the step heights are
measured at several locations within several die (see Figure 1.10 for a typical sampling
pattern). The sites chosen are usually large features which are easy to find and align on the
measurement tool. Typically, measurement of step heights occurs on several bond pad
structures such as that shown in Figure 1.11. The material stack in the raised area is
14000Å of Oxide (PETEOS) on top of 230Å of Silicon-Ox-Nitride on top of 6000Å of
Aluminum on top of the Silicon substrate. The down areas have had the aluminum
removed.
Figure 1.10: A typical die sampling plan.
Figure 1.11: Typical structures used for step height measurement.
A sample step height measurement using a KLA/Tencor P-20 profilometer taken across the structures shown in Figure 1.11 is shown in Figure 1.12, for a 19 second polish
time, i.e. before planarity is reached. Figure 1.13 shows a step height measurement for a
53 second polish time, i.e. near planarity. An important value determined from the measurement is the total indicated range (TIR), i.e. the difference between the maximum and
minimum points of all the steps within the scan.
Figure 1.12: A typical step height measurement before planarity is reached. (Horizontal axis: Distance (Microns); the TIR is marked.)
Figure 1.13: A typical step height measurement near planarity. (Horizontal axis: Distance (Microns); the TIR is marked.)
Recently, the CMP community has been moving toward the measurement and analysis of within-die non-uniformity (WIDNU). The post-polish thickness of 25 sites within a die,
averaged over 10 die on a typical production wafer, is shown in Figure 1.14. Here we see a
large amount of variability in the post-CMP thickness within a die. A general metric for
this is the standard deviation of several sites within a single die. However, a significant
amount of work has gone into understanding this variation [50-57], and much of the work
in this thesis will be based on these ideas. We will thus return to this issue again later.
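The two patterned wafer metrics introduced in this section, the total indicated range of a profilometer scan and the general within-die standard deviation metric, reduce to short calculations. The sketch below uses hypothetical function names, and the sample (n-1) standard deviation is an assumption.

```python
from statistics import stdev


def total_indicated_range(profile_heights):
    """TIR: difference between the maximum and minimum points of a scan."""
    return max(profile_heights) - min(profile_heights)


def within_die_nonuniformity(site_thicknesses):
    """Standard deviation of post-polish thickness over sites in one die."""
    return stdev(site_thicknesses)
```

Both functions take plain lists of measurements, e.g. the height samples along one scan or the thicknesses of the measured sites within a single die.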
Figure 1.14: The within-die variation of a typical production wafer. (Panels: a) surface plot of the within-die variation, along with corresponding measurement points, over X (mm) and Y (mm); b) two-dimensional plot of the within-die variation versus Site #.)
1.3 An Introduction to CMP Process Control Issues
Several factors make controlling the CMP process particularly difficult. It is often the
case that there is significant drift in the removal rate over the life of a typical CMP polishing pad [10-12]. Figure 1.15 shows the average blanket removal rate using data from a
25 point measurement similar to that shown in Figure 1.7 over the course of a 600 wafer
polishing experiment. The data shown was taken from the last wafer in the lot. We can see
that the blanket rate has a fairly significant drift; an 11% decrease over the lifetime of a
typical pad. This is also true for the patterned removal rate. Figure 1.16 shows the average
removal rate measured at a single site on an evenly spaced grid of 22 dies of a test device,
similar to a typical production wafer, plotted over the course of the same experiment.
Figure 1.15: Average removal rate of blanket sheet film PETEOS wafers. (Horizontal axis: Lot #.)
Figure 1.16: Average removal rate of patterned PETEOS wafers. (Horizontal axis: Lot #.)
Again, the data is taken from the last wafer of the lot. Note that there is a corresponding
7% decrease in the average patterned removal rate over the pad lifetime. These decreases
in removal rate have been the focus of most of the initial work in controlling the CMP process. Although the drift is small, if it is not correctly controlled it can lead to significant
differences in the actual average post-polish thickness and the desired average post-polish
thickness.
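To see why an uncorrected drift of this size matters, consider a fixed polish time $t$ chosen for the initial removal rate. The following is only a back-of-the-envelope sketch: the symbols $RR_0$ (initial removal rate), $AR_0$ (initial amount removed), and $\delta$ (fractional decrease in rate) are introduced here for illustration, and the 10000Å amount removed is a hypothetical value, not a measurement from the experiment.

```latex
\begin{align*}
AR_0 &= RR_0 \, t, \\
AR_{\mathrm{end}} &= (1 - \delta) \, RR_0 \, t, \\
AR_0 - AR_{\mathrm{end}} &= \delta \, AR_0 .
\end{align*}
% Hypothetical example with AR_0 = 10000 Angstroms and a fixed polish time:
%   delta = 0.11 (blanket drift)   gives a shortfall of 1100 Angstroms
%   delta = 0.07 (patterned drift) gives a shortfall of  700 Angstroms
```

Under these assumptions, the shortfall in amount removed at the end of the pad life scales directly with the target amount removed, which is why even a modest rate drift needs to be compensated.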
In controlling the CMP process, the within-wafer non-uniformity on blanket test
wafers is also frequently monitored. However, the WIWNU varies very little over the life of a
typical CMP pad, assuming well-designed pad conditioning practices. As shown in Figure
1.17, the WIWNU is fairly consistent over the life of a pad, and the exact value depends on
the particular tool and the component quality (e.g. carrier flatness). On the other hand, the
WIWNU is often the reason the tool is brought down for maintenance. In normal polishing, there is a fall-off in the removal rate near the edge of the wafer. This fall-off is usually
in the 3-6mm edge-exclusion region (shown in Figure 1.7) which is not included in the WIWNU.
Figure 1.17: Within-wafer non-uniformity of blanket sheet film PETEOS wafers. (Horizontal axis: Lot #.)
As the polishing pad ages, this fall-off region moves in closer to the center of
the wafer. The outer measurement points then become significantly different from the
other measurements, and the WIWNU quickly increases. Once the WIWNU exceeds a
certain limit, the tool is brought down, the pad is changed, and maintenance is performed
on the carrier head.
In addition to the removal rate and WIWNU, step height measurements are also taken.
As mentioned above, the step-height is measured on a fixed feature on each particular
device. Because these measurements are time-consuming, only a single step-height measurement on a single feature on a single die of a single wafer in a lot is typically taken. If
the step-height has not been removed to less than a certain level, then the wafers are reworked, meaning they are put back on the polisher for additional polishing. However, care
must be taken to avoid over-polishing. As we saw in Figure 1.14, the within-die non-uniformity can be large, and over-polishing can result in the low areas on the device being
completely removed. As a result, step-height measurements are frequently used only as a
spot check on the process during production, although they are used heavily in the process
development stage (in order to ensure a process is achieving a certain degree of planarization).
1.4 Summary
In this chapter, we outlined the need for chemical-mechanical polishing as a means for
dielectric planarization in the manufacture of integrated circuits, and described the importance of controlling the CMP process in a production setting. Process control in the semiconductor industry has progressed from basic statistical process control to run by run
feedback control systems. Various metrics and goals must be achieved in the CMP process, including: removal rate, within-wafer non-uniformity, step-height, total indicated
range, within-die non-uniformity, and wafer-to-wafer non-uniformity. A number of difficulties exist in controlling the CMP process, including the drift in the polish characteristics of blanket and patterned wafer performance metrics over time, and the challenge of
control given a mix of device types on the same tool.
In the next chapter, we turn our attention to controlling the CMP process using the
EWMA controller outlined in this chapter. We will discuss factors that are important to a
production control solution, including the use of integrated metrology and a simple model
update strategy. We will also begin to discuss some of the difficulties in effectively measuring, monitoring, and controlling patterned wafers in the CMP process.
Chapter 2
Control of a Single Device Using On-Line Metrology
In this chapter, we take the first steps in developing a comprehensive control strategy
for the polishing of patterned wafers in CMP. The previous chapter outlined the basics of
process control, CMP, and the issues involved with controlling the CMP process. This
chapter aims to expand on these topics and begins to cover the details of implementing
CMP run by run process control in a production setting.
There are at least four major issues involved with an implementation of a run by run
control system for use in a production environment: quality, cost, flexibility, and ease of
use. This chapter outlines the use of an EWMA run by run control system with integrated
metrology to control the average post-polish thickness of patterned wafers. An integrated,
or on-line, metrology tool resides on the processing equipment and performs measurement
after the wafers are processed, but before they are unloaded from the tool. The frequent
measurements provided by integrated metrology, combined with proper controller tuning,
result in high quality control of the post-polish thickness. In addition, the automatic measurement of the post-polish wafers and the relatively simple control algorithm provide
maximum ease of use. The simplification of processing using on-line metrology and the
reduction in run to run thickness variation from improved control result in a reduced cost
for the CMP process. However, the methodology provided within this control framework
is limited in flexibility. We will demonstrate this in the following chapter, where we
present a comprehensive control framework based on the concepts presented in this chapter.
As stated above, our purpose is to demonstrate run by run control of the average post-polish thickness of patterned wafers using the relatively simple EWMA control algorithm
in conjunction with an on-line metrology system. We will demonstrate that such a system
provides quality control of the average thickness of a single site over multiple dies of a single type of patterned wafer, i.e. a single type of device, with an easy to use system that
reduces cost. First, we present a study of the quality and reliability of an on-line metrology
tool for CMP. This is necessary to ensure the quality of our measurements before using
them for control. Second, we outline the cost benefits of an on-line metrology tool, used in
conjunction with a run by run controller. Finally, we demonstrate that the lot to lot drift in
a polishing tool may be eliminated by using the simple (i.e. maximum ease of use)
EWMA control algorithm presented in the previous chapter. We show that, when this relatively simple controller is correctly tuned and frequent measurements are enabled by on-line metrology, the controlled thickness using this simple control methodology is similar
to or better than that reported in the literature, including those using more complex
approaches.
2.1 Evaluation of On-Line Metrology for CMP
Before an on-line metrology tool can be used for process monitoring and control, the
repeatability and reliability of the tool must be assessed. In addition, some understanding
of how measurements from the on-line tool relate to those of current ex-situ metrology
tools is needed. In this section, we discuss the evaluation of a NovaScan 210 on-line
metrology tool from Nova Measuring Instruments, performed on an IPEC 472 CMP polisher.
As shown in Figure 2.1, the on-line metrology tool resides on the polishing tool. Processing begins by moving the wafer from the load station to the primary polishing table.
After polishing, the wafer is normally buffed on a soft felt pad with de-ionized water. After
the buffing, the wafer is loaded into the on-line measurement tool before being placed in
the unload station.
The measurement process for patterned wafers is shown in Figure 2.2. Standard ex-situ metrology tools first physically align the position and orientation of the wafer before
performing any software alignment and measurement. However, the on-line tool does not
perform any physical alignment of the wafer, but only performs a software alignment routine to determine the location and orientation of the wafer. Once the position and orientation of the wafer are established, the measurement process proceeds to each specified die.
Figure 2.1: Polishing sequence with on-line metrology.
Figure 2.2: The on-line pattern recognition system. (Flow: wafer load, software wafer alignment, die alignment recognition, measurement site recognition, and measurement, repeated for each site and for each die; annotated with variation values of 8.1Å, 4.3Å, and 0.5Å.)
Within each die, searching begins with the alignment feature (a particular feature on each type of wafer chosen by the user to orient the software with a specific die on the wafer).
Once the alignment feature is located, measurement proceeds to each site. An optional
alignment of each measurement site is performed and a final adjustment to the measurement position is made before the measurement of the site is taken. Measurement then proceeds to all remaining sites within that die. Once the measurement of the sites within the
die is complete, the tool then moves to the next die and begins searching for the alignment
feature of that die. The process then repeats until all dies are measured.
The measurement process is a spectrophotometry process, whereby the intensity of
light of varying frequency is measured to obtain a reflected spectrum. The spectrum is
matched to internal model spectra within the tool. The thickness parameters of the model
are varied to obtain this match, from which the thicknesses of the specified layers are determined.
2.1.1 Measurement Repeatability
There are four sources of variability in the measurement process: the variability due to
a) the initial wafer alignment software routine, b) the die stepping and alignment, c) the
site alignment, and d) the actual measurement process. These variances are in a form
referred to as nested variance. The variance of the measurement process is nested within
the variance due to the site alignment which is nested within the variance due to the die
stepping and alignment which is nested within the variance due to the software alignment.
This four-level nested variance structure results in a single sample site measurement having the form
$$X_{ijkl} = W_i + D_{j(i)} + S_{k(ij)} + M_{l(ijk)} \qquad (2.1)$$
where
$$W_i \sim N(\mu, \sigma_W^2) \qquad (2.2)$$
$$D_{j(i)} \sim N(\mu_i, \sigma_D^2) \qquad (2.3)$$
$$S_{k(ij)} \sim N(\mu_{ij}, \sigma_S^2) \qquad (2.4)$$
$$M_{l(ijk)} \sim N(0, \sigma_M^2) \qquad (2.5)$$
This structure has several implications when measuring the variability of the measurement process. In fact, the structure in this general form is extremely complex, and many
works have outlined methods for finding the terms in this structure [50-51, 58-63]. If, for
example, we assume that $\mu$, $\mu_i$, and $\mu_{ij}$ are all zero, then the variance of an individual
measurement will be
$$\sigma_T^2 = \sigma_W^2 + \sigma_D^2 + \sigma_S^2 + \sigma_M^2 \qquad (2.6)$$
The variance of the sample average of a wafer, under these assumptions, would be
$$\sigma_A^2 = \sigma_W^2 + \frac{1}{D}\sigma_D^2 + \frac{1}{DS}\sigma_S^2 + \frac{1}{DSM}\sigma_M^2 \qquad (2.7)$$
where D, S, and M are the number of dies, sites, and measurements, respectively. With
this in mind, we now proceed to identify the components of variance in the on-line metrology tool.
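The two variance expressions, (2.6) for a single measurement and (2.7) for the wafer average, can be evaluated directly once the four components are known. The sketch below assumes the standard balanced nested form, in which each component is divided by the number of independent samples averaged at that level; the function names and the example values are hypothetical.

```python
def single_measurement_variance(var_w, var_d, var_s, var_m):
    """Total variance of one measurement: sum of the nested components."""
    return var_w + var_d + var_s + var_m


def wafer_average_variance(var_w, var_d, var_s, var_m, n_die, n_site, n_meas):
    """Variance of the wafer average for a balanced nested sampling plan.

    n_die:  number of dies measured on the wafer (D)
    n_site: number of sites measured per die (S)
    n_meas: number of measurements per site (M)
    """
    return (var_w
            + var_d / n_die
            + var_s / (n_die * n_site)
            + var_m / (n_die * n_site * n_meas))
```

Note that the wafer-level component is not reduced by measuring more dies, sites, or repeats on the same wafer load; only the nested components are averaged down.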
The variability of the actual measurement process, $\sigma_M^2$, was estimated by repeatedly
measuring the same point on a single wafer, without any physical movement of the wafer
or mechanisms (site alignment or die alignment). This eliminates all the components of
variance except $M_{l(ijk)}$. This process was repeated at two locations on the wafer, each with
25 repetitions, to estimate $\sigma_M^2$, which is often referred to as the "precision" of the measurement process. The average of the sample standard deviations of these measurements
was 0.5 Å, and is shown in its corresponding location in Figure 2.2. By averaging these
values, we are assuming that the variation at two sites at different locations is white,
meaning that each site has the same $\sigma_M^2$.
The variation added by the site-to-site movement and the die-to-die movement cannot
be measured individually on this tool, because it cannot be set to perform multiple site or
multiple die measurements without including the software wafer alignment. The within-wafer measurement variation, i.e. that including $\sigma_D^2$, $\sigma_S^2$, and $\sigma_M^2$, was determined as follows. A single patterned wafer was placed on the measurement stage. One site on five dies
across the wafer was measured. The standard deviation of the five sites was calculated for
25 repetitions, without movement of the wafer on the stage. It is possible that there is some
wafer variation included in this, because each repetition includes a software alignment.
However, we will neglect this because we believe most of the variation comes from variations in the orientation and position of the wafer on the measurement stage, and this was
not included since the wafer did not move on the stage during the measurement. The average of these 25 five-die standard deviations was determined to be 4.3 Å, and is also shown
in its corresponding location in Figure 2.2.
In our nested variance structure, each measurement we obtain is at the die-level. In
particular, each measurement consists of one measurement in one site in one die. Thus,
only the $D_{j(i)}$, $S_{k(ij)}$, and $M_{l(ijk)}$ terms contribute to the variance of each measurement.
Some assumptions on the means of these terms are necessary. Since we have only one
wafer, and since there is only one site and one die in each measurement, we can assume
one mean for all the measurements. Therefore, the average of five of these measurements
would have a variation of
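$$\sigma_A^2 = \frac{1}{5}\sigma_D^2 + \left(\frac{1}{5}\right)\left(\frac{1}{1}\right)\sigma_S^2 + \left(\frac{1}{5}\right)\left(\frac{1}{1}\right)\left(\frac{1}{1}\right)\sigma_M^2 = \frac{1}{5}\left(\sigma_D^2 + \sigma_S^2 + \sigma_M^2\right) \tag{2.8}$$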
In addition, since we have only one mean for all the components, these will drop out in the
calculation of the standard deviations. Thus the sample average of the standard deviations
given above is an estimate of the variation from the die variation, the site variation, and the
measurement variation,
$$\sigma_{DT}^2 = \sigma_D^2 + \sigma_S^2 + \sigma_M^2 \tag{2.9}$$
Note that we may have confounded variations here. We are assuming that the
variation determined from the sample standard deviation of the die measurements is due to
the metrology tool, because we are using the same wafer, which is very flat across the
raised features. In fact, if there is within-wafer variation on this wafer, then the standard
deviation of the five die measurements will contain this variation, which should be attributable to the wafer, not the metrology tool.
The variability including the measurement, site-to-site, die-to-die, and wafer alignment variability was measured by repeating the previous process, but rotating the orientation of the wafer between measurements. This average variability was determined
to be 8.1 Å, which is also shown in Figure 2.2. Again, we have assumed that the mean of
the wafers, $\mu_i$, is fixed over all wafers, because the same wafer was used for all measurements.
We found the repeatability of the measurement process, i.e. 8.1 Å, to be very good,
considering that the average wafer-to-wafer variation of blanket wafer polishing is roughly
100 Å to 300 Å. The sources of variability are all less than 10 Å, and some of this may be
due to the wafer itself as mentioned above. This suggests that the repeatability of the
metrology tool meets the requirements for CMP. Section 2.1.3 will discuss the variation contributed by wafer loading, small amounts of slurry, and post-polish wafer cleaning.
2.1.2. Reliability
The reliability of a metrology tool in a production environment is extremely important.
In light of this, we performed two reliability tests of the on-line metrology tool. In our first
experiment, one site on five dies was measured on 100 wafers with the intention of testing
the alignment success rate as well as the die-level pattern recognition success rate. Our
experiment results show a 100% success rate in wafer alignment. Only one in 500 sites
was not found (most likely due to a bubble), corresponding to a 99.8% success rate. We
performed two additional experiments which measured one site in 22 dies on 24 wafers.
Again we found a 100% wafer alignment success rate. Four of 1056 sites were not found,
resulting in a 99.6% success rate.
The reliability of the on-line metrology tool during the run by run control experiment
we outline later was also very good. There was one failure in 96 wafer alignments, a success rate of 99%. The site alignment success rate was 99.7% (7 failures in 2112 measurements). These site-not-found (SNF) errors are generally caused by bubbles in the water
between the wafer and the measurement window. However, it was found that the pattern
recognition trained for this layout had problems finding a site on the far right of the wafer,
which could be due to an inability in the die-level or site-level alignment routines to compensate for inaccuracies in the stepping distance or the wafer alignment.
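The success rates quoted in this section follow directly from the failure counts. As a minimal sketch (the function name and the dictionary labels are ours, introduced only for illustration), they can be recomputed as follows:

```python
def success_rate(failures, attempts):
    """Percentage of attempts that succeeded."""
    return 100.0 * (attempts - failures) / attempts


# (failures, attempts) pairs quoted in this section.
reliability_tests = {
    "first test, site recognition": (1, 500),
    "second tests, site recognition": (4, 1056),
    "control run, wafer alignment": (1, 96),
    "control run, site alignment": (7, 2112),
}

for label, (failures, attempts) in reliability_tests.items():
    print(f"{label}: {success_rate(failures, attempts):.1f}%")
```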
2.1.3. Correlation with Ex-Situ Metrology
We would also like to understand how these measurements correspond to ex-situ measurements. One site on 22 dies on two sets of patterned wafers was measured on both the
NovaScan 210 on-line metrology tool and on a KLA/Tencor UV1280 ex-situ metrology
tool. The first set consisted of pre-clean pre-polish wafers measured on both tools. The
second set consisted of wafers that were pre-clean post-polish when measured on the on-line tool and post-clean post-polish when measured on the ex-situ tool. Care was taken to set up the
measurement parameters on both tools. These parameters include pattern recognition,
optical properties of the materials being measured, die stepping distances, and site measurement locations. Both tools are spectrophotometry tools, and thus we expect similar
results.
The on-line and ex-situ measurements from pre-clean pre-polish wafers are linearly
correlated with a correlation coefficient of 0.98. The on-line values are, on average, 47 Å
higher than the ex-situ values. The standard deviation of the errors from this linear fit
(which we will refer to as the spread) is 12 Å, and the range of the spread is 48 Å. These
results show that the on-line measurements correlate extremely well with ex-situ measurements. The absolute thickness values obtained from each tool are slightly different, which
may be due to variations in algorithms, optics, and calibrations.
Our second set of wafers were pre-clean post-polish when measured with the on-line
tool and post-clean post-polish when measured with the ex-situ tool. The scatter plot for
the experiment is shown in Figure 2.3. Several values lie above and below the main cluster,
in addition to the few SNF errors. This phenomenon is called cycle-skipping, and is
caused by a failure of the algorithm to distinguish the spectrum of the true thickness from
that of a nearby thickness which has a similar spectrum. This problem was eliminated by
switching to an improved algorithm later in the experiment. The success rate during the
region with cycle-skipping was 83%. The success rate increased to 99.5% when the improved
algorithm was used.
[Figure 2.3 plot: x-axis Ex-Situ Thickness (Å); annotations: Cycle-Skips, Site-Not-Found.]
Figure 2.3: Correlation plot of the on-line and ex-situ post-polish patterned wafer
thicknesses.
As shown in Figure 2.3, the pre-clean post-polish on-line values are, on average, 175 Å
higher than the post-clean post-polish ex-situ values, and the values are linearly correlated
with a correlation coefficient of 0.99. The standard deviation of the spread is 31 Å, and the
range of the spread is 173 Å. In order to determine the effect of cleaning, wafers were measured on the ex-situ tool, cleaned, and remeasured. The clean resulted in an offset of 137 Å,
a standard deviation of the spread of 8 Å, and a range of the spread of 28 Å. The sources of
the 175 Å offset are outlined in Table 2.1. We see that we have only 9 Å of unaccounted offset, which is within the variation of the measurements.
Experiment | Average
Pre-Clean Pre-Polish On-line vs. Pre-Clean Pre-Polish Ex-situ | 47 Å
Pre-Clean Pre-Polish Ex-situ vs. Post-Clean Post-Polish Ex-situ | 137 Å
Pre-Clean Post-Polish On-line vs. Post-Clean Post-Polish Ex-situ | 175 Å
Added Cleaning Due to Surface Damage | (175) - (47 + 137) = -9 Å
Table 2.1: Average differences between the on-line and the ex-situ measurements.
We are now in a position to extract the increased variation due to the effects of the
loading, residual slurry, and surface damage caused by the CMP process. We can calculate
this as shown in Table 2.2. If we assume independence of these variations, then we can
subtract the cleaning variance and the pre-clean pre-polish variance from the post-polish
variance to obtain the remaining variation due to the combined effects of polishing, slurry, and loading. As shown in Table 2.2, the combined variation is about 27 Å.
If there is correlation in these components, then this number could actually be significantly lower. We combine this result with those of Section 2.1.1 in Table 2.3 to summarize
our assessment of the variation in the on-line measurement process. These combine for a
total variation of only 28 Å, far less than the variation of the CMP process itself.
Experiment | Standard Deviation
Pre-Clean Post-Polish On-line vs. Post-Clean Post-Polish Ex-situ | 31 Å
Pre-Clean Pre-Polish Ex-situ vs. Post-Clean Pre-Polish Ex-situ | 8 Å
Pre-Clean Pre-Polish On-line vs. Pre-Clean Pre-Polish Ex-situ | 12 Å
Added Variation Due to Polishing, Slurry, and Loading | $\sqrt{31^2 - (8^2 + 12^2)} = 27$ Å
Table 2.2: Variation added to on-line measurements by polishing, residual slurry,
and wafer loading.
Experiment | Standard Deviation
On-line Measurement Repeatability | 0.5 Å
On-line Pattern Recognition | 4.3 Å
On-line Alignment | 6.8 Å
Added Variation Due to Polishing, Slurry, and Loading | 27 Å
Total Variation | $\sqrt{0.5^2 + 4.3^2 + 6.8^2 + 27^2} = 28$ Å
Table 2.3: Breakdown of the variation in the on-line metrology system.
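The arithmetic behind Tables 2.1 through 2.3 can be checked with a short sketch. It assumes, as the text does, that the components are independent so that variances add. The helper names are hypothetical. We also assume that the on-line alignment entry is obtained by removing the 4.3 Å within-wafer component from the 8.1 Å total repeatability, which gives about 6.9 Å, close to the 6.8 Å listed in Table 2.3.

```python
import math


def combine(*sigmas):
    """Root-sum-square of independent standard deviations."""
    return math.sqrt(sum(s * s for s in sigmas))


def remove(total_sigma, *component_sigmas):
    """Standard deviation left after removing independent components."""
    return math.sqrt(total_sigma ** 2 - sum(c * c for c in component_sigmas))


# Table 2.1: offset not explained by the tool offset and the clean.
unaccounted_offset = 175 - (47 + 137)

# Assumed origin of the alignment entry: 8.1 Å total minus 4.3 Å within-wafer.
alignment_sigma = remove(8.1, 4.3)

# Table 2.2: variation added by polishing, slurry, and loading.
added_sigma = remove(31.0, 8.0, 12.0)

# Table 2.3: total variation of the on-line metrology system.
total_sigma = combine(0.5, 4.3, 6.8, 27.0)

print(unaccounted_offset, round(alignment_sigma, 1),
      round(added_sigma, 1), round(total_sigma, 1))
```

Because the 27 Å term dominates the root-sum-square, the total is nearly insensitive to the three repeatability terms.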
2.2 Throughput and Cost of Ownership Improvements
Before discussing the use of the on-line metrology tool in a control setting, this section
discusses throughput and cost of ownership (COO) improvements gained by the use of on-line metrology in CMP. These issues are highly dependent on the particular process implementation at a particular site. Therefore, we will discuss several scenarios and outline the
throughput and COO improvements gained in each scenario.
2.2.1. Throughput
While it is possible that on-line measurement could slow processing, this occurs mainly
when more than five dies are measured on a robot-less CMP tool (such as the IPEC 472).
Therefore, we will assume that the on-line measurement does not slow the polishing process. Much of the increase in throughput comes from a reduction in the number of cleans
and ex-situ measurements. The savings calculated here assume that the polisher waits for
the post-polish clean and ex-situ measurement before continuing to polish, which is unrealistic for some high volume facilities. Increases for these facilities will be largely dependent upon the number of cleaning and ex-situ measurement tools that are available relative
to the number of polishers, and which set of tools is the bottleneck for the CMP process.
Our first scenario is outlined in Figure 2.4. This is a highest-quality lowest-throughput
process. It consists of a 10 minute look-ahead and pilot wafer polish, a 30 minute clean, a
5 minute ex-situ measurement, a polish time calculation, a 90 minute polish, another 30
minute clean, a 30 minute ex-situ post-polish measurement, rework time calculations, a 10
minute rework, a 30 minute clean, and a 10 minute two wafer ex-situ post-polish measurement. If we measure all wafers and rework only those necessary, we obtain a total time of
255 minutes, a throughput of roughly 0.24 lots/hour. If we measure only two wafers (10 minutes)
in the lot and rework the entire lot (45 minutes), we again obtain a time of 255 minutes.
Utilizing on-line metrology, as shown in Figure 2.5, we eliminate the look-ahead clean
cycle, the ex-situ look-ahead measurement, the first post-polish clean, and the first ex-situ
post-polish measurement. If we rework only those wafers that need it, then we obtain a total time
of 142 minutes, a throughput of 0.42 lots/hour. If we only measure two wafers in the lot
and rework the entire lot, we obtain a total time of 177 minutes, for a throughput of 0.34
lots/hour. These correspond to throughput increases of 80% and 44%, respectively.
[Figure 2.4 flow diagram: Polish Look-Ahead and Pilot Wafers (10 minutes); Clean (30 minutes); Measure; Calculate Polish Time; Polish Lot (90 minutes); Clean (30 minutes); Measure; Rework Lot, 2/24 Wafers (10/45 minutes); Clean (30 minutes); Measure (10 minutes).]
Figure 2.4: Polishing with look-ahead wafers, rework, and ex-situ metrology.
Total time per lot with 2 wafer rework is 255 minutes; a throughput of 0.2353 lots
per hour. Total time per lot with 24 wafer rework is also 255 minutes.
[Figure 2.5 flow diagram: Polish Look-Ahead and Pilot Wafers (12 minutes); On-line Measure; Calculate Polish Time; Polish Lot (90 minutes); On-line Measure; Rework Lot, 2/24 Wafers (10/45 minutes); Clean (30 minutes).]
Figure 2.5: Polishing with look-ahead wafers, rework, and on-line metrology. Total
time per lot with 2 wafer rework is 142 minutes; a throughput of 0.4225 lots per
hour. Total time per lot with 24 wafer rework is 177 minutes; a throughput of
0.3390 lots per hour. These are improvements of 80% and 44%, respectively.
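The throughput figures above can be reproduced with a short sketch. The step durations for the on-line flow (12, 90, and 30 minutes plus a 10 or 45 minute rework) are our reading of Figure 2.5 and should be treated as an assumption. The 255 minute ex-situ total is taken as quoted rather than rebuilt from its steps. The function and variable names are hypothetical.

```python
def lots_per_hour(total_minutes):
    """Throughput when one lot takes total_minutes from start to finish."""
    return 60.0 / total_minutes


def percent_increase(baseline_minutes, improved_minutes):
    """Throughput gain (%) obtained by shortening the per-lot time."""
    return 100.0 * (baseline_minutes / improved_minutes - 1.0)


EX_SITU_TOTAL = 255            # minutes per lot, as quoted for Figure 2.4
ON_LINE_FIXED = 12 + 90 + 30   # assumed: look-ahead polish, lot polish, final clean
REWORK_MINUTES = {"two wafer rework": 10, "full lot rework": 45}

on_line_totals = {case: ON_LINE_FIXED + minutes
                  for case, minutes in REWORK_MINUTES.items()}

for case, total in on_line_totals.items():
    print(f"{case}: {total} min, {lots_per_hour(total):.4f} lots/hour, "
          f"+{percent_increase(EX_SITU_TOTAL, total):.0f}%")
```

Since throughput is the reciprocal of the per-lot time, the percent increase depends only on the ratio of the two totals.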
Similar calculations can be made for other scenarios. One might have look-aheads, but
no reworks. In this case, there is a 39% increase. If we have reworks, but no look-aheads,
then the increases are 23% to 54%, depending on the number of wafers measured and
reworked. Finally, one may have neither look-aheads nor reworks, and the increase in
throughput is only 8%. Generally, low quality high throughput processes will have smaller
increases in throughput, while high quality low throughput processes will have larger
increases in throughput. In some cases, on-line metrology can enable highest quality processing at throughputs of medium quality processing.
2.2.2. Cost of Ownership Reductions
Reductions in cost of ownership (COO) could arise in several areas: an increase in
throughput, savings in chemical and water usage, and equipment reduction for future facilities. We begin with a discussion of the throughput cases above. For our calculations, we
used a standard COO model for an IPEC 472 polisher. The specific dollar amounts for
each scenario are proprietary; therefore, we quote the percent reductions in COO. The
reduction in COO could be substantially more or less, depending on whether CMP is a
bottleneck process in the facility, or whether the CMP area operates in a very high volume
mode with significant parallel processing.
When we have look-ahead wafers, the 80% increase in throughput in the two wafer
rework case and the 44% throughput increase in the full lot rework case result in COO
reductions of 31.6% and 21.8%, respectively. For the case with look-aheads and no
rework, the 39% increase in throughput results in a reduction in COO of 17.6%. It may be
that only a certain percentage of the lots are reworked. We can extract the cost savings by
multiplying the savings from the rework case by the percentage of occurrence and adding this
to the product of the remaining percentage and the savings from the no rework case. A plot
of the COO reductions versus the percent rework is shown in Figure 2.6 for the look-ahead
wafer scenario. Here we see the savings will range from 17.6% to 31.6% for the two wafer
rework case and from 17.6% to 21.8% for the full lot rework case.
[Figure 2.6 plot: x-axis Percent Rework (0 to 100); curves: Two Wafer Rework, Full Lot Rework.]
Figure 2.6: Percent reduction in the cost of ownership vs. percent rework for a
process with look-ahead wafers.
When there are no look-ahead wafers, there is a 12.6% to 23% reduction in COO,
depending on the number of wafers reworked. With no look-aheads and no rework, there is
a fixed 4.3% reduction in COO. If we have only a percentage of the lots reworked when no
look-ahead wafers are run, then the savings increase linearly from 4.3% to 23%, depending on the percentage of rework.
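The linear dependence of the COO reduction on the rework percentage can be sketched as a weighted average of the two end points. The function name is hypothetical, and the end-point percentages are the ones quoted above.

```python
def blended_coo_reduction(rework_fraction, with_rework_pct, no_rework_pct):
    """COO reduction (%) when only a fraction of the lots is reworked."""
    return (rework_fraction * with_rework_pct
            + (1.0 - rework_fraction) * no_rework_pct)


# (reduction with rework, reduction without rework), in percent, from the text.
SCENARIOS = {
    "look-ahead, two wafer rework": (31.6, 17.6),
    "look-ahead, full lot rework": (21.8, 17.6),
    "no look-ahead, maximum rework case": (23.0, 4.3),
}

for name, (with_rework, no_rework) in SCENARIOS.items():
    half = blended_coo_reduction(0.5, with_rework, no_rework)
    print(f"{name}: {half:.1f}% reduction at 50% rework")
```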
The second area for reduction in COO arises due to the reduced water, chemical, and
energy usage in cleaning. We can quantify these reductions by considering the reductions
in cleaning. In particular, if we do not run look-aheads or perform rework, then we have no
reduction in the number of cleans. However, if we have no look-aheads but run rework,
then the number of cleans is reduced by 50%. Therefore, if we have a percentage rework,
$p$, then our reduction in COO is $p \cdot 50\%$ multiplied by the cost of cleaning. Note also that
these savings are independent of whether the facility runs in high volume (parallel polishing) because we still have to clean and measure the wafers in order to decide on reworking
them. We can repeat these calculations for the look-ahead case. We can eliminate 50% of
the cleans if we have no rework. However, Figures 2.4 and 2.5 point out that the savings
increase to 66.7% if we have rework. Therefore, if we have a rework rate, $p$, then our reduction in cleaning is $r = p \cdot 66.7\% + (1 - p) \cdot 50\%$, somewhere between 50% and 66.7%. Our
reduction in COO is then $r$ times the cost of cleaning. In addition, we see from these calculations that we would also benefit the environment substantially by reducing water, chemical, and power usage by 0% to 66.7%.
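The 50% and 66.7% figures follow from counting cleans per lot. The sketch below assumes the clean counts implied by the scenarios in Section 2.2.1 (for example, three cleans with ex-situ metrology versus one with on-line metrology when look-aheads and rework are both used). These counts and the names used are our assumptions for illustration.

```python
def clean_reduction_pct(cleans_ex_situ, cleans_on_line):
    """Percent of cleans eliminated by moving to on-line metrology."""
    return 100.0 * (cleans_ex_situ - cleans_on_line) / cleans_ex_situ


def expected_clean_reduction_pct(rework_rate, with_rework_pct, no_rework_pct):
    """Average reduction r when a fraction p = rework_rate of lots is reworked."""
    return rework_rate * with_rework_pct + (1.0 - rework_rate) * no_rework_pct


# Assumed cleans per lot: (ex-situ flow, on-line flow).
LOOK_AHEAD = {"rework": (3, 1), "no rework": (2, 1)}
NO_LOOK_AHEAD = {"rework": (2, 1), "no rework": (1, 1)}

for name, counts in (("look-ahead", LOOK_AHEAD), ("no look-ahead", NO_LOOK_AHEAD)):
    with_rework = clean_reduction_pct(*counts["rework"])
    no_rework = clean_reduction_pct(*counts["no rework"])
    r = expected_clean_reduction_pct(0.3, with_rework, no_rework)
    print(f"{name}: {no_rework:.1f}% to {with_rework:.1f}%, r = {r:.1f}% at p = 0.3")
```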
The third area for COO reduction is that of tool purchasing for future facilities. We
saw from the above calculations that the number of necessary cleaning tools could be
reduced by 0% to 66.7%. Thus, the reduction in COO for the future facility could be as
much as 0% to 66.7% times the COO for each cleaning tool. In addition, one could estimate the reduction in the number of ex-situ metrology tools. In the scenarios above, we
could eliminate one to three ex-situ measurements per lot, depending on the level of processing quality. The actual costs may vary, but we assume here that the on-line tool cost is
roughly one fifth the cost of an ex-situ measurement tool. If we have one ex-situ tool for
every four polishers, then we could reduce our COO by as much as 20%. This number
may be exaggerated, because we cannot eliminate all the ex-situ tools.
These reductions in COO constitute only those tangible reductions obtained from the
use of the on-line metrology tool. Further reductions in COO due to improved process monitoring and control are also possible. However, it is difficult to quantify these improvements. For example, improved CMP process monitoring and control will lead to less wafer
scrap, less rework, and higher yields, all of which may have a substantial impact on COO.
Lower process variability may enable new processing methodologies. For example, a
tighter control of post-polish oxide thickness would decrease the required amount of
deposited oxide, decreasing the deposition time, which in turn, would reduce chamber
clean time. This would increase throughput and reduce chemical and energy usage for the
deposition step.
2.3 Run by Run Control of CMP with On-Line Metrology
We now demonstrate how this on-line metrology can be used in conjunction with a run
by run process control strategy to improve CMP processing. Similar work has been published [4-5,7,9-12] for controlling blanket wafer removal rate, but as we will show, this is
different from controlling directly on patterned wafers. Direct control of patterned wafers
using this on-line metrology tool was shown in [13-15]. A multivariate non-linear controller which monitors the average removal rate and within-wafer uniformity was shown in
[14], and a self-adjusting control algorithm to control the average post-polish thickness
was shown in [15]. We will demonstrate that the relatively simple EWMA controller, when used in conjunction with on-line metrology, is more than sufficient to remove
the time trends (e.g. drift) of the removal rate in the CMP process. One reason for this is
that the dynamics of the CMP process are not very complex, and controllers with more
complex filtering methods (e.g. that shown in [15]) are not needed. In fact, all of these
methods achieve quality control of the average thickness, but this is due to the increased
sampling frequency gained by the on-line metrology, not to the complexity of the control
algorithm. We will also show that, when proper pad conditioning methods are utilized, the
within-wafer uniformity is very stable. Therefore, it is not necessary to control the within-wafer non-uniformity of the CMP process (for the cases examined here), as suggested by
[14], and doing so will only increase the complexity of the controller and increase the variability of the controlled thickness. We will show here that a much simpler controller, i.e.
control using only time adjustments and the relatively simple EWMA algorithm, performs
extremely well. As a result, we can achieve our needs for high quality by using this simpler approach, which is also easier to use than more complex control
algorithms. However, the control approach taken in this section has several limitations in
flexibility. In particular, it controls only a few sites of a single device. This section will
point out one problem associated with this approach to control, and set the stage for the
control of multiple devices that we will address in the next chapter.
2.3.1. Experimental Setup
In the 600 wafer control experiment described below, we simulated the polishing of
one lot of patterned wafers by using five blanket "filler" wafers, one blanket "prime" pilot
wafer (which was used to monitor the blanket removal rate and uniformity in each lot),
and one patterned wafer (the last wafer in the lot was used as a lot-based monitor wafer).
The experiment was performed on an IPEC 472 polisher, with a primary polish step on an
IC1000/SUBA IV pad stack followed by a 30 second buff with de-ionized (DI) water on a
felt pad. Measurements were taken on the on-line metrology tool. All filler wafers and
pilot wafers were measured with a 16 point blanket oxide measurement recipe. The patterned wafers were short-flow wafers consisting of a level-one metal layer, followed by a
Titanium-Nitride (TiN) barrier layer, a Silicon-Oxy-Nitride (SiON) anti-reflective layer,
and a PETEOS inter-level dielectric (ILD) layer of a test device. The pattern recognition
was trained to measure one site on 22 dies on the wafer.
2.3.2. The Run by Run Process Control Algorithm
The control algorithm uses an exponentially weighted moving average (EWMA) to
monitor the average removal rate. This averages past values of the rate to obtain a filtered
version of the removal rate, which is weighted to more closely approximate recent values.
The EWMA estimate of the average removal rate is updated once per lot, and is used to
calculate the polish time for the next lot. The average removal rate on run $n$ is given by
$$r[n] = \frac{T_{pre}[n] - T_{post}[n]}{t[n]} \tag{2.10}$$
where $T_{pre}[n]$ is the average thickness of the wafer prior to polishing, $T_{post}[n]$ is the average post-polish thickness, and $t[n]$ is the polish time on run $n$. We then compute an exponentially weighted moving average (EWMA) of the average removal rate
$$a[n] = w \cdot r[n] + (1 - w) \cdot a[n-1] \tag{2.11}$$
where $w$ is the EWMA weight. The controller used during this experiment utilized an
EWMA weight of 0.6, and $a[0] = 5200$ Å/min. The EWMA weight was determined from
historical data, as in [11-12,20]. The EWMA average is used to determine the polish time
on the next run
$$t[n+1] = \frac{T_{pre}[n+1] - T_{Desired}}{a[n]} \tag{2.12}$$
where $T_{Desired}$ is the desired post-polish thickness. The desired thickness was 8000 Å.
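Equations 2.10 through 2.12 translate directly into a few lines of code. The following is a minimal sketch, not the software used in the experiment: the class and function names are hypothetical, the 15000 Å incoming thickness and the constant 6500 Å/min true rate in the simulation are invented for illustration, and only the weight (0.6), the initial rate estimate (5200 Å/min), and the target (8000 Å) come from the text.

```python
from dataclasses import dataclass


@dataclass
class EwmaTimeController:
    """Run by run polish-time controller built on an EWMA removal-rate estimate."""
    target: float = 8000.0         # desired post-polish thickness (Å)
    weight: float = 0.6            # EWMA weight w
    rate_estimate: float = 5200.0  # a[0], initial removal-rate estimate (Å/min)

    def polish_time(self, pre_thickness):
        """Eq. 2.12: time (min) needed to reach the target at the estimated rate."""
        return (pre_thickness - self.target) / self.rate_estimate

    def update(self, pre_thickness, post_thickness, polish_time):
        """Eqs. 2.10 and 2.11: fold the measured rate into the EWMA estimate."""
        measured_rate = (pre_thickness - post_thickness) / polish_time
        self.rate_estimate = (self.weight * measured_rate
                              + (1.0 - self.weight) * self.rate_estimate)
        return self.rate_estimate


def simulate(true_rates, pre_thickness=15000.0):
    """Post-polish thickness per run for a given sequence of true removal rates."""
    controller = EwmaTimeController()
    post_thicknesses = []
    for true_rate in true_rates:
        time = controller.polish_time(pre_thickness)
        post = pre_thickness - true_rate * time  # idealized, noise-free polish
        post_thicknesses.append(post)
        controller.update(pre_thickness, post, time)
    return post_thicknesses


if __name__ == "__main__":
    for run, post in enumerate(simulate([6500.0] * 6), start=1):
        print(f"run {run}: post-polish thickness {post:.0f} Å")
```

In this noise-free simulation the error in the rate estimate shrinks by a factor of $(1 - w)$ on each run, which is why a poor initial estimate is corrected within a few lots.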
2.3.3. Patterned Wafer Control
The patterned wafer control over the 600 wafers is shown in Figure 2.7. The control
run was started with a polish time based on the preliminary estimate of the polish rate
given above. This was a poor estimate of the actual rate and the controller responded rapidly. The root mean squared error (RMSE) of the controlled thickness from the desired
value after a four lot break-in period was 121 Å. The RMSE of the uncontrolled (fixed polish time) case was estimated to be 245 Å. After removing the cycle-skipping, the RMSE
was 97 Å for the controlled thickness, and the RMSE of the uncontrolled case during this
region was estimated to be 250 Å. This level of control is equal to or better than any other
reported CMP control using more complex filtering algorithms. In addition, we can see in
Figure 2.8 that the patterned removal rate has very slow dynamics; large changes generally
occur over several (e.g. 20) runs. The integral (summation of past errors) filtering provided
by the EWMA algorithm is more than sufficient to monitor and control these dynamics.
More complex filtering methods would only serve to increase the variability of the process
by responding to the noise in the removal rate seen in Figure 2.8.
[Figure 2.7 plot: x-axis Run #; annotations: 4 Lot Break-in Period, RMSE = 97 Å, RMSE = 121 Å, RMSE = 280 Å.]
Figure 2.7: Controlled average post-polish patterned wafer thickness over the
600 wafer experiment.
[Figure 2.8 plot: x-axis Run #.]
Figure 2.8: Average removal rate of patterned PETEOS wafers.
Figure 2.9 shows the uncontrolled within-wafer variation of the amount removed during the control experiment. We see here that the within-wafer non-uniformity is extremely
well-behaved, consistently around 5% (which is typical for this tool), over the course of
the 600 wafer (96 lot) experiment. We can clearly see that there are no dynamics in the
within-wafer non-uniformity, and controlling on this metric would only result in increasing the noise in the controlled thickness.
[Figure 2.9 plot: x-axis Lot #.]
Figure 2.9: Within-wafer non-uniformity of blanket sheet film PETEOS wafers.
We now compare this approach with that of controlling the final film thickness on patterned wafers using blanket wafer removal rates and a sheet film equivalent (SFE). This
approach uses historical data to determine an SFE, which is the ratio of the patterned and
blanket wafer removal rates. The operator multiplies the blanket rate from a pilot wafer by
the SFE to estimate the patterned rate, and uses this to calculate the polish time. Since we
had the actual patterned rates, as well as the blanket rates, we determined what the control
would have been using this process. The SFE was calculated for each lot, and the control
results were determined for each of these values, as well as for the average SFE. The best
result for this approach is shown in Figure 2.10. The RMSE was 138 Å, 14% worse than
direct patterned wafer control. The control using the average SFE resulted in an RMSE of
162 Å, a 39% decrease in performance.
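The SFE calculation described above can be sketched as follows. All names and numerical values here are hypothetical and chosen only to show the mechanics. The last step illustrates why a fixed SFE degrades control: any lot whose true patterned rate differs from the SFE estimate lands off target.

```python
def sheet_film_equivalent(patterned_rate, blanket_rate):
    """SFE: ratio of the patterned to the blanket removal rate (historical data)."""
    return patterned_rate / blanket_rate


def sfe_polish_time(pre_thickness, target, blanket_rate, sfe):
    """Polish time (min) using the pilot-wafer blanket rate scaled by the SFE."""
    estimated_patterned_rate = sfe * blanket_rate
    return (pre_thickness - target) / estimated_patterned_rate


def post_thickness(pre_thickness, true_patterned_rate, polish_time):
    """Thickness actually obtained if the patterned wafer polishes at the true rate."""
    return pre_thickness - true_patterned_rate * polish_time


# Hypothetical values (Å and Å/min), not data from the experiment.
sfe = sheet_film_equivalent(patterned_rate=6500.0, blanket_rate=2600.0)
time = sfe_polish_time(pre_thickness=15000.0, target=8000.0,
                       blanket_rate=2600.0, sfe=sfe)
actual = post_thickness(15000.0, true_patterned_rate=6800.0, polish_time=time)
print(f"SFE = {sfe:.2f}, polish time = {time:.3f} min, "
      f"thickness error = {actual - 8000.0:.0f} Å")
```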
[Figure 2.10 plot: x-axis Run #; annotations: 4 Lot Break-in, RMSE = 138 Å, RMSE = 156 Å, RMSE = 239 Å.]
Figure 2.10: Average post-polish thickness of patterned wafers using pilot wafers
and sheet film equivalents to control the post-polish thickness of the patterned
wafers.
We also tested the control using a subset of five of the 22 dies used during the actual
experiment, and estimated the controlled thickness using only these five dies. The resulting control, shown in Figure 2.11, had an RMSE of 280 Å, an increase of 130% over the 22
die control. During the regions where there is no cycle-skipping, the RMSE was 150 Å, a
78% increase. Even though the five die average was controlled to the 8000 Å target, the 22
die average was roughly 1000 Å lower. Thus, using this type of control, sampling a
reduced number of dies can substantially reduce the quality of control. In the next chapter,
we will see that this lack of control using a reduced number of dies is the result of monitoring and controlling only one site with a poor understanding of how the patterned
removal rate of the particular site being measured is affected by the blanket rate and the
specific pattern layout of the device being controlled.
[Figure 2.11 plot: x-axis Run #; curves: 5 Site Average, 22 Site Average; annotations: RMSE = 149 Å, RMSE = 280 Å, RMSE = 528 Å.]
Figure 2.11: Controlled average post-polish patterned wafer thickness over the
600 wafer experiment using five measurement sites.
2.4 Summary
The increased sampling frequency provided by on-line metrology allows a run by run
controller with a simple filtering algorithm to effectively compensate for the dynamics of
the CMP removal rate. The variability of on-line CMP metrology tools is now well below
CMP requirements, the reliability is very high, and the measurements from these tools correlate well with measurements from ex-situ metrology tools. In addition, the use of these
tools could lead to substantial reductions in cost of ownership through increased throughput, decreased cleaning, and decreased need for more expensive ex-situ metrology tools.
These benefits make the use of on-line metrology ideal for run by run process control.
Control of the post-polish average film thickness for a few sites of a single device using
on-line metrology provides excellent control results with a simple EWMA controller.
Because the dynamics of the CMP process are relatively slow, complex control methodologies are not necessary to control the CMP process. In addition, the within-wafer variation
is very stable over extended periods of polishing, and control methods which attempt to
control using this metric are likely to only add variability to the process.
However, this experiment suggests that controlling the average thickness with a small
number of measurement sites or a small number of measurement dies may result in
increased variability due to an increased sensitivity to measurement errors as well as control of the process to an inaccurate average value. In the next chapter, we will explain why
these effects occur and outline a framework which reduces this sensitivity. This experiment also indicates that performing control using pilot wafers would cause a 39% increase
in the average error of the controlled output over performing control directly on patterned
wafers. The next chapter will also explain this effect and discuss why such approaches
result in poor control of the CMP process.
Chapter 3
Control of Multiple Devices in Dielectric CMP
The previous chapter demonstrated that a simple filtering algorithm is effective in controlling the time dynamics of the CMP process. In particular, the average post-polish
thickness measured at a few sites in a few dies of a single device was controlled to within
100 Å. While the control scenario outlined in the previous chapter is the one typically discussed in the literature, production fabrication facilities often face a much more difficult
task. Fabs generally have more than one type of device running through the facility, and
this has a significant impact on the CMP process. Therefore, the control approaches used
in production are often very different than that described in the previous chapter and in the
literature. Control approaches used in production seem to go in one of two directions. One
approach is to maintain a separate controller, e.g. like that outlined in the previous chapter,
for each device being run through the CMP process. The second approach attempts to tie
the control of the separate devices together using a common parameter, such as the blanket
removal rate. Each approach has its advantages and disadvantages. The first approach has
the advantage that measurements from one device do not degrade the control of the other
devices, because it does not use an inaccurate relationship between the polishing rates of
the different devices and a common parameter. However, the second approach has the
advantage that changes in the process which are common to all devices, e.g. blanket rate
changes due to pad wear, can be estimated from any device being processed and passed on
to the control of the other devices. For example, estimating the blanket removal rate from
measurements of the patterned removal rate of one device, and using this updated blanket
removal rate to estimate the time needed to polish another device, allows changes in the
blanket rate which occurred during the polishing of the first device to be taken into
account when determining the polish time for the second device. This is extremely important if one device has not been run in a long time, and the performance of separate controllers can be poor when some devices are not run regularly.
In this chapter, we address the critical issue of multiple device control. The previous
chapter demonstrated that controlling the process using a reduced number of die, or using
estimates of the patterned removal rate based on measurements of the blanket removal
rate, results in a decrease in the quality of control. We begin this chapter by discussing the
importance of pattern dependencies in CMP. Following this, we describe in detail a CMP
process model that has been well-received in the CMP community, because it has been
shown to be relatively device independent. We then use this model to show that
the problems seen with the control solution in Chapter 2, and with other solutions in the
literature, are the result of poor assumptions regarding pattern dependencies. In addition,
we outline additional problems that arise in the case of multiple device polishing. We then
define the multiple device control problem addressed in this chapter, and demonstrate how
this model can be incorporated into a CMP control system to provide a framework for a
device independent control strategy for the CMP process. Finally, we present experimental
results to demonstrate the effectiveness of this approach.
3.1 Pattern Dependencies in Dielectric CMP
We would like to understand the problems seen with the control approach outlined in
Chapter 2. As we will see, these problems are mainly due to variations in the pattern
dependencies in CMP that were neglected in the set up and design of the control strategy.
This neglect is typical in many CMP monitoring and control scenarios, and this section
aims to demonstrate why this leads to poor control of the CMP process. Consider the post-polish thickness profile of an entire patterned wafer shown in Figure 3.1. Here we see several effects. First, there is a smooth surface with deep valleys carved inside. The smooth
surface is the result of wafer-level variation. The sharp valleys are the thicknesses of
regions within each die, and are caused by the pattern dependencies in polishing. An
example of such pattern dependencies for a single die is shown in Figure 3.2. Here we see
the post-polish thickness profile of a test device containing areas of memory and logic. If
we uniformly sample the entire die at several locations, we will get a complete picture of
the die-level non-uniformity, i.e. similar to that shown in Figure 3.2. However, sampling
several sites on several die is much too time consuming in a production environment, so
we are often forced to choose one, two, or maybe three sites to monitor on typically five to
nine die.
Figure 3.1: A very densely sampled thickness profile of a typical wafer, including
wafer-level and die-level variation components.
Figure 3.2: Die-level thickness profile of a test device.
In order to understand the importance of this die-level variation, we have shown the
sources of variation of the CMP process in Figure 3.3. The wafer-to-wafer data shown on
the left is the average of the controlled thicknesses measured at 22 sites across the patterned
wafers in the 600 wafer experiment described in Chapter 2. The within-wafer data
is the average thickness of 25 sites, plotted for 10 die across the surface of one wafer. The
within-die data is the thickness of each site, averaged across 10 die, plotted for each of 25
sites. The range of the wafer-to-wafer variability is approximately 500 Å. The within-wafer
variation is also approximately 500 Å. On the other hand, the within-die variation seen on
the right is greater than 3000 Å, six times larger than the other two sources of variability.
Figure 3.3: Sources of thickness variation in the CMP process. (Panels: wafer-to-wafer,
thickness versus wafer number; within-wafer, thickness versus die number; within-die,
thickness versus site number.)
In fact, the within-die variation is the largest contributor to the global non-uniformity
seen on a post-polish wafer. This can be seen in Figure 3.4. Here the total variation was
calculated as the standard deviation of 250 measurement points, 25 sites measured uniformly across each of 10 die uniformly spaced across the wafer. The within-die variation
was calculated by taking the standard deviation of the 25 sites within each die, and averaging over the 10 die. The within-wafer variation was calculated by taking the standard
deviation of each site over the 10 die, and averaging over the 25 sites. This was done for
several different polishing times. We can see that the magnitude of the within-die variation
is nearly as large as the total variation. As a result, the profile of the total variation as a
function of the polish time is almost entirely determined by the profile of the within-die
variation.
Figure 3.4: Total, within-die, and within-wafer thickness variation of a typical test
device as a function of polishing time.
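The three variation measures described above can be sketched in a few lines of Python. This is only an illustration of the calculation as stated in the text; the function name, the nested-list data layout, and the use of the sample (n - 1) standard deviation are assumptions, since the text does not say which estimator was used.

```python
import statistics


def variation_components(thickness):
    """Return (total, within_die, within_wafer) thickness variation.

    thickness[d][s] is the post-polish thickness at site s of die d,
    e.g. 10 die with 25 sites each. The sample standard deviation is
    used throughout, which is an assumption of this sketch.
    """
    n_die = len(thickness)
    n_site = len(thickness[0])

    # Total variation: standard deviation over every measurement point.
    all_points = [t for die in thickness for t in die]
    total = statistics.stdev(all_points)

    # Within-die: standard deviation of the sites in each die,
    # averaged over the die.
    within_die = statistics.mean(
        [statistics.stdev(die) for die in thickness]
    )

    # Within-wafer: standard deviation of each site across the die,
    # averaged over the sites.
    within_wafer = statistics.mean(
        [
            statistics.stdev([thickness[d][s] for d in range(n_die)])
            for s in range(n_site)
        ]
    )
    return total, within_die, within_wafer
```

Because the within-die term averages over die and the within-wafer term averages over sites, a pattern that repeats identically in every die contributes only to the within-die number, while a die-to-die offset contributes only to the within-wafer number.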
Because the within-die variation is the largest contributor to global non-uniformity,
any control technique that does not take this into account is likely to have problems.
Before we address the problems with the control outlined in the previous chapter, as well
as potential problems with other existing techniques for controlling the CMP process, we
would like to understand what causes the pattern dependencies in the polishing process.
The next section describes a CMP model, which will be used throughout this work, that
will explain how these pattern dependencies come about.
3.2 Modeling of Dielectric CMP
Before we dive into discussing the problems of the control used in the previous chapter
and the many issues involved with the control of multiple devices, we need to build our
fundamental understanding of the CMP process. One way to do this is to consider the state
of the art in CMP modeling. We will not duplicate existing works by thoroughly reviewing
CMP models; such reviews can be found in [54,56]. Instead, we will focus on a few particular models, and discuss one model that has received a large degree of acceptance in the
CMP community in detail.
Work developed by Preston in 1927, related to the glass polishing industry, outlined
the fundamental mechanical motions relating to CMP [64]. The basic premise behind this
model is that the removal rate is equal to some coefficient (Preston's Coefficient) times the
pressure on the surface times the relative velocity
\[ \frac{dz}{dt} = -k_p\,p\,v. \tag{3.1} \]
Preston's Coefficient is used to capture effects such as the chemical being used to break
down the bonds at the wafer surface, the surface roughness or structure of the polishing
pad, and the type of abrasive particles used. Several other models exist which attempt to
better explain the behavior of blanket wafer polishing, but most of these are neither physically substantiated nor significantly better than Preston's equation.
The issue of patterned wafer modeling, on the other hand, is much more complex. Several models have been proposed to account for pattern effects in CMP but they have limited applicability (see [54,56] for a detailed review). Their limitations range from being
based on non-representative test structures to using such a small range of process conditions and layouts that they are rendered ineffective beyond the scope of the original experimental conditions. Other limitations, especially for phenomenological models, are the
complexity and difficulty of relating the model parameters to physical factors. Most of the
models with the exception of those proposed by Renteln [65], Hayashide et al. [66] and
Stine et al. [53], do not apply across a whole die.
The integrated characterization and modeling methodology developed at MIT, or the
"MIT density model," accounts for pattern effects across the whole wafer in a systematic
way [53,56]. This density model provides a first-order approximation of post-polish
dielectric thicknesses for arbitrary layouts [56]. The model is based on Preston's equation.
Preston's equation states that the removal rate is proportional to the pressure times the
velocity. The density model assumes that the force is distributed evenly among the raised
features, see for example Figure 3.5. When we are polishing a blanket wafer, the entire
surface of the wafer supports the force.
Figure 3.5: Cross-sectional view of the oxide thickness in a patterned wafer.
However, when we have patterned features, the
pressure is inversely proportional to the density of the raised, or up, features. Therefore,
Preston's equation can be re-written as
\[ \frac{dz}{dt} = -\frac{k_p\,p\,v}{\rho_{eff}(x, y)} = -\frac{K}{\rho_{eff}(x, y)} \tag{3.2} \]
where K is the blanket removal rate, and $\rho_{eff}(x, y)$ is what is called the "effective" density
of the region. We will explain effective density momentarily, but for now let us assume
that it is the percentage of the area occupied by the raised regions (lines) versus the down
regions. The model actually assumes two polishing regimes. In the first regime, the model
assumes that the polishing pad is entirely supported by the raised features, and thus there
is no removal in the down areas. In the second regime, i.e. once the step height ($z_1$ in Figure 3.5) is completely removed, the model assumes the removal rates of both the raised
and down areas are equal to the blanket polishing rate. The model assumes that there is an
instantaneous change in the removal rate once the step height is completely removed. In
other words, the model assumes there is an instant change in the density, from the initial
effective density of the feature, $\rho_{eff}(x, y, L)$, to a density of one. Thus the effective density
is dependent on the remaining thickness
\[ \rho(x, y, z, L) = \begin{cases} \rho_{eff}(x, y, L), & z > z_0 - z_1 \\ 1, & z < z_0 - z_1 \end{cases} \tag{3.3} \]
where $z_0$ is the initial oxide thickness, $z_1$ is the as-deposited step height, and $z_0 - z_1$ is
equivalent to the thickness of the down region when the step height has been removed.
This results in an oxide thickness which is given by
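\[ z(t) = \begin{cases} z_0 - \dfrac{K t}{\rho_{eff}(x, y, L)}, & t < t_1 \\ z_0 - z_1 - K (t - t_1), & t > t_1 \end{cases} \tag{3.4} \]
where
\[ t_1 = \frac{\rho_{eff}(x, y, L)\, z_1}{K} \tag{3.5} \]
is the time at which the step height is removed. A profile of the removal rates of the raised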
and down areas for one particular effective density, as a function of time, is shown in Figure 3.6. We can clearly see the change from the first regime of polishing to the second
regime. The time of planarization (where the polishing rate switches to the blanket rate) is
highly dependent on the density. This can be seen in Figure 3.7, where the raised area
removal rate is shown for several different effective densities, as a function of polish time.
The resulting raised area thickness profile is also shown. Note that, during the first regime
of polishing, the model asserts that the amount removed is proportional to the polish time
and inversely proportional to the pattern density. This is the dominant effect in the polishing of patterned wafers, and results in the large differences in the thicknesses for different
density features seen in Figure 3.7. We will return to this notion several times, particularly
in Section 3.4, where we describe problems with existing control techniques.
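The two polishing regimes can be illustrated with a short Python sketch of the raised-area thickness. It assumes the piecewise form described above (a rate of K divided by the effective density until the step height is removed, then the blanket rate K). The function names are hypothetical, and the thicknesses and rate used in the example are illustrative values, not measurements from this work.

```python
def step_removal_time(rho_eff, z1, K):
    """Time at which the step height z1 is removed (equation 3.5)."""
    return rho_eff * z1 / K


def up_area_removal_rate(t, z1, rho_eff, K):
    """Removal rate of a raised feature at polish time t."""
    t1 = step_removal_time(rho_eff, z1, K)
    # First regime: the pad is supported only by the raised features.
    # Second regime: the step is gone and the blanket rate applies.
    return K / rho_eff if t <= t1 else K


def up_area_thickness(t, z0, z1, rho_eff, K):
    """Oxide thickness over a raised feature at time t (equation 3.4)."""
    t1 = step_removal_time(rho_eff, z1, K)
    if t <= t1:
        return z0 - K * t / rho_eff
    return z0 - z1 - K * (t - t1)


# Illustrative values only: thicknesses in angstroms, rate in angstroms
# per second.
z0, z1, K = 17000.0, 8000.0, 40.0
for rho in (0.1, 0.3, 0.5, 0.7):
    t1 = step_removal_time(rho, z1, K)
    z30 = up_area_thickness(30.0, z0, z1, rho, K)
    print(f"density {rho:.0%}: t1 = {t1:.0f} s, thickness at 30 s = {z30:.0f}")
```

With these assumed numbers the lowest density site planarizes first and then slows to the blanket rate, while the higher density sites are still in the first regime, which is the behavior plotted in Figure 3.7.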
Figure 3.6: The MIT density model predictions of the removal rate of the up and
down areas as a function of time for one particular density.
Figure 3.7: The MIT density model predictions of the up area removal rates and
thicknesses, as a function of time, for different densities. (Curves for 10%, 30%, 50%,
and 70% density.)
We now turn to explaining what is meant by "effective" density. In the examples
above, we referred to the effective density as the density of the features on the layout. Feature density is generally determined over a small region, e.g. 0.1 km. However, in the
CMP process, the "effective" density seen by the pad, meaning over which the pressure is
distributed, is significantly larger than this. Therefore, the effective density is determined
by computing a weighted average of the feature densities within a much larger window.
The size of this window is referred to as the planarization length. The weighting function
used is based on the flexing properties of a surface under the pressure of a localized force
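\[
w(r, L) =
\begin{cases}
k_2\, a \displaystyle\int_0^{\pi/2} \sqrt{1 - \frac{4 r^2}{L^2} \sin^2\theta}\; d\theta, & r < a \\[2ex]
k_2\, r \left[ \displaystyle\int_0^{\pi/2} \sqrt{1 - \frac{L^2}{4 r^2} \sin^2\theta}\; d\theta - \left(1 - \frac{L^2}{4 r^2}\right) \int_0^{\pi/2} \frac{d\theta}{\sqrt{1 - \frac{L^2}{4 r^2} \sin^2\theta}} \right], & r > a
\end{cases}
\tag{3.6}
\]
where
\[ k_2 = \frac{4 (1 - \nu^2)\, q}{\pi E} \tag{3.7} \]
and where r is the radial distance away from the point (x, y) of interest, L is the planarization length, a = L/2 is the radius of the weighting window, q is the load, $\nu$ is the Poisson ratio, and E is Young's modulus of the pad
material. The value of $k_2$ is generally left out, and the weighting function is normalized to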
have a peak magnitude of one. A cross section of this two dimensional spatial weighting
function is shown in Figure 3.8.
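The weighted averaging step can be sketched as follows. This is a minimal illustration, not the model's implementation: the grid layout, the function names, and the treatment of cells outside the grid are assumptions. In particular, `stand_in_weight` is a hypothetical semi-elliptical kernel used only so that the example runs; it is not the elastic weighting function of equation (3.6).

```python
import math


def stand_in_weight(r, L):
    """Hypothetical stand-in for the weighting function of equation (3.6).

    Semi-elliptical profile with a peak of one at r = 0 that falls to
    zero at r = L / 2. It is NOT the elastic-deflection kernel.
    """
    x = 2.0 * r / L
    return math.sqrt(1.0 - x * x) if x < 1.0 else 0.0


def effective_density(local_density, cell_size, i0, j0, L,
                      weight=stand_in_weight):
    """Weighted average of the local densities around cell (i0, j0).

    local_density[i][j] is the layout density of cell (i, j) on a square
    grid with pitch cell_size, in the same length unit as L. Cells
    outside the grid are ignored, which is an assumption of this sketch.
    """
    num = 0.0
    den = 0.0
    for i, row in enumerate(local_density):
        for j, rho in enumerate(row):
            r = cell_size * math.hypot(i - i0, j - j0)
            w = weight(r, L)
            num += w * rho
            den += w
    return num / den


# A 9 x 9 layout with a 10% density block next to a 50% density block.
layout = [[0.1] * 4 + [0.5] * 5 for _ in range(9)]
print(effective_density(layout, 0.5, 4, 4, 3.0))
```

Near the boundary between the two blocks the effective density lies between 10% and 50%, which is how neighboring features influence the removal rate at a given point.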
Figure 3.8: A cross section of the elliptical weighting function used in the density
model to calculate the effective density of the features.
The modeling process is summarized in Figure 3.9. The model is centered around the
density dependent removal rate for each location on the wafer. The blanket rate, K, can be
determined from the removal rate of a blanket wafer, measured at a location near the die of
interest. This blanket rate is used for the entire die. Within the die, the effective density for
each point of interest is calculated using equation (3.6). This can be done for many points
within the die to generate an effective density profile of the entire die. The effective densities can then be used to generate the patterned removal rate profile via equation (3.2), and
in turn, the final thickness profile of the entire device from equation (3.4).
Figure 3.9: A high-level view of the MIT density model. (The layout file gives the
effective density profile, which is combined with the blanket removal rate K to give the
patterned removal rate profile.)
This model's key strength is its ability to efficiently predict the thickness of an arbitrary layout to a first order. This benefit comes from the weighted average of the densities
within the planarization length, or interaction distance. This weighted averaging is the key
to the model, because the pressure distribution of force on a particular feature is affected
by neighboring features. The planarization capability of any particular process is captured
by the planarization length parameter and the blanket rate profile. Given these parameters,
the evolution of the thickness on an entire wafer may be obtained using this analytical
model.
We can see the quality of this approach by testing the model fit to polishing data. This
was done for data measured on the test pattern shown in Figure 3.10. The dots indicate the
60 locations measured on the die. The measured thicknesses, as well as the model fit, are
shown in Figure 3.11 for both the raised features, as well as for the down areas between
these features. We can see that the model does an excellent job fitting the wide range
(6000 Å) of thicknesses. In both cases the fit error is less than 250 Å. Similar results have
been obtained for this model when used for prediction. The results shown here are for a
single die on the wafer.
Figure 3.10: Measurement plan of a test layout pattern (Device #2).
Figure 3.11: Measured and modeled values for the post-polish thickness of the
raised and down areas using the MIT density model. (Thickness versus site number; the
two panels are annotated RMSE = 220 and RMSE = 244.)
In addition, the MIT density model can predict the spatial evolution of an arbitrary layout through time, as shown in Figure 3.12. Here the thickness profile of the test patterned
wafer is plotted for four values of time. The stars are the experimental values and the lines
are the predictions obtained from the MIT density model. Once again we see that the
model provides an excellent fit to the polished wafer data.
Figure 3.12: Measured and modeled values (dashed lines) for the post-polish
thickness of the raised areas for several polish times using the MIT density model.
3.3 Problems With Existing Control Methods in Dielectric CMP
Now that we have outlined the importance of pattern dependencies in CMP, and provided insight to the cause of these dependencies through the study of the density model,
we are in a position to better understand the problems with the control strategy of the previous chapter, as well as other approaches outlined in the literature. Consider the sampling
plan used for control in the previous chapter. One site was measured on 22 die across the
surface of the wafer, and the average of these measurements was taken to determine the
average post-polish thickness. This assumes that the patterned removal rate can be modeled as a single value, regardless of the position on the wafer or in the die. It does not contain any information about the interaction between the pattern layout density at the
location within the die and the blanket polishing rate profile across the wafer. Even though
there is little variation in the blanket wafer profile, as shown in Figure 3.3, equation (3.2)
tells us that the polishing rate of the sites are initially equal to the product of the blanket
removal rate and one over the effective density of the measurement site. For a single site,
the effective density of the site exaggerates the blanket wafer profile, as shown in Figure
3.13. When we reduce the number of sites from 22 to five, we obtain fewer points on the
outside of the wafer. Since the outside points are thicker, the 22 site average thickness is
lower than the five site average. While this difference in the average thickness is small for
a blanket wafer, it may be large for a patterned wafer. As shown in Figure 3.13, the difference will depend on the density of the site being measured. In addition, the division of the
blanket rate by the effective density results in an increase in the noise in the patterned
wafer thickness measurements (in the first polishing regime), due to blanket rate variations
\[ V(PR) = \frac{V(BR)}{\rho_{eff}(x, y)^2} \tag{3.8} \]
where V(PR) is the variation (variance) in the patterned removal rate, and V(BR) is the
variation, i.e. wafer-to-wafer variation, in the blanket removal rate. Both of these factors
degrade the control results when measuring a single site on only a few die
on the wafer.
Figure 3.13: Blanket wafer removal rate profile and patterned wafer removal rate
profiles over the surface of a wafer predicted by the density model. (Curves: blanket rate
profile, 50% density feature, 20% density feature; horizontal axis: radial distance in mm.)
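The scaling in equation (3.8) follows from the first-regime rate relationship if the effective density of the measurement site is treated as a fixed constant and only the blanket rate varies from wafer to wafer. The short derivation below is a worked illustration under that assumption; the 20% density value is taken from the lower density curve in Figure 3.13.

```latex
\begin{align*}
PR &= \frac{BR}{\rho_{eff}} \\
V(PR) &= V\!\left(\frac{BR}{\rho_{eff}}\right) = \frac{V(BR)}{\rho_{eff}^{2}} \\
\rho_{eff} = 0.2: \quad V(PR) &= 25\, V(BR), \qquad \sqrt{V(PR)} = 5 \sqrt{V(BR)}
\end{align*}
```

In other words, for a 20% density site the standard deviation of the patterned removal rate is five times that of the blanket rate, so a blanket rate disturbance that looks small on a blanket wafer can be large at the measured site.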
Another problem with the control approach used in the previous chapter is that the
control is focusing entirely on the run by run time dynamics of the CMP process. In other
words, blanket wafer characteristics such as blanket wafer removal rate and uniformity are
the parameters being controlled. While wafer-level performance metrics, e.g. average
removal rate and wafer-level non-uniformity, are highly monitored, the device dependencies are often neglected. The result of this neglect is that device variability can be large. In
particular, the quality of control is much more sensitive to the choice of the measurement
location than to the wafer-to-wafer variation or the within-wafer variation (see Figure 3.3).
For example, if we measure around the peak in Figure 3.2 and correctly control this point
to the desired average thickness, then we have actually over-polished most of the die. On
the other hand, if we pick the valley in the center and control this point, then we actually
have under-polished most of the die. The amount of this over-polish or under-polish will
be on the order of 1000 Å, even though the wafer-to-wafer and within-wafer variation are
tightly controlled. One approach might be to pick the peak, valley, and mid-point of this
profile. In reality, most engineers do not know what the polishing profile of the device
being controlled looks like ahead of time. As a result, they may randomly pick a few
points to measure and control. Or the engineers may be asked to measure and control the
features which are generally most difficult to planarize. This is particularly damaging,
because this normally results in the extremes in Figure 3.2 being chosen for
control. In addition, there are manufacturing issues that keep certain areas from being
measured. For example, the valley in the center of this device is caused by a region of low
density. According to equation (3.2), a region of low density polishes very fast, leaving a
very thin post-polish thickness. These features are often very small, and properly measuring them in a reliable way on a metrology tool, on-line or ex-situ, is extremely difficult.
Therefore, these areas are rarely monitored in a manufacturing environment, and
typical control strategies will tend to monitor and control higher density regions, which
are thicker. As a result, wafers are generally being over-polished.
Traditional control approaches that monitor and control only a few sites also have no
ability to estimate the range of the true within-die variation. The within-die variation of a
particular device determines many things, including the performance of the circuit and the
necessary amount of deposited dielectric material in order to achieve planarization without
over-polishing through to the underlying substrate.
3.4 Current Methods for Controlling Multiple Devices in
Dielectric CMP
We have now seen how the neglect of pattern dependencies can lead to serious problems in the control of the polishing of a single device. We now turn our attention to the
control of multiple devices being polished on a single polishing tool. We will look at how
multiple devices are currently controlled in a production system and outline problems with
these approaches before presenting our solution to controlling the CMP process with multiple devices. One approach to multiple device control, shown in Figure 3.14, is one in
which control is performed using blanket test wafers and sheet film equivalents (SFEs). In
this scenario a test wafer is run and the blanket removal rate and non-uniformity are determined. A sheet film equivalent (previously found through experimentation) for the particular device being run is used to calculate the polish time for that device as a function of the
blanket wafer removal rate. Various methods are used for this procedure. One method simply multiplies a constant factor (one for each device) times the blanket rate to obtain the
patterned removal rate for that device. Another approach uses a fixed amount of time
required to planarize the features on that device, plus or minus an additional time, computed
from the blanket removal rate, needed to meet the desired target thickness. Once the time for the device is
determined, the lot is polished. One or more wafers in the lot are measured, and, if necessary, one or more wafers in the lot are reworked, i.e. briefly polished a second time,
so that they meet the desired thickness. If the tool has been sitting idle when a new lot
arrives to be polished, this procedure is normally repeated. Otherwise, if the lot is the same
device, then the polish time is tweaked using the measurements from the rework stage. If
the lot is a new device, the same estimate of the blanket rate is kept (or modified from
measurements in the rework stage) and the polish time for this new device is calculated
using the SFE for the new device.
Figure 3.14: Example current practice for CMP process control using sheet film
equivalents (SFEs). (Flowchart: polish blanket test wafer; use SFE to calculate new time;
polish lot; measure and rework; after idle time, repeat the test wafer; for a new device,
use SFEs to update the time; otherwise update the time.)
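The two SFE calculations described above can be written out as a small Python sketch. The function and parameter names are hypothetical, and the second function reflects one reading of the "fixed planarization time plus or minus an additional time" method; the numbers in the example are illustrative only.

```python
def polish_time_ratio_sfe(target_removal, blanket_rate, sfe_factor):
    """First method: patterned rate = device constant x blanket rate.

    sfe_factor is the per-device constant found through experimentation.
    """
    patterned_rate = sfe_factor * blanket_rate
    return target_removal / patterned_rate


def polish_time_fixed_planarization(planarization_time,
                                    thickness_after_planarization,
                                    target_thickness,
                                    blanket_rate):
    """Second method: fixed planarization time plus a correction.

    The correction is the time needed, at the blanket rate, to move from
    the assumed thickness at the end of planarization to the target. It
    is negative if the target is thicker than that assumed thickness.
    """
    extra_time = (thickness_after_planarization - target_thickness) / blanket_rate
    return planarization_time + extra_time


# Illustrative values: angstroms and angstroms per second.
print(polish_time_ratio_sfe(6000.0, 40.0, 1.5))
print(polish_time_fixed_planarization(60.0, 9000.0, 8000.0, 40.0))
```

Both functions reduce the device to one or two constants, so neither carries any information about which sites have planarized at a given time.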
If we are trying to use this approach to control two devices with post-polish thickness
profiles like those shown in Figure 3.15, we will have several problems. First, approaches
like this will suffer from all the same problems we outlined for the single device controllers in the previous section, i.e. they do not take into account the variation in the post-polish oxide thickness due to the pattern dependencies. As a continuing example of the
problems with controlling the average post-polish thickness using only a few sites in the
die, consider the differences between the three site and 63 site averages for the two devices
shown in Figure 3.16. Here we see that there is a large difference between the two averages. In addition, the differences in these averages vary depending on the particular
device. Therefore, if we utilize the control approach outlined above, which utilizes SFEs
based on measurements from only a few sites within the die, the resulting control may
appear accurate, but in actuality the true average may be very different than the average of
the measured sites.
Figure 3.15: Post-polish thickness profiles for two different devices. Measurements were
taken over a grid similar to that in Figure 3.10.
Figure 3.16: Multiple device control using a three site average of the thickness.
(Curves: 63 site average and 3 site average versus lot number, for each device.)
Second, the range of thicknesses within the die is different for each device. Since the
control strategy using SFEs only measures a few sites on the wafer, there is no way to
monitor the within die variation for the different devices. As a result, this approach cannot
dynamically limit the amount of polishing in order to ensure that the device is not over-polished to the point where the oxide is too thin or the underlying layer is exposed.
Third, the distribution of densities on each device layout determines the removal rate
and the corresponding thickness profiles over time. We saw in Figure 3.7 that these profiles are highly dependent on the effective densities of the particular site or sites being
monitored. Therefore, when the measured values are used to compute the average thickness, the averages are highly dependent on the density profiles of the particular devices
being run and on the particular site or sites being measured. Consider the average thicknesses of two different devices with different density distributions in their layouts shown
in Figure 3.17. The resulting average thickness profiles for the different devices are very
different. In addition, their behavior over time is complex. Specifically, each device average begins as a linearly decreasing function. The rate of the linear decrease depends on the
densities of the points measured on each device. Devices with more low density features
will have a higher decrease in the average thickness, and devices with more high density
features will have a smaller decrease in the average thickness. In addition, the density distribution will also determine the time at which the linear decrease will become nonlinear.
This point occurs when the first feature reaches planarity. At this point, one component of
the average will remain fixed at the blanket rate. After this point, features with higher and
higher densities will also begin to reach planarity, and consequently have removal rates fixed
at the blanket removal rate. Only when all the features are planarized will the average
removal rate also become fixed at the blanket removal rate, causing the average thickness
to return to a linearly decreasing function. The result is that using SFEs to represent these
complex time dependencies of the CMP process is inappropriate. The first approach to
using SFEs, where a simple ratio between the blanket removal rate and the patterned
removal rate is used, does not account for any change in the slope of these lines. The second method, where the polish time is the sum of a fixed "planarization" time and an additional time based on the blanket removal rate that controls the final thickness, is also
incorrect. It assumes that all the densities have reached planarity, which is generally not
true. This is particularly bad to assume if the exact thickness profile is not known or if the
target removal amount is just at the point when the step heights become planarized.
Figure 3.17: The average thickness for two different devices predicted by the MIT
density model.
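The change in slope of the average thickness can be reproduced with a short sketch that averages the density model over the measured sites of a device. The function names and the two-site example are hypothetical, and the numbers are illustrative rather than taken from the devices in this chapter.

```python
def up_area_thickness(t, z0, z1, rho_eff, K):
    """Raised-area thickness at time t from the two-regime density model."""
    t1 = rho_eff * z1 / K  # time at which this site's step is removed
    if t <= t1:
        return z0 - K * t / rho_eff
    return z0 - z1 - K * (t - t1)


def average_thickness(t, z0, z1, densities, K):
    """Average raised-area thickness over sites with the given densities."""
    values = [up_area_thickness(t, z0, z1, rho, K) for rho in densities]
    return sum(values) / len(values)


def average_removal_rate(t, z1, densities, K):
    """Average removal rate over the sites at polish time t."""
    rates = [K / rho if t < rho * z1 / K else K for rho in densities]
    return sum(rates) / len(rates)


# Illustrative device with one 20% and one 50% density site.
z0, z1, K = 17000.0, 8000.0, 40.0
for t in (10.0, 50.0, 150.0):
    rate = average_removal_rate(t, z1, [0.2, 0.5], K)
    print(f"t = {t:.0f} s: average removal rate = {rate:.0f}")
```

In this example the ratio of the average rate to the blanket rate falls in steps as each site planarizes, so a single fixed ratio can match it in at most one of the three intervals.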
These factors combine to result in the average thickness for the different devices being
either poorly controlled or controlled to an incorrect thickness. While the blanket wafer
performance metrics are closely monitored and controlled, the problems in the control of
patterned wafers continue to persist. In the next section, we present a control framework
that addresses these problems for both single and multiple device control.
3.5 The Multiple Device Control Problem for Dielectric
CMP
The previous sections demonstrated that controlling pattern dependencies in dielectric
CMP, particularly in the case of multiple device control, is a difficult problem. The pattern
dependencies are the largest source of variation in dielectric CMP, and current control
techniques do not address this. Each device may have a different average removal rate, and
typical practice generally requires using test wafers and device dependent removal rate
predictions calculated using inaccurate relationships to the blanket rate to determine the
polish time after the tool has been idle or after a device has changed. These control
approaches do not take into account where on the device they are measuring, nor can they
estimate the range of the thickness variation across the wafer or within the die in order to
ensure that the underlying layer is not exposed. As a result, these strategies decrease performance, waste wafers and chemicals, decrease throughput, and increase complexity. We
will formalize the multiple device control problem in this section, and present a solution to
this problem in the next section.
There are many metrics over which we could perform control. For example, we could
control the average removal rate, the average post-polish thickness, the within-die variation, the within-wafer variation, the step height at certain locations, and others. With each
added component, the complexity of the control strategy increases. Recall that the within-wafer
variability does not vary significantly over the life of a typical CMP pad. Other
works have shown that the within-die variability is largely dependent on the process conditions [56]. We ran a design of experiments varying table speed and down force. The surface profile of the within-die variation is shown in Figure 3.18. Here we see that increasing
table speed and decreasing down force will reduce the within-die variation, but that no
optimum is achieved within the range of the tool settings, i.e. there is no "sweet" spot.
Therefore, a control strategy aimed at trying to maintain an optimal within-die variation by
adjusting the tool settings would be inappropriate. In addition, we ran several test wafers
during the 600 wafer experiment described in Chapter 2. The within-die variation as a
function of the life of the CMP pad is shown in Figure 3.19. Here we see that there is little
indication that there are long term dynamics of the within-die variation. This also suggests that a control technique aimed at controlling the within-die variation would only
increase the wafer-to-wafer variability in the process by responding to noise in the measurement
of the metric. These issues suggest that an initial control strategy should focus
on monitoring these sources of variation, rather than controlling them.
Figure 3.18: Within die variation (standard deviation of the post-polish thickness)
shown for a design of experiments that varied the table speed and down force over
a wide range for the dielectric CMP process. (Plot axes: down force (psi), table speed (RPM).)
Figure 3.19: Within-die variation shown over the polishing of 600 wafers. The
stars are the within-die variation measured on four dies on each wafer, and the
solid line is the average of the four die from eight wafers over the 600 wafer run. (Horizontal axis: Wafer #.)
Many approaches to dielectric CMP control also monitor the step height reduction.
However, now that we understand the effect of pattern dependencies in CMP, we are in a
position to better understand step height removal.
Figure 3.20: Typical structures used for step height measurement.
As shown in Figure 3.20, the step height
was measured across bond pad structures for the test device that was used for control in
Chapter 2, and whose surface profile is shown in Figure 3.2. The step height measured at
this location is shown in Figure 3.21. The wafers were polished at polish times such that
there was an equivalent amount of material removed between the three processes which had
different polishing rates. Therefore, the values are plotted against the equivalent amount of
dielectric material removed from the polishing of the blanket wafers at similar polishing
times. This allows a fair comparison of the step heights removed for the different processes. Here we see that, in all three processes the step height of this feature quickly
decreases. Note that this measurement was taken on the edge of an array of features, and
that there is a large open area around this measurement location. This implies that the
effective density of this location is low, because the large down area decreases the density
where the step height measurement was taken. Now consider a step height measurement
taken on the same wafers in a location of higher density, as shown in Figure 3.22.
Figure 3.21: Step height measurement for a low density feature, for three different
processes, plotted against the amount removed on a blanket wafer. (Horizontal axis: Amount Removed (Angs.); curves: Process A, Process B, Process C.)
Figure 3.22: Higher density structures used for step height measurements.
Figure 3.23: Step height measurement for a higher density feature, for three different
processes, plotted against the amount removed on a blanket wafer. (Horizontal axis: Amount Removed (Angs.); curves: Process A, Process B, Process C.)
The step heights are plotted, as in Figure 3.21, against the amount of blanket removal in Figure
3.23. In this case, the results are completely opposite. In the previous case, Process A had
the fastest step removal, and Process C the slowest. Here we see that Process C has the
fastest step height removal, and Process A the slowest. The reason is that step height is
dependent on where you measure, i.e. it is dependent on the device layout pattern. These
step height profiles can be explained using the density model. The step height is essentially
determined by the removal of the raised features over which the step height is measured. Because the results in Figure 3.21 are from a region of low density, the step heights
are removed very quickly. Once the features are nearly planar, the removal rate of the
raised areas changes to the blanket removal rate and the removal rate of the down areas
increases to the blanket rate, drastically slowing the removal of the step height. The difference in the three processes comes from the "effective density" of each process. The process with the larger planarization length has more of the surrounding low density open
area averaged in, causing it to have a lower effective density and therefore polish faster. On
the other hand, the higher density step height measurements have the opposite effect. A
larger planarization length, Process C, averages in more of the lower density features far
away, and thus removes the step height faster.
In reality, if one is interested in planarization, i.e. removal of the step height, across the
entire device, the difference between the step height in the low and high density features
should be monitored, as shown in Figure 3.24. Here we see that, similar to the within-die
variation profile, e.g. see Figure 3.4, there is a respective rise in the difference of the step
heights of the high and low density features followed by a decrease in the step height. We
can see a direct comparison of the difference in step height shown in Figure 3.24 and the
within-die variation for these same wafers shown in Figure 3.25. We see that the step
height is highly correlated to the within-die variation. We also showed that the within-die
variation does not have an optimal location within the range of the tool settings and does
not vary significantly over the life of the CMP pad, and should not be controlled. Therefore, in terms of controlling the CMP process, step height measurements would serve
mainly as a redundant measurement of within-die variation.
Figure 3.24: The difference in the step height measured at low and high density
features, for three different processes, versus the blanket amount removed. (Horizontal axis: Amount Removed (Angs.); curves: Process A, Process B, Process C.)
Figure 3.25: The within-die variation, measured at 25 locations in 10 die, for three
different processes, versus the blanket amount removed. (Horizontal axis: Blanket Amount Removed (Angs.); curves: Process A, Process B, Process C.)
In addition, these measurements are generally time-consuming and require significant manual effort to level the
traces. Therefore, controlling on step height measurements would only serve to slow down
the control process, whereas performing optical thickness measurements using on-line
metrology actually provides a speed up in processing (as was shown in Chapter 2).
We could also focus our efforts on controlling the average polishing rate of the process; however, doing so generally requires us to change the process settings, and as we saw
in Figure 3.17, this will have an adverse effect on the within-die variation. And since
within-die variation is the largest source of variability in the CMP process, we believe this
would not be the best approach. Recall that our goal in the CMP process is to planarize the
surface of the wafer, meaning remove all the steps in the thickness, as well as have the
overall surface of the wafer as flat as possible, without excessively thinning or breaking
through the dielectric material. Since we cannot improve on the within-die variation with
the process settings, our focus here will be on controlling the average thickness, and monitoring the total non-uniformity, i.e. the combined within-wafer and within-die variation.
In essence, we are stuck with a surface such as that outlined in Figure 3.1, and we are trying to control the level of this surface while trying to monitor the total indicated range of
the surface. This information can be used to ensure that the low points on the wafer are
above the required minimum thickness or to estimate such things as the delay in the interconnect circuitry.
Let us now formalize a goal for the control of the CMP process. Let us assume that we
would like to control the average of the thickness at several sites in several dies over the
surface of the wafer. Note that we may desire that a large number of such points be
included in the average, and are not necessarily limited to the points that will be measured.
Let us assume that the thicknesses of these points are given by
$$ y^{l}_{s,d}[n], \tag{3.9} $$
where $l$ is the layout number of the device being run, $s$ is the number of the site within die
number $d$, and $n$ is the wafer number. We would like to control the average of all sites
within all dies within each device layout, $l$,
$$ \bar{y}^{l}[n] = \frac{1}{N_d(l)\,N_s(l)} \sum_{d=1}^{N_d(l)} \sum_{s=1}^{N_s(l)} y^{l}_{s,d}[n], \tag{3.10} $$
to the target average thickness, $T_l$, for that layout over all runs with wafers of type $l$. The
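mean square error from the target is thus
$$ \mathrm{MSE}_l = \frac{1}{N_n(l)} \sum_{n=1}^{N_n(l)} \left( \bar{y}^{l}[n] - T_l \right)^2 , \tag{3.11} $$
where $N_n(l)$ denotes the number of runs of layout $l$. We would like to obtain the minimum of these errors
$$ \mathrm{MSE}_{opt} = \min \left( \sum_{l=1}^{N_l} \mathrm{MSE}_l \right) \tag{3.12} $$
over all devices. In addition, we would like to monitor the total non-uniformity of the process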
$$ NU^{l}[n] = \frac{1}{(N_d(l) - 1)\,(N_s(l) - 1)} \sum_{d=1}^{N_d(l)} \sum_{s=1}^{N_s(l)} \left( y^{l}_{s,d}[n] - \bar{y}^{l}[n] \right)^2 . \tag{3.13} $$
We will stop the process for re-optimization if the non-uniformity is too large, or
$$ NU^{l}[n] > UL_{NU} \tag{3.14} $$
for any device layout $l$.
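The quantities in (3.10), (3.11), (3.13), and (3.14) can be illustrated with a minimal sketch. It assumes the thicknesses of one wafer are held in a NumPy array of shape (number of dies, number of sites), and it reads (3.13) as a sum of squared deviations normalized by (N_d - 1)(N_s - 1). The function names and the array layout are illustrative assumptions, not part of the original work.

```python
import numpy as np

def average_thickness(y):
    """Average over all dies and sites of one wafer, as in (3.10).

    y is assumed to have shape (N_d, N_s): one row per die, one column per site.
    """
    return float(np.mean(np.asarray(y, dtype=float)))

def non_uniformity(y):
    """Total non-uniformity, read here as the squared deviation from the wafer
    average normalized by (N_d - 1)(N_s - 1), following (3.13)."""
    y = np.asarray(y, dtype=float)
    n_d, n_s = y.shape
    return float(np.sum((y - y.mean()) ** 2) / ((n_d - 1) * (n_s - 1)))

def mean_square_error(run_averages, target):
    """Mean square error of the per-run averages from the target, as in (3.11)."""
    run_averages = np.asarray(run_averages, dtype=float)
    return float(np.mean((run_averages - target) ** 2))

def needs_reoptimization(y, upper_limit):
    """Stop condition of (3.14): non-uniformity above the upper limit."""
    return non_uniformity(y) > upper_limit
```

In practice the controller would evaluate these on the model-predicted profile rather than on a fully measured wafer, since only a few sites are measured on each run.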
In order to achieve this goal, we are required to do this with very few measurement
sites within only a few die of each wafer. Also, the number of measurement sites for each
type of layout may be different. Thus, the controller has at its disposal
$$ y^{l}_{s_m,d_m}[n], \tag{3.15} $$
where $s_m$ is the site number in die number $d_m$ of the measurement from the device layout
$l$ on run $n$.
In summary, we would like to control the average thickness of the profile shown in
Figure 3.1, while monitoring the total range of the profile. In addition, we would like to do
this for multiple devices running sequentially on a single tool, using only a small number
of measurement sites within a small number of dies from each device. Note that the form
of the problem given above does not make any assumptions of the form of the measurements. In particular, the thicknesses of one device may be correlated to the thicknesses of
another device, but the form of the problem neither states this, nor suggests a form for any
such relationship. This will be entirely dependent on the solution to the control problem.
The following section provides a framework which addresses this problem, and provides a
strategy for controlling the CMP process which addresses many of the problems outlined
in this and previous chapters.
3.6 A Framework for the Control of Multiple Devices in Dielectric CMP
A device independent CMP control framework is outlined in Figure 3.26. The framework is centered around the density model described in Section 3.2. In summary, the
approach is to generate a virtual post-polish thickness profile of the entire wafer surface
using the density model, i.e. using estimates of the blanket removal rate and the planarization length of the CMP process, for various polish times.
Figure 3.26: A device independent run by run process controller for CMP. (Block diagram labels: Generate Time; Polish & Measure Wafer; Density Model; Update; Device Files.)
The polish time that provides a
virtual wafer whose average is as close as possible to the desired average thickness for the particular
device layout being run is used to polish the actual, i.e. real, wafer. Once the actual wafer
is polished, a small number of points are measured in a small number of die on the wafer.
The parameters of the density model, i.e. the blanket rate and planarization length, are
then varied until the thickness predictions of the density model best fit the few measured
thicknesses. These new values for the blanket rate and planarization length are then fed
back into the model, i.e. they are averaged with past values, and the new parameters are
used to generate the polish time for the next wafer to be polished. The details of the actual
implementation of this control method and variants of it will be discussed later.
This strategy provides several benefits over traditional CMP process control methods.
These benefits are the result of the separation made by the control framework as to what is
controlled and what is measured. In this strategy, the measurement locations are not
assumed to be what is controlled. The model of the CMP process is used to de-couple the
interaction between the effects of the pattern layout and the effects of the polishing process. The parameters of the model are used in conjunction with the layout file to predict a
much more detailed outlook of the actual wafer, without actually having to measure the
entire wafer. The average of this "virtual" wafer is what is actually controlled, not a few
measurement points that may have little to do with the true average of the wafer. Thus, the
average film thickness of the entire wafer, including the within-wafer and within-die variation, is controlled with only a few measurements. Also, because the virtual wafer is created from the model parameters, we can estimate the within-die and global non-uniformity
from the virtual wafer.
Another benefit of this approach is that the device layouts are used to remove the layout dependencies in the measured data to extract the properties of the process, independent
of the device being polished. These properties of the process are tracked using the parameters of the density model, i.e. the blanket removal rate and the planarization length.
Because the same parameters can be used for any device, devices may be interchanged at
any time without running a blanket wafer or a test wafer of that device. In addition, it
allows a device to be accurately controlled, even if it has not been recently run.
3.6.1 A Device Independent Control Algorithm
We now describe the control algorithm outlined above in a more formal manner,
explicitly describing the steps in the control algorithm. In order to predict the virtual profile of the post-polish thickness of the entire wafer, a large grid of discrete sites on
several dies must be set up, and the density model used to predict the post-polish thicknesses at these points. In order to do this, the polish time and estimates of the blanket rate
and planarization length for each of the dies to be predicted must be input into the density
model. Specifically, we have estimates of the thicknesses given by
$$ \hat{y}^{l}_{s,d}[n] = \begin{cases} z_0[n] - \dfrac{K_d[n]\, t[n]}{\rho_{eff}(x(s), y(s), L_d[n], D(l))}, & t[n] < t_1 \\[2ex] z_0[n] - z_1[n] - K_d[n]\,(t[n] - t_1), & t[n] > t_1 \end{cases} \tag{3.16} $$
where
$$ t_1 = \frac{z_1[n]\, \rho_{eff}(x(s), y(s), L_d[n], D(l))}{K_d[n]}, \tag{3.17} $$
$K_d[n]$ and $L_d[n]$ are the blanket rate and planarization length for die number $d$ on run $n$,
$t[n]$ is the polish time used on run $n$, $x(s)$ and $y(s)$ are the $x$ and $y$ locations of site number $s$,
$z_0[n]$ and $z_1[n]$ are the initial oxide thickness and initial step height for the wafer on run
$n$. The effective density is calculated as
$$ \rho_{eff}(x(s), y(s), L_d[n], D(l)) = \sum_{x \in X} \sum_{y \in Y} \rho_0(x, y, D(l)) \cdot w(r, L_d[n]), \tag{3.18} $$
where
$$ r = \sqrt{(x - x(s))^2 + (y - y(s))^2}, \tag{3.19} $$
and where $X$ and $Y$ are the set of all points within a finite pre-specified distance from the
point of interest, $\{x(s), y(s)\}$, in the device layout $D(l)$ for layout number $l$, $\rho_0(x, y, D(l))$ is
the feature density of each of these points, and $w(r, L_d[n])$ is the elliptic weighting function
described by equation (3.6).
The polish time for run $n$ can be found as
$$ t[n] = \arg\min_{t[n]} \left( \frac{1}{N_d(l)\,N_s(l)} \sum_{d=1}^{N_d(l)} \sum_{s=1}^{N_s(l)} \hat{y}^{l}_{s,d}[n] - T_l \right)^2 , \tag{3.20} $$
where, as stated in the definition of our control problem in the previous section, $T_l$ is the
target average thickness for the device layout number $l$.
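As a rough illustration of (3.16), (3.17), and (3.20), the sketch below evaluates the piecewise thickness model at a set of sites and then picks a polish time by brute-force search. It assumes the effective densities have already been computed for each site (the weighting function of equation (3.6) is not reproduced here), and it replaces the argmin with a search over a user-supplied grid of candidate times. The function names, the grid search, and the units are assumptions made for illustration only.

```python
import numpy as np

def predict_thickness(t, K, rho_eff, z0, z1):
    """Post-polish thickness at each site from the density model, (3.16)-(3.17).

    t       : polish time
    K       : blanket removal rate (thickness per unit time)
    rho_eff : effective density at each site, 0 < rho_eff <= 1 (precomputed)
    z0, z1  : initial oxide thickness and initial step height
    """
    rho_eff = np.asarray(rho_eff, dtype=float)
    t1 = rho_eff * z1 / K                   # time at which each site planarizes
    before = z0 - K * t / rho_eff           # raised areas polish at K / rho_eff
    after = z0 - z1 - K * (t - t1)          # planarized sites polish at the blanket rate
    return np.where(t < t1, before, after)

def choose_polish_time(target, K, rho_eff, z0, z1, t_grid):
    """Grid-search stand-in for the argmin in (3.20)."""
    t_grid = np.asarray(t_grid, dtype=float)
    errors = [(predict_thickness(t, K, rho_eff, z0, z1).mean() - target) ** 2
              for t in t_grid]
    return float(t_grid[int(np.argmin(errors))])
```

A grid search is used only because it is simple to check. Since the predicted average decreases monotonically with polish time, a bisection search would serve equally well.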
Once the optimal polish time for the incoming device is determined, then we can predict the average post-polish thickness profile of the given wafer as
$$ \hat{\bar{y}}^{l}[n] = \frac{1}{N_d(l)\,N_s(l)} \sum_{d=1}^{N_d(l)} \sum_{s=1}^{N_s(l)} \hat{y}^{l}_{s,d}[n], \tag{3.21} $$
where the predicted thicknesses are given by (3.16) using a polish time given by (3.20). In
addition, we can predict the global non-uniformity (variance) of the entire wafer using
$$ \widehat{NU}^{l}[n] = \frac{1}{(N_d(l) - 1)\,(N_s(l) - 1)} \sum_{d=1}^{N_d(l)} \sum_{s=1}^{N_s(l)} \left( \hat{y}^{l}_{s,d}[n] - \hat{\bar{y}}^{l}[n] \right)^2 . \tag{3.22} $$
After polishing, measurements of the post-polish thicknesses are taken at a few locations on a few dies across the wafer; let these measurements be given by
$$ y^{l}_{s_m,d_m}[n], \tag{3.23} $$
where $l$ is the layout number of the device being run, $s_m$ is the number of the measured
site within die number $d_m$, and $n$ is the wafer number. Note that this set of points may be
a subset or a completely different set of locations than the large set of points specified by
$s$ and $d$ in the virtual post-polish thickness wafer profile outlined above. These measurements are then used to extract a new blanket rate and planarization length for each measured
die by finding the parameters that minimize the squared-error fit between the
measured data points and the model estimates for those points:
$$ \{\tilde{K}_d[n], \tilde{L}_d[n]\} = \arg\min_{\{K_d, L_d\}} \frac{1}{N_{d_m}(l)\,N_{s_m}(l)} \sum_{d_m=1}^{N_{d_m}(l)} \sum_{s_m=1}^{N_{s_m}(l)} \left( y^{l}_{s_m,d_m}[n] - \hat{y}^{l}_{s_m,d_m}[n] \right)^2 , \tag{3.24} $$
where the estimated thicknesses
$$ \hat{y}^{l}_{s_m,d_m}[n] \tag{3.25} $$
are computed in the same fashion as the points in the profile of the virtual post-polish
thickness profile outlined in (3.16), but with the set of measured sites $s_m$ in the measured
die $d_m$. These new values, $\tilde{K}_d[n]$ and $\tilde{L}_d[n]$, are used to update the estimates of the blanket rate and planarization length for each die. This is done by performing an exponentially
weighted moving average of the blanket rate parameter for each die
$$ K_d[n+1] = w_K \cdot \tilde{K}_d[n] + (1 - w_K) \cdot K_d[n], \tag{3.26} $$
and the planarization length parameter for each die
$$ L_d[n+1] = w_L \cdot \tilde{L}_d[n] + (1 - w_L) \cdot L_d[n], \tag{3.27} $$
where $w_K$ and $w_L$ are the EWMA weights for the moving average of the blanket rate and
planarization length of each die, respectively. In theory, these could actually be different
weights for each die. However, we will assume that the dynamics of the blanket rate and
planarization length are the same for all dies on the wafer. This may turn out to be insufficient if some die appear to change more than others. For example, the die nearest the flat
or notch of the wafer may have such a property.
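The update in (3.26) and (3.27) is an ordinary exponentially weighted moving average. A minimal sketch follows, assuming the weight multiplies the newly extracted value and that the same weight is shared by all dies, as the text assumes. The function names are illustrative.

```python
def ewma_update(previous, extracted, weight):
    """One EWMA step: weight on the newly extracted value, (1 - weight) on the
    previous estimate, following the reading of (3.26) and (3.27) used here."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * extracted + (1.0 - weight) * previous

def update_die_estimates(previous, extracted, weight):
    """Apply the same EWMA weight to every die (one entry per die)."""
    return [ewma_update(p, e, weight) for p, e in zip(previous, extracted)]
```

Allowing a different weight per die, as the text notes may be needed near the wafer flat or notch, would only require passing a list of weights instead of a single value.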
On the next run, the controller uses the device layout of whatever type of device is
being run with these updated estimates of the blanket rate and planarization length for
each die to create new virtual post-polish thickness wafer profiles in order to optimize the
polish time for that device, and the cycle repeats.
When new values of these model parameters are extracted from the measured values in
equation (3.24), only a few measurements are necessary. Because the model explicitly
relates each measurement to its effective density determined using the model and the
device layout, each measurement is compared with a prediction from the model, and not to
some arbitrary target that does not relate to polishing of that particular site. In varying the
parameters of the model, the predicted profile is aligned with the measurement sites,
allowing the model parameters to be updated independent of the particular device being
run.
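The extraction step of (3.24) can be sketched for a single die as a brute-force search over candidate blanket rates and planarization lengths. The sketch assumes a caller-supplied function, here called rho_eff_at, that returns the effective density at the measured sites for a given planarization length. That function stands in for (3.18) with the weighting function of (3.6) and is hypothetical, as are the candidate grids and the function names.

```python
import numpy as np

def predict_thickness(t, K, rho_eff, z0, z1):
    """Density-model thickness at each site, following (3.16) and (3.17)."""
    rho_eff = np.asarray(rho_eff, dtype=float)
    t1 = rho_eff * z1 / K
    return np.where(t < t1, z0 - K * t / rho_eff, z0 - z1 - K * (t - t1))

def extract_parameters(measured, t, z0, z1, rho_eff_at, K_grid, L_grid):
    """Brute-force least-squares fit of (K, L) to the measured sites of one die.

    rho_eff_at(L) is a hypothetical callable returning the effective density
    at the measured sites for planarization length L.
    Returns (best K, best L, sum of squared errors).
    """
    measured = np.asarray(measured, dtype=float)
    best_K, best_L, best_sse = None, None, np.inf
    for L in L_grid:
        rho = rho_eff_at(L)                 # densities depend only on L
        for K in K_grid:
            residual = measured - predict_thickness(t, K, rho, z0, z1)
            sse = float(np.sum(residual ** 2))
            if sse < best_sse:
                best_K, best_L, best_sse = float(K), float(L), sse
    return best_K, best_L, best_sse
```

A gradient-based or simplex optimizer could replace the double loop. The grid version is shown because its behavior is easy to verify by hand.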
We can see from equations (3.16) through (3.19) that the value of the thickness at any
point is determined by a few key parameters: the blanket removal rate of the die, the planarization length of the die, the device layout, and the polish time. Given the blanket
removal rate and planarization length, we can use the model to generate a very densely
sampled profile of the device being run. This densely sampled profile allows us to generate
a polish time for the next wafer to be polished, i.e. equation (3.20), that controls the true
average thickness, i.e. equation (3.21), to as close to the target value as possible. In addition, this thickness profile prediction and the controlled average thickness are independent
of the locations that we actually measure, i.e. equation (3.23). The densely sampled predicted thickness profile also allows us to estimate the total, or global, non-uniformity, i.e.
equation (3.22).
After polishing, measurements of the post-polish thicknesses are taken at only a few
sites on a few dies across the wafer, i.e. equation (3.23). The model is then used to extract a
new blanket rate and planarization length for each of the measured die, i.e. equation
(3.24). This is done by varying the parameters of the model, the blanket rate and planarization length, in order to best fit the measured data. These new values are used to update the
estimates of the blanket rate and planarization length for each die via equations (3.26) and
(3.27). The use of the pattern layout in the model de-couples the effects of the device layout from the effects of the process, i.e. the blanket rate and planarization length. It is this
independence of the device layouts that allows us to carry over the state of the process from
wafer-to-wafer, regardless of the type of device being polished. Specifically, by simply
changing the device layout and using these parameters in the model, we can predict the
post-polish thickness profile of any new device and interchange devices at any time.
3.6.2 Further Discussion of the Device Independent Control Algorithm
While the algorithm in the previous section is fairly general, in that it defines methods
for tracking both the blanket rate and planarization length for all die being measured, we
would like to address whether or not this is really necessary. In particular, it may be the
case that we wish to only track these parameters for one die, or an average over the measured die in order to keep the strategy and updating simpler. It may also be the case
that we wish to track only the blanket rate and use a constant planarization length, because
the re-calculation of the planarization length in the extraction step is time consuming.
Recall that the blanket wafer removal rate varies over the surface of the wafer (e.g. see
Figure 1.8). Thus, we will definitely need to estimate the blanket rate for each die that we
wish to monitor. In actuality, monitoring the actual position rather than simply the radial
distance may be somewhat unnecessary, since these radial non-uniformities are often due
to imperfections in the wafer carrier and the wafer has a tendency to slip underneath the
carrier during polishing. However, the blanket rate near the wafer flat or notch may be different than elsewhere on the wafer due to the asymmetry. Therefore, we monitor a separate
blanket rate for each die in this work.
In order to determine if we need to monitor and update the planarization length for
each die, or even monitor and update this parameter at all, several characterization test
masks were run during the 600 wafer control experiment outlined in Chapter 2. From
these characterization masks, the planarization length was extracted for four die on eight
wafers spread across the 600 wafer experiment. The results are shown in Figure 3.27. Here
we can see that the average of these values is fairly constant over the life of the pad. However, notice that die one appears to be continually increasing, while dies three and four
appear to be continually decreasing. This suggests that, over the long term, the planarization length of each die will also need to be updated. Therefore, the first implementation of
the controller updates both the blanket rate and planarization length. We now turn to presenting
our experimental implementations of this type of controller.
Figure 3.27: The average planarization length over the course of 100 six wafer lots
(solid line), and calculated planarization lengths for each of the four die on each of
these eight wafers (dots). (Horizontal axis: Lot #.)
3.7 Experimental Results
This control framework was tested on an IPEC 472 polisher at Texas Instruments, Inc.
The two devices shown in Figure 3.28 were polished alternately every other run, with the
objective being to control the true average thickness of these devices to the same target
value of 8000 Å. For these particular devices, this is a challenging task for two reasons.
One is that both devices have large regions of very high density and large regions of very
low density, which results in a large range of thicknesses that are difficult for any model to
accurately predict. The second is that they have different average values, making the average removal rates highly device dependent. The devices were polished alternately so as to
provide the most difficult situation for control. The alternating of the devices causes the
largest change in the polishing rate on each run. If there are any device dependencies in the
modeling or control strategy, they will appear as systematic errors that alternate every
other run with the device.
Figure 3.28: Test devices being controlled with the device independent controller. (Panels: Device 1, Device 2.)
3.7.1 Updating Both Planarization Length and Blanket Removal Rate
In our first control experiment, all measurements were performed on an ex-situ KLA/
Tencor UV1280. We would like to test our claim that a reduced number of measurement
points could be used to control the true thickness. We used a reduced number of sites, 12
sites on four dies, to update the model during the control experiment. In order to determine
the polish time and estimate the global non-uniformity, we estimated the wafer post-polish
thickness using a more densely sampled plan, 63 sites on four dies. After the control
experiment was completed, we remeasured the wafers using this more densely sampled
plan, and the true average and the global non-uniformity of the wafers were calculated
using these measurements. These were used to determine how good the control of the average thickness was, since the points that are measured during the control run may be very
different than the true average that is being controlled. These measurements were also
used to determine how accurate the controller's prediction of the global non-uniformity during
the control experiment actually was. The sampling plans within each die for the two
devices are shown in Figures 3.29 and 3.30. The points indicated as filled circles were the
points used to update the model during the control experiment, while the points indicated
by crosses were additional points that were measured following the control experiment in
order to calculate the true mean and global non-uniformity. The sampled dies are shown in
Figure 3.31.
Figure 3.29: Measurement plan of Device 1. Circles are points used for control.
Crosses and circles are used to determine the true average. (Horizontal axis: X (mm).)
Figure 3.30: Measurement plan of Device 2. Circles are points used for control.
Crosses and circles are used to determine the true average. (Horizontal axis: X (mm).)
Figure 3.31: Map of the dies used for the multiple device control.
In this first experiment, the EWMA estimates of the planarization length and blanket
removal rate were monitored for each of the four die. This resulted in the average thickness being controlled to within 400 Å of the target, as shown in Figure 3.32. This error is
fairly good for multiple device control, keeping in mind that the average lot to lot variability of blanket wafer polishing is on the order of 200 Å. As indicated earlier, this represents
a particularly difficult case. The total non-uniformity of each of the devices being run is
shown in Figure 3.33, including the estimated total non-uniformity predicted during the
control experiment using the few measurement points and the actual values determined
from 63 point measurements taken on the four die following the control experiment. Here
we can see that the predictions during the experiment using the few measured points provide an excellent monitor of the post-polish thickness.
Figure 3.32: Controlled average thickness of 63 sites on four dies measured following the experiment and the device number run. (Horizontal axis: Run #.)
Figure 3.33: The minimum, maximum, and range of the polished devices. The
dashed lines represent the predicted values using the model, while the solid lines
indicate the values determined from the 63 point measurements on four dies. (Horizontal axis: Run #; curves: Maximum, Range, Minimum.)
Although the average thickness was controlled to an acceptable level, there were several problems with the control run. First, since the measurements for control were performed off-line, each control run took one and a half hours (30 minutes to polish the
wafers, 30 minutes to clean them, and 30 minutes to measure them). Therefore, only a
small number of runs were performed. Second, the polishing pad was new and still in the
break-in period. As a result the blanket rate was rapidly decreasing toward a steady state.
The extracted blanket rates shown in Figure 3.34 effectively demonstrate this. This made
controlling the process difficult, since the EWMA weights were not optimized for these
conditions. Third, the output appears to oscillate with the device. The dependency is also
seen in Figure 3.34, where we see oscillations in the extracted planarization lengths (from
the update step). This is undesirable, as the model assumes these are device independent
[53,56]. This suggests that either the model is inaccurate, or planarization length is not
device independent. At the time these experiments were performed, we were unable to
analyze the model or these effects in detail. This analysis was done later, and is presented
in Chapter 4. At the time, we decided to assume that the planarization length was indeed a
function of the device, and continue our experiments based on this assumption.
Figure 3.34: Parameters extracted from the measured data during the first control
run. (Horizontal axis: Run #.)
3.7.2 Updating Blanket Removal Rate Only
The logical approach to overcoming the device dependencies of the planarization
length, based on our new assumption, is to fix the planarization length for each
device. It is possible that the planarization length also depends on time, as
suggested by Figure 3.27. However, any such change is very slow and occurs over a very
long period of operation, so it will not come into play in our relatively short
experiment. The focus here is to demonstrate that improved control can be
obtained by fixing the planarization length and updating only the blanket rates of each die
(which are known to change with time). In order to do this, we generated the optimal
planarization lengths for each of the devices using all the points from the previous
experiment. A new version of the controller uses these fixed planarization lengths for each
device and updates only the blanket rates based on the measured data. In addition to this
modification of the controller, we switched from an ex-situ metrology tool to an on-line
metrology tool. This provided a significant speed-up of the control loop: the
throughput went from one wafer every one and a half hours to roughly one wafer every 15
minutes.
Figure 3.35 shows that this resulted in control of the 63 point average to within 200 Å
of the target, significantly better than the control typically achieved when running multiple
devices. In addition to the excellent control, we were able to provide accurate estimation
of the total non-uniformity during the control run. Figure 3.36 shows these predictions
plotted against the actual values determined from the 63 measurements made on each of
four dies following the experiment. Again we see that the predictions are highly accurate,
and that the total non-uniformity of each of the two devices is accurately predicted.
Figure 3.35: Controlled average thickness of 63 sites on four dies measured following the experiment and the device number being run.
Figure 3.36: The minimum, maximum, and range of the polished devices. The
dashed lines represent the predicted values using the model, while the solid lines
indicate the values determined from the 63 point measurements on four dies.
Unfortunately, as before, there still appears to be a strong dependence of the controlled
average thickness on the alternating devices. This is further seen in the extracted blanket
rates shown in Figure 3.37. These oscillating extracted rates indicate one of three things:
the "blanket" rate actually fluctuates with the device, the values used for the fixed
planarization lengths during the experiment are slightly incorrect, or the model is
inaccurate and needs to be corrected. Recall that in our model, the blanket rate is a
parameter based on the removal rate of a blanket sheet film of material. We could assume
that this is actually a function of the device; however, this violates the assumptions of
the model. On the other hand, it is possible that the planarization lengths are in fact
slightly incorrect, since we only had a few wafers from which to determine these values.
The values used for the planarization length were 3500 µm for Device 1 and 4500 µm for
Device 2. After this second experiment, the planarization lengths were extracted for all
four dies of the 15 wafers run. The re-optimized planarization lengths were 3450 µm and
4615 µm for Devices 1 and 2, respectively. These represent only small differences in the
planarization length (about 1.4% and 2.6%), to which the predicted profiles are known to be
relatively insensitive. Therefore, this is most likely not the cause of the errors. Thus,
these errors are most likely due to inaccuracies in the assumptions of the model. As a
result, work is needed to improve the model in order for this control technique to be most
effective. Chapter 4 will explore one possible improvement in the CMP model that may help
correct these problems.

Figure 3.37: Measurement plan of Device A. Circles are points used for control
and crosses and circles are used to determine the true average.
3.7.3 Correcting for Device Dependencies in the Blanket Rate
Although further improvements in the model are necessary in order to make this control solution truly device independent, we still wished to determine if this type of controller could be made to work even better in spite of the inaccuracies in the model. A second
variation assumes that the error can be further corrected using an adjustment to the blanket
rate, along with the fixed planarization lengths, for each device
$$K_d[n] = K_d[0] + \Delta_d[n] \tag{3.28}$$
where
$$\Delta_d[n+1] = w_K \left(\hat{K}_d[n] - K_d[0]\right) + (1 - w_K)\,\Delta_d[n]. \tag{3.29}$$
A device dependent expression for the blanket rate, $K_d[n]$, is considered, which is the sum
of a blanket rate due to the device, $K_d[0]$, and an error term, $\Delta_d[n]$. The error term is
updated using the exponentially weighted moving average in equation (3.29). Equations
(3.28) and (3.29) replace equation (3.26) in the original form of the controller. The
adjustments due to the device dependencies were determined from the experimental data from
the previous control run. As shown in Figure 3.38, this results in control of the average
post-polish thickness for multiple devices to within 100 Å, and removes nearly all device
dependencies in the controlled thicknesses. This level of control is far better than typically
achieved using conventional control techniques for switching between multiple devices, and
is at the quality level necessary for next generation devices. In addition, the estimated
and actual total non-uniformity are shown in Figure 3.39. Once again, we see excellent
prediction of the total non-uniformity across the wafer. This demonstrates that many of the
test wafers and pilot runs used to control and monitor the CMP process could be removed,
because the total non-uniformity can be extracted from the few points measured on each
run.

Figure 3.38: Controlled average thickness of 63 sites on four dies measured following the experiment and the device number being run.

Figure 3.39: The minimum, maximum, and range of the polished devices. The
dashed lines represent the predicted values using the model, while the solid lines
indicate the values determined from the 63 point measurements on four dies.
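The update in equations (3.28) and (3.29) can be illustrated with a short sketch. It
assumes that the term subtracted from in (3.29) is the blanket rate extracted from the
measurements of run n for the device just polished; the function names, the EWMA weight,
and all numerical values below are hypothetical and chosen only for illustration.

```python
def update_offset(delta, extracted_rate, baseline_rate, weight):
    """One EWMA step in the spirit of equation (3.29): returns the next offset."""
    return weight * (extracted_rate - baseline_rate) + (1.0 - weight) * delta


def blanket_rate(baseline_rate, delta):
    """Equation (3.28): device-specific blanket rate used for the next run."""
    return baseline_rate + delta


# Hypothetical values, not taken from the experiments.
baseline = {1: 3000.0, 2: 3100.0}  # baseline blanket rate per device
delta = {1: 0.0, 2: 0.0}           # initial error term per device
w_k = 0.5                          # EWMA weight (assumed value)

# Alternating devices; each run yields an extracted rate for the device polished.
runs = [(1, 3040.0), (2, 3080.0), (1, 3060.0), (2, 3070.0)]
for device, extracted in runs:
    # Only the offset of the device that was just polished is updated.
    delta[device] = update_offset(delta[device], extracted, baseline[device], w_k)

rates = {d: blanket_rate(baseline[d], delta[d]) for d in baseline}
```

Keeping a separate offset per device means that a run on one device never disturbs the
correction learned for the other, which is the property the device dependent update relies on.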
One final point we would like to stress concerns the difference between the measured
values and the complete profile of the device being controlled. We controlled the
true average of a densely sampled profile of the wafer while measuring only a few select
points, and this average is quite different from the average of the measured points.
Figure 3.40 shows the resulting control of the 252 point averages (63 points on four dies)
along with the 24 point average (6 sites measured on four dies) measured for Device 1, and
the 48 point average (12 sites on four dies) measured for Device 2. Here we see that the
measured averages are very different from the controlled 252 point averages.
Figure 3.40: Difference in the average of the measured sites and the average of the
controlled sites (the "true" average).
3.8 Summary
We have presented a device independent controller that simplifies processing and significantly improves lot to lot processing quality with multiple devices. We demonstrated
that effective control in this scenario can be obtained using the MIT density model that
correlates the value being measured with the true profile of the controlled thickness. This
provides several benefits. First, it allows measurements from any device to be used to
update the tool level model that can be used with any other device being processed. Second, it allows us to measure only a few points, while very accurately controlling the average of the true thickness profile. Third, it allows us to monitor the total non-uniformity of
the polishing process for each device being processed. This eliminates the need for many
test wafers aimed at determining uniformity.
Our first experiment demonstrated that this type of control results in a lot to lot
variability (400 Å) which is fairly good relative to existing techniques for controlling
multiple devices. Unexpectedly, the planarization length of the density model was found to
be a function of the device. This is unfortunate because we no longer have a truly device
independent model, and it indicates that an improved model is necessary to achieve truly
device independent control. In our second experiment, we fixed the planarization lengths
for the different devices so that we could improve the control with the existing model. This
resulted in significantly better control (200 Å). The best results (100 Å) were obtained with
a device dependent model update strategy, which included adjustments to the blanket
removal rates. In all cases, we were able to very accurately predict the total
non-uniformity of the polished wafers. In the next chapter, we will explore one possible
improvement to the model, in the hope that its use will remove these dependencies.
Chapter 4
A Dielectric CMP Model Combining
Density and Step Height Dependencies
In this chapter, we turn to trying to understand the device dependencies in the model
used for control. While the model used in Chapter 3 provided a first order prediction of the
post-polish dielectric film thickness, it was suggested that this model is insufficient for
providing truly device independent and highly accurate control of the average thickness.
This tight control of the average post-polish film thickness will become critical as specifications on device properties continue to rapidly tighten as device sizes continue to shrink.
In order to achieve this tight control, a controller such as that outlined in Chapter 3 will
play an important role. Therefore, it is important that we develop a model which can more
accurately predict the entire profile of a device, independent of the device being run. In
this chapter, we will explore the use of a model similar to that used in Chapter 3, but with
the addition of a step height dependence.
Several works have proposed models for the chemical-mechanical polishing of interlevel
dielectrics, each of which provides various benefits. We will not review all the models
in this thesis; other works have provided a complete review [54,56]. Our approach is to
consider those models which could be combined with the MIT density model, which has
the ability to efficiently predict the post-polish thickness profile for the entire die of an
arbitrary layout [53]. This is critically important if we are to utilize a model for performing
the device independent process control outlined in Chapter 3, for determining dummy fill
[56], or for estimating circuit performance [57]. However, the density model provides
thickness predictions to only a first order, and falls short when predicting low density
features. Burke proposed in [58] that the step height decreases exponentially with time. Tseng
et al. proposed that removal rates of raised and down areas converge exponentially to the
removal rate of an unpatterned dielectric sheet film (blanket removal rate) as polish time
increases [46]. However, both these models lack a clear connection to density. In addition,
the model in [46] assumes the pad is always in contact with both the raised and down
areas, and suggests that the removal rate step height dependence is determined by the
distribution of pressure between the raised and down areas. Grillaert et al., from IMEC,
provided experimental data in [47] which demonstrate that these claims are true only after a
certain step height is reached. The IMEC model suggests that before this "transition" step
height is reached, the removal rate of the raised areas is given by the blanket rate
divided by the density [47]. After the transition step height is reached, the removal rate
profile follows the exponential model outlined in [46]. The IMEC model also suggests that the
transition step height is dependent on feature density, but it is unclear how these transition
step heights can be determined a priori for arbitrary layouts. Therefore, it is not clear that
this technique would work well on typical patterned wafers, where the layout of the
features is complex and their densities are not easily calculated.
In this chapter, we will expand the MIT density model to include the step height
dependencies of the removal rate suggested by the IMEC model. The purpose is to provide
predictions for arbitrary layouts, as the density model does, while improving the fit for
low density features. We will focus on comparisons of variations of this model with the
density model, and the effects these comparisons have on our understanding of the
mechanisms in dielectric CMP.
Section 4.1 briefly reviews the density model and step height dependent models. An
analysis of the density model fit to experimental data is given in Section 4.2. Section 4.3
outlines a combined density and step height dependent model, and makes comparisons of
this model to the density model. Section 4.4 presents two variations of this time-density
model which simplify the model form and solution. We return to our original question in
Section 4.5, and discuss whether the device dependencies are removed by the combined
density and step height dependent model.
4.1 Density and Step Height Dependent Models
As we saw in Chapter 3, the MIT density model provides a first-order approximation
of the post-polish dielectric thicknesses for arbitrary layouts [53]. As shown in Figure 4.1,
this model assumes the polishing rate of a raised area is initially equal to the blanket rate
divided by an effective density. The effective density is determined by computing a
weighted average of the feature densities within the planarization length window. During
this first regime, the model assumes there is no removal in the down areas. Once the step
height is assumed to be completely removed, the model assumes that the removal rates of
both the raised and down areas equal the blanket rate. This model's key strength is its ability to efficiently predict the thickness of an arbitrary layout to a first order. This benefit
comes from the weighted average of the densities within a window, or planarization
length. This averaging is necessary because the pressure distribution of force on a particular feature is affected by neighboring features.
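The two regimes described above can be sketched in a few lines. This is a minimal
illustration, not the implementation used in this work: it assumes a one-dimensional list
of local densities and a uniform (unweighted) moving average in place of the weighted
window, and the function names and arguments are hypothetical.

```python
def effective_density(densities, window):
    """Uniform moving average of local densities along one dimension.

    The window is truncated at the ends of the list (an assumed edge treatment).
    """
    half = window // 2
    result = []
    for i in range(len(densities)):
        lo = max(0, i - half)
        hi = min(len(densities), i + half + 1)
        result.append(sum(densities[lo:hi]) / (hi - lo))
    return result


def density_model_removal(blanket_rate, rho_eff, step_height, polish_time):
    """Return (raised, down) amounts removed under the two-regime density model."""
    # Time at which the step is fully removed: the raised area polishes at
    # blanket_rate / rho_eff while the down area is not polished at all.
    t_flat = rho_eff * step_height / blanket_rate
    if polish_time < t_flat:
        return blanket_rate * polish_time / rho_eff, 0.0
    # After the step is gone, raised and down areas both polish at the blanket rate.
    after = blanket_rate * (polish_time - t_flat)
    return step_height + after, after
```

For example, with a blanket rate of 50 per second, an effective density of 0.5, and a step
height of 5000, the step is removed after 50 seconds; before that time the sketch returns
no down-area removal at all, which is the behavior examined in the rest of this chapter.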
[Schematic panels: Layout File; Effective Density Profile $\rho_{eff}(x, y, L)$; Blanket Removal Rate; Patterned Removal Rate Profile]
Figure 4.1: A high-level view of the MIT density model.
The model proposed by IMEC shows that the removal rate of the raised areas, and thus
the step height reduction, is not linear [47]. As shown in Figure 4.2, they suggest there is
an initial linear regime, where the raised area removal rate is equal to the blanket rate
divided by the feature density. The removal rate of the down area during this period is
zero. After the pad contacts the down area, this first regime is followed by a period where
the removal rate of the raised area exponentially decreases to the blanket rate. During this
second regime, the removal rate of the down area exponentially increases from zero to the
blanket rate. Typical plots, as well as the expressions for the removal rates in the raised
and down areas as a function of polish time, are shown in Figure 4.2. Here $K$ is the blanket
removal rate, $h_0$ is the initial step height, $\rho$ is the feature density, $\tau$ is the
exponential time constant, $t_p$ is the polish time, $t_c$ is the time of contact with the
down area, and $h_1 = h_0 - K t_c / \rho$ is the transition or contact step height. The work
in [47] proposes that the step height at which the pad contacts the down areas is a function
of the feature density; i.e., the higher the density, the smaller the contact step height.
We will explore this in more detail later on.

Figure 4.2: The removal rates of the raised and down areas using the IMEC step
height dependent model.
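As a worked form of the two regimes just described, the rates can be written out
explicitly. The expressions below are a sketch obtained by differentiating the removal
amounts in equations (4.1) and (4.2) with respect to polish time; they are not a
transcription of the annotations in Figure 4.2, and the subscripts u and d (raised and
down) are labels assumed here.

```latex
\frac{d\,\Delta R_u}{dt} =
\begin{cases}
  K/\rho, & t < t_c \\[1ex]
  K + (1-\rho)\,\dfrac{h_1}{\tau}\,e^{-(t-t_c)/\tau}, & t > t_c
\end{cases}
\qquad
\frac{d\,\Delta R_d}{dt} =
\begin{cases}
  0, & t < t_c \\[1ex]
  K - \rho\,\dfrac{h_1}{\tau}\,e^{-(t-t_c)/\tau}, & t > t_c
\end{cases}
```

Subtracting the two rates after contact gives a step height that decays as
$h_1 e^{-(t-t_c)/\tau}$. In this form the down-area rate starts exactly at zero at the
moment of contact only when $\tau = \rho h_1 / K$, which is also the condition under which
the raised-area rate is continuous at $t_c$.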
Before we continue, we would like to consider the differences between these models,
highlighted in Figure 4.3. Here the density model predictions are placed over the IMEC
model predictions. We see that the time at which the density model switches to the blanket
removal rate is later than the time at which the IMEC model transitions to the exponential
removal rate. The IMEC model suggests that removal of the down area begins before the
time suggested by the density model. The pressure distribution is assumed to change
before the step height is completely removed, and the load is shared with the
down area. This creates a large difference in the removal rate predictions just after the
exponential regime begins. We have plotted the percent differences in the amount removed
determined from each model in Figure 4.4. Here we see that the predictions are fairly
similar at the beginning and end, but there are large differences in the middle. In particular,
note that the positive percent difference in the raised area indicates that the MIT model
predicts more removal on the raised area, while the negative percent difference in the
down areas indicates that the MIT model predicts less removal in the down areas.

Figure 4.3: Removal rates of the density and step height dependent models for
both the MIT density model and the IMEC model.
Figure 4.4: Percent difference in removal predictions between the density model
and the IMEC model.
4.2 Analysis of the MIT Density Model
In Chapter 3, we demonstrated that the MIT density model had device dependencies in
the model. In addition, it was shown that using adjustments to the blanket rate was one
way to remove these dependencies in the controlled thickness. We now consider some
experimental data in order to show that the MIT density model needs to incorporate step
height dependent removal rates. Later we will determine if such a revised model can
remove these device dependencies for use in process control. Wafers were patterned with
an MIT CMP test mask, deposited with a 16800 Å oxide layer, and polished using an
IC1000/SUBA IV pad stack with a standard process on a rotary polish tool at Texas
Instruments, Inc. The wafers had 20 mm by 20 mm dies, patterned out to the edge. As shown in
Figure 4.5, each die contained five rows and five columns of 4 mm blocks with lines of
varying pitch and density. The post-polish dielectric thickness data, as well as the density
model predictions, are shown in Figures 4.6 and 4.7 for the raised and down areas,
respectively. In Figure 4.6 the low density regions correspond to the lower thickness values, i.e.
points A, B, and D are low density and points C and E are medium-low. We see that the
predictions of the removal in the down areas are fairly accurate for the high and medium
density regions, yet fairly poor for the medium-low density regions. The predictions are
accurate in the high density regions because the removal rate of these features is low, and
the step height is still too large for the pad to touch the down area. Therefore, the high and
medium density features are at the beginning of the profiles shown in Figures 4.3 and 4.4.
Note that the removal in the low and medium-low density regions is overestimated. We
can see in Figure 4.7 that at these same locations, the down area removal
is under-predicted. This is the same characteristic as suggested by the difference between the
density model and the IMEC model in Figure 4.4. Therefore, the failure of the model to
accurately predict the removal in locations B, C, E, and F is most likely because the pad
has touched the down area before the time predicted by the density model. This causes an
increase in the removal of the down area and a decrease in the removal of the raised area.
On the other hand, the removal in locations A and D is overestimated in both the raised
and down areas. The inaccuracy of the density model in these locations is not explained by
the early contact of the IMEC model. We will return to this issue later.

Figure 4.5: Description of the pattern features in the test mask used for model
comparisons.

Figure 4.6: Measured and modeled values for the post-polish thickness of the
raised areas using the MIT density model (dashed line is the model fit). RMSE = 220 Å.

Figure 4.7: Measured and modeled values for the post-polish thickness of the
down areas using the MIT density model (dashed line is the model fit). RMSE = 244 Å.
4.3 A Combined Density and Step Height Model
We will now incorporate the step height dependent model into the density model to
capture the benefit of modeling arbitrary layouts, while improving performance with step
height dependencies. We begin by integrating the expressions in Figure 4.2 to obtain the
amount removed in the raised areas
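$$\Delta R_u = \begin{cases} \dfrac{K t_p}{\rho}, & t_p < t_c \\[1.5ex] \dfrac{K t_c}{\rho} + K(t_p - t_c) + (1-\rho)\,h_1\left(1 - e^{-(t_p - t_c)/\tau}\right), & t_p > t_c \end{cases} \tag{4.1}$$
and the amount removed in the down areas
$$\Delta R_d = \begin{cases} 0, & t_p < t_c \\[1ex] K(t_p - t_c) - \rho\,h_1\left(1 - e^{-(t_p - t_c)/\tau}\right), & t_p > t_c \end{cases} \tag{4.2}$$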
We then assume that the feature density, $\rho$, can be replaced by an effective density, as in
the density model. We are then left with the challenge of using the effective densities and
these equations to explain the experimental data (pre- and post-polish measurements for
raised and down areas) from an arbitrary layout. We will outline three methods for doing
this. In each of these methods, we need to find $K$, $\tau$, the contact time $t_c$ for each
measurement site, and the effective density of each measurement site. As in the density
model, we assume the effective density is determined by calculating the average density
within a window, and that the window size is determined by a single planarization length
parameter.
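The removal amounts in equations (4.1) and (4.2) can be evaluated for a single site with
the sketch below. It assumes the contact step height is the height remaining after linear
polishing of the raised area up to the contact time; the function names and the numerical
values in the usage note are hypothetical.

```python
import math


def contact_step_height(h0, blanket_rate, rho, t_contact):
    """Step height remaining when the pad first touches the down area."""
    return h0 - blanket_rate * t_contact / rho


def removal_amounts(blanket_rate, rho, h0, tau, t_contact, t_polish):
    """Return (raised, down) amounts removed, following (4.1) and (4.2)."""
    if t_polish < t_contact:
        # Linear regime: only the raised area is polished.
        return blanket_rate * t_polish / rho, 0.0
    h1 = contact_step_height(h0, blanket_rate, rho, t_contact)
    elapsed = t_polish - t_contact
    decay = 1.0 - math.exp(-elapsed / tau)
    raised = (blanket_rate * t_contact / rho + blanket_rate * elapsed
              + (1.0 - rho) * h1 * decay)
    down = blanket_rate * elapsed - rho * h1 * decay
    return raised, down
```

A useful check on any parameter set is that the remaining step height, the initial step
height minus the raised removal plus the down removal, equals the contact step height
multiplied by the exponential decay factor. With a blanket rate of 50, density 0.5, initial
step 5000, time constant 20, and contact at 30, the contact step height is 2000.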
The first method for determining these parameters picks a planarization length, and
calculates the effective density for each measurement site. Using the measurements and
effective densities, we perform a multivariate constrained optimization to find $K$ and
$\tau$, as well as a contact time $t_c$ for each measurement site. This process is repeated
for different planarization lengths until the parameters which provide the best fit (i.e.
minimum mean squared error) of the model to the experimental data are found. The following
constraints are necessary to maintain positive removal rates in both the raised and down areas.
t 0, tct!
pho
K
pho
p0,ht
(4.3)
,
and
$$K(t - t_c) \ge \rho\,h_1\left(1 - e^{-(t - t_c)/\tau}\right) \quad \text{for all } t \text{ with } t_c < t < t_p. \tag{4.4}$$
Using the experimental CMP data that we used for the density model above, this
time-density model was fit to the data. The results are shown in Figures 4.8 and 4.9. The raised
area fit of the time-density model is a 50% improvement over the original density model.
This improvement is largely in the low density regions, A through F. The down area fit of
the time-density model is also 50% better than the original density model. Here we see a
significant improvement in the low density region B, in the medium-low density regions C
and E, and in the medium density region F. The early removal of the down area material
over that of the density model significantly improves the predictions in these regions.
Unfortunately, the predictions in the low density regions A and D still have significant
error. As we stated in our analysis of the density model, we did not expect the time-density
model to correct these locations. It is possible that these errors are caused by poor
measurement data. However, these profiles are highly reproducible on other data sets, and
similar errors result. Thus, it is more likely that these effects are real. Figures 4.8 and 4.9
indicate that the time-density model is unable to predict enough removal in the raised
areas of the low density regions without over-estimating the removal in the down areas. It
is possible that the macro-structure of the pad is having an additional effect not captured
by the simple "pad contact" model. This suggests that more work is necessary to
understand these effects.

Figure 4.8: Measured and modeled values for the post-polish thickness of the
raised areas using the time-density model (dashed line is the model fit). RMSE = 110 Å.

Figure 4.9: Measured and modeled values for the post-polish thickness of the
down areas using the time-density model (dashed line is the model fit). RMSE = 120 Å.
4.4 Variations of the Time-Density Model
The previous method has a few problems. First, the large number of parameters (three
plus the number of measurement sites) causes the determination of the model to be
computationally intensive. Second, having a variable contact time for every site may cause
over-fitting of the data. Thus we may be able to fit the data, but not be able to predict the
thicknesses on other data sets. Third, these variable contact times make it difficult to
predict post-polish thicknesses for arbitrary layouts. Finally, the optimal contact times result
in contact step heights that have a functional dependence on density which conflicts with
the findings of [47]. Figure 4.10 shows the fitted contact times determined from the
optimization of the time-density model, plotted against effective density. The contact times
above 40% density are plotted at the time of polish, meaning these features have not yet
contacted the down area. These fitted contact times lead to the contact step heights shown
in Figure 4.11. Again, the contact heights beyond 40% are determined by the maximum
value of the polish time, and are not meaningful. Results in [47] indicate that the contact
step height increases monotonically with decreasing density. However, the contact heights
below 40% do not agree with these results. These results, combined with the fact that 63
parameters were used to fit 120 data points, suggest that we are most likely overfitting.

Figure 4.10: Model fit of the step height at contact time as a function of the effective feature density.

Figure 4.11: Model fit of the contact time as a function of the effective feature
density.

In response to these problems, we have devised two variations of this method. These
utilize contact step heights which have a functional dependence on either the effective
density or the feature line space. The first variation utilizes the functional dependence
$h_1 = k_1 + k_2 e^{-\rho/\delta}$, where $h_1$ is the contact step height, and $k_1$, $k_2$,
and $\delta$ are variable constants. This relationship restricts the contact step height to
exponentially decrease with increasing density. This reduces the number of parameters in the
previous method from 3+N to six. In addition, the relationship of contact step height on
density can be re-used for model prediction on other arbitrary devices.
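The first variation can be sketched as follows. The exponential form is the one given
above; the conversion from contact step height to contact time assumes linear polishing of
the raised area at the blanket rate divided by the effective density up to contact, and
the cap at the initial step height is an added safeguard. All names and values are
hypothetical.

```python
import math


def contact_height(rho_eff, k1, k2, delta):
    """First variation: h1 = k1 + k2 * exp(-rho_eff / delta)."""
    return k1 + k2 * math.exp(-rho_eff / delta)


def contact_time(rho_eff, h0, blanket_rate, k1, k2, delta):
    """Time for the step to shrink from h0 to h1 at rate blanket_rate / rho_eff.

    The contact height is capped at h0 so the contact time is never negative.
    """
    h1 = min(h0, contact_height(rho_eff, k1, k2, delta))
    return rho_eff * (h0 - h1) / blanket_rate
```

Because the three constants are shared by every site, a contact time can be computed for
any effective density on any layout, which is what removes the per-site contact times of
the first method.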
The results of the model fit for this variation are shown in Figures 4.12 and 4.13. Here
we can see that this model also works quite well. There is only a slight decrease in the
quality of fit in the raised areas. This suggests that there is indeed a strong correlation of
the contact height to density. The optimal contact step height dependence on density using this
functional form is shown in Figure 4.14. The functional dependence of the contact step
height in this case is very different from that determined with the variable contact times
above. However, this dependence on density agrees with that suggested in [47]. This
suggests that the fit from the previous method was most likely over-fitting. Figure 4.15 shows
the model fit errors for both the raised and down areas. We can see from this figure that
there appear to be larger errors around the 50% density region. The last 15 data points in
Figures 4.11 and 4.12 are all 50% density lines with pitch varying from 25 to 250 µm.
These errors indicate that there may be a functional dependence of the contact step height
on pitch or line space.
Figure 4.12: Measured and modeled values for the post-polish thickness of the
raised areas using time-density model with contact height as a function of density. RMSE = 137 Å.

Figure 4.13: Measured and modeled values for the post-polish thickness of the
down areas using time-density model with contact height as a function of density. RMSE = 115 Å.

Figure 4.14: Model fit for contact step height as a function of the effective feature
density.

Figure 4.15: Model errors for raised and down areas as a function of the effective
feature density.

Therefore, the second variation utilizes the functional dependence
$h_1 = k_1 + k_2 l + k_3 l^2 + k_4 l^4$, where $l$ is the feature line space, and $k_1$,
$k_2$, $k_3$, and $k_4$ are variable constants. This reduces the number of parameters in the
original method from 3+N to seven. Again, this relationship of contact step height on line
space could be used for model prediction on arbitrary devices. The results of the model fit
for this variation are shown in Figures 4.16 and 4.17. Here we see that the line space
dependence does improve the fit of the raised areas. Unfortunately, the dependence on line
space shown in Figure 4.18 suggests that the contact step height has a minimum which is not
at a line space of zero. Intuitively, we would expect that the pad would be more able to
reach down into larger line spaces, giving larger contact heights as line space increases.
Therefore, the variation in contact height with line space shown in Figure 4.18 should be
treated cautiously, and may be due to a confounding with density or other effects. Other
test masks may be necessary to separate these effects.

Figure 4.16: Measured and modeled values for the post-polish thickness of the
raised areas using time-density model with contact height as a function of line
space. RMSE = 121 Å.

Figure 4.17: Measured and modeled values for the post-polish thickness of the
down areas using time-density model with contact height as a function of line
space. RMSE = 111 Å.
Figure 4.18: Model fit for step height at contact time as a function of line space (µm).
4.5 Polish Time and Device Dependencies
Now that we have developed a new model, we would like to determine whether we can
use it for the control of multiple devices. In order to do this, we need to determine the
answer to two questions. First, is this model able to fit to data over a range of polishing
times? Second, does this model remove the device dependencies seen earlier in the control
experiments?
Recall that we plotted the evolution of the MIT density model fits for various polish
times (along with the respective experimental data) in Figure 3.12. The time-density model
was run using similar data, and the resulting predictions for the raised areas are
shown in Figure 4.19. Upon close comparison of these fits and those in Figure 3.12, we
find this model fits significantly better in the low density regions surrounded by higher
densities, particularly regions A and D.
Figure 4.19: Measured and modeled values for the post-polish thickness of the
raised areas using time-density model with contact height as a function of density
for various polish times. Dashed lines are model fits and stars are experimental
data points.
Recall that there were two problems with the device dependencies from our control
experiments: the extracted planarization lengths were different for the different devices,
and, once the lengths were fixed, there were device dependencies in the extracted blanket
rates. Therefore, we would like to test these dependencies using this model, with the same
data collected from the polishing of the wafers during the control experiments. Using the
data from wafers polished during the final control experiment, we extracted the
planarization lengths using the time-density model. The planarization lengths are plotted
against the device in Figure 4.20. Here we see that there is still a device dependency in
the planarization length. Therefore, the time-density model is also most likely not a device
independent model. It is possible that this model could still be used with the device
dependent planarization lengths, similar to what was done in Chapter 3. In order to test
this idea, we again fixed the planarization lengths at the above values and extracted the
remaining model parameters from the experimental data from the third control experiment
in Chapter 3. These parameters included the blanket rates, as well as the parameters for
the contact height functional dependence on density. The extracted blanket rates are shown
in Figure 4.21, as a function of the device being run. We see that, although much smaller,
there is still a minor device dependency. The other parameters show similar dependencies.
This suggests that improved control may be obtained using this controller, but more
improvements in the CMP model are necessary to remove all the dependencies.
Figure 4.20: The planarization length as a function of the device number being run
(using the experimental data from the third control run in Chapter 3).
Figure 4.21: The blanket rate as a function of the device number being run (using
the experimental data from the third control run in Chapter 3).
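The planarization length enters the density model through the effective density, i.e. the local layout density averaged over a window whose size is set by the planarization length. The following is a minimal sketch of that averaging, assuming a square window with uniform weighting and edge replication at the borders. The function name, the density grid, and the window sizes are hypothetical and chosen only for illustration; this is not the extraction code used in this work.

```python
import numpy as np


def effective_density(local_density, window):
    """Average a 2-D local density map over a square window.

    window is an odd number of grid cells and plays the role of the
    planarization length. Border cells are handled by repeating the edge
    values, which is an assumption of this sketch.
    """
    local_density = np.asarray(local_density, dtype=float)
    pad = window // 2
    padded = np.pad(local_density, pad, mode="edge")
    out = np.empty_like(local_density)
    rows, cols = local_density.shape
    for i in range(rows):
        for j in range(cols):
            # Window of size window x window centered on cell (i, j).
            out[i, j] = padded[i:i + window, j:j + window].mean()
    return out


# Hypothetical layout: a low density block next to a high density block.
layout = np.full((9, 9), 0.2)
layout[:, 5:] = 0.8
eff_short = effective_density(layout, 3)   # short planarization length
eff_long = effective_density(layout, 7)    # long planarization length
```

A longer window pulls the effective density near the boundary further toward the neighboring region, which is why a planarization length that changes from device to device changes the predicted thickness profile for the same layout.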
4.6 Summary
The density model was shown to be insufficient to completely characterize the removal
in medium to low density features in dielectric polishing. Differences in the density model
and the step height dependent model explain errors in the fitting of the density model, and
a combined model was shown to provide up to a 50% improvement in fitting errors of both
raised and down area thicknesses. Variations of this model significantly reduce the number
of model parameters and provide the ability to predict post-polish thicknesses for arbitrary
layouts. Although this model provides improved predictions for multiple polish times over
the evolution of the polishing process, device dependencies still exist in the improved
model. Future work is necessary to understand the contact height dependencies on density,
line space, and pitch. In addition, further improvements are needed to remove the device
dependencies in the model. Recent work suggests that long range interactions across several
devices create a pad flexing limit; accounting for this limit may remove such device
dependencies in the model.
Chapter 5
Conclusions and Future Work
Chemical-mechanical polishing has become a critical process in the manufacture of
integrated circuits. However, due to the many complexities of CMP, controlling the process in a production setting is challenging. Goals on various metrics must be achieved,
including: removal rate, within-wafer non-uniformity, step-height, total indicated range,
within-die non-uniformity, and wafer-to-wafer non-uniformity. In addition, a number of
difficulties exist in controlling the CMP process, including the drift in the polish characteristics of blanket and patterned wafer performance metrics over time, and the challenge of
control given a mix of device types on the same tool.
Major issues involved with an implementation of a run by run control system for use in
a production environment include: quality, cost, flexibility, and ease of use. Chapter 2
demonstrated that the frequent measurements provided by integrated metrology, combined
with proper controller tuning, result in high quality control of the average post-polish
thickness measured at a few locations within several die of a single device. The automatic
measurement of the post-polish wafers and the relatively simple control algorithm provide
a high degree of ease of use. On-line metrology was demonstrated to be an effective means
for providing measurement feedback to the controller. The variability of the on-line CMP
metrology tool is well below CMP requirements, and measurements from this tool correlate well with measurements from ex-situ metrology tools. A 600 wafer experiment verified this and demonstrated that the on-line metrology tool also has good reliability. The
simplification of processing using on-line metrology and the reduction in thickness
variation from improved control result in a reduced cost for the CMP process. We showed
that increases in throughput of up to 80% led to reductions in cost of ownership of up to 32%.
This approach also benefits the environment by reducing water, chemical, and power
usage by 0% to 66.7%. Finally, we demonstrated that simple EWMA filtering of the average
removal rate is sufficient for controlling lot to lot trends in the CMP process. This type
of filtering resulted in control of the average post-polish film thickness for a single device
with an average error of less than 100 Å, suggesting that complex filtering techniques are
not necessary to control the lot to lot variability in CMP. Such complex filtering
approaches would only serve to decrease the ease of use and increase the controller's
sensitivity to random process noise.
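The EWMA filtering described above can be illustrated with a minimal sketch. It assumes the controller tracks a single average removal rate and chooses the next polish time from the thickness to be removed; the function names, the filter weight of 0.3, and all numerical values are hypothetical and are not taken from the experiments.

```python
def ewma_update(rate_estimate, measured_rate, weight=0.3):
    """Blend the newest measured removal rate into the running estimate."""
    return weight * measured_rate + (1.0 - weight) * rate_estimate


def polish_time(pre_thickness, target_thickness, rate_estimate):
    """Polish time (s) needed to remove the excess film at the estimated rate."""
    return (pre_thickness - target_thickness) / rate_estimate


# Hypothetical numbers: thicknesses in angstroms, rates in angstroms/second.
rate = 30.0
for measured in (28.0, 27.5, 27.0):   # removal rates measured on successive runs
    rate = ewma_update(rate, measured)
time_s = polish_time(17000.0, 10000.0, rate)
```

A smaller weight makes the estimate less sensitive to random noise but slower to follow drift in the removal rate; a larger weight does the opposite.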
Although the quality of control presented in Chapter 2 was very high, there are several
problems with this approach. First, using a reduced number of die for control leads to a
shift relative to the average thickness measured over a larger number of die. In addition, estimates
based on this experiment suggest that using traditional approaches to control the average
thickness with a small number of measurements may also increase variability due to an
increased sensitivity to measurement errors. Finally, estimates show that performing control using pilot wafers and sheet film equivalents would result in a 39% increase in the
average error of the controlled average thickness, over performing control directly on patterned wafers. These results indicate that the spatial sampling techniques used in this type
of control may produce misleading results.
In Chapter 3, we outlined the magnitude of the sources of variation in CMP, including
within-wafer, within-die, and lot-to-lot variability. Although blanket wafer metrics are
most often those used to monitor and control the CMP process, they are second order
effects in terms of the variability of the CMP process. The result is that traditional techniques, like those outlined in Chapter 2, are inappropriate for dealing with the many complex issues involved with controlling the CMP process.
The problems with using traditional techniques to control the average post-polish
thickness were explained by the MIT density model. The model shows how variations in
pattern densities lead to large variations in the post-polish thickness profile. These variations due to pattern densities are why both the value and variability of the controlled average thickness using traditional control techniques depend on the measurement locations.
In addition, the model demonstrates that the average thicknesses have non-linear
dependencies on pattern layouts, and thus approaches that use linear relationships between the
blanket removal rate and the patterned removal rate (i.e. controllers using sheet film
equivalents) generally result in a decreased quality of control. Finally, we described how
different devices often have large differences in their pattern layouts, which result in large
differences in their post-polish thickness profiles. These device dependencies are the
source of much confusion in the monitoring and control of the CMP process, and lead to
several problems, including: controlling to the incorrect average thickness, not taking
device dependencies into account when updating the tool model, and not monitoring or
even having a rough idea of the global planarity resulting from the CMP process.
We demonstrated that within-die non-uniformity is optimized at a particular process
setting, and, since this is the largest source of variability in the oxide CMP processes
examined here, this process setting should remain fixed. As a result, approaches which
attempt to control the average removal rate or the within-wafer non-uniformity by changing
the process settings should not be used. In addition, we demonstrated that step-height
is largely a redundant measure of within-die variation, which we suggested should not be
controlled. We outlined that an ideal controller would control the average post-polish
thickness profile of multiple devices while monitoring the global non-uniformities of the
different devices being polished using only a few measurements on the wafer.
We then presented a device independent controller that integrates the MIT density
model into the control strategy. This controller correlates the value being measured with
an estimate of the thickness at its corresponding point in the layout based on the density
model. The controller uses differences in the measured thickness and this estimate to
update device independent process parameters, i.e. blanket removal rate and planarization
length. The controller uses these device independent parameters and the pattern layout for
the device being polished to predict a profile of the post-polish thickness across the entire
wafer at a very densely sampled grid. The polishing time used to generate this surface is
varied until the average thickness of this profile is controlled to the desired average thickness. In addition, this surface is used to predict global non-uniformity of the post-polish
thickness profile. By making a separation between what is measured and what is controlled, the control framework provides several benefits. First, it allows measurements
from any device to be used to update the tool level model that can be used with any other
device being processed. Second, it allows us to measure only a few points, while very
accurately controlling the average of the true thickness profile. Third, it allows us to monitor the total non-uniformity of the polishing process for each device being processed. This
eliminates the need for a large number of test wafers aimed at determining uniformity.
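The time selection step of this controller can be sketched as follows. The sketch assumes a simplified form of the density model for raised areas, in which a site polishes at the blanket rate divided by its effective density until the local step is removed and at the blanket rate afterwards. The function names, the one-dimensional density grid, and all numerical values are hypothetical; this is an illustration of the idea, not the controller implementation used in the experiments.

```python
import numpy as np


def raised_thickness(t, density, rate, z0, z1):
    """Simplified density-model thickness of raised areas after t seconds.

    density: effective pattern density at each site (0 < density <= 1)
    rate: blanket removal rate, z0: initial thickness, z1: initial step height
    """
    density = np.asarray(density, dtype=float)
    contact_time = density * z1 / rate          # time at which the local step is gone
    before = z0 - rate * t / density            # raised areas polish at rate/density
    after = z0 - z1 - rate * t + density * z1   # blanket rate once locally planar
    return np.where(t < contact_time, before, after)


def solve_polish_time(target_mean, density, rate, z0, z1, t_max=600.0, tol=1e-6):
    """Bisect on polish time until the mean predicted thickness hits the target."""
    lo, hi = 0.0, t_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if raised_thickness(mid, density, rate, z0, z1).mean() > target_mean:
            lo = mid   # profile still too thick on average: polish longer
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical values: thicknesses in angstroms, rate in angstroms/second.
density = np.linspace(0.1, 0.9, 81)
t_star = solve_polish_time(10000.0, density, rate=30.0, z0=17000.0, z1=7000.0)
profile = raised_thickness(t_star, density, 30.0, 17000.0, 7000.0)
total_range = profile.max() - profile.min()   # predicted global non-uniformity
```

Bisection is used here only because the predicted mean thickness decreases monotonically with polish time; any one-dimensional root finder would serve. The same predicted profile supplies both the controlled quantity (its mean) and the monitored quantity (its range).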
Our first experiment demonstrated that this type of control results in a lot to lot
variability of 400 Å. This is believed to be fairly good relative to existing techniques for
controlling multiple devices. Unexpectedly, the planarization length of the MIT density model
was found to be a function of the device. This is unfortunate because the parameter is not
truly device independent, and suggests that an improved model is necessary to achieve
truly device independent control. In our second experiment, we fixed the planarization
lengths for the different devices so that we could improve the control with the existing
model. This resulted in an average error in the controlled thickness of 200 Å, which is
believed to be significantly better than existing methods. Our third experiment
demonstrated control of the average post-polish thickness with an average error of only 100 Å,
using a device dependent model update strategy. This strategy included adjustments to the
blanket removal rates based on the particular device being run. In all cases, we were able
to accurately predict the total non-uniformity of the polished wafers.
While the framework outlined in Chapter 3 provided the ability to accurately control
the average thickness of multiple devices and monitor the global non-uniformity with
only a few measurements, some remaining device dependencies must be removed in order to
provide the greatest flexibility and highest quality. These device dependencies appear to
be related to the quality of the model used for control. Chapter 4 explored one possible
model extension in order to remove these dependencies. It was shown that the density
model for dielectric CMP is not sufficient to completely characterize the removal in
medium to low density features. We demonstrated that differences in the density model
and the step height dependent model proposed by IMEC suggest a combined model would
improve fitting and possibly remove the device dependencies in the controller. We demonstrated that a combined step-density model provides up to a 50% improvement in fitting
errors of both raised and down area thicknesses, and improves fitting errors at multiple
polishing times over the evolution of the polishing process. Variations of this model significantly reduce the number of model parameters and allow predictions of the post-polish
thicknesses for arbitrary layouts. Despite this effort at improving the model, it was shown
that device dependencies still exist with the improved model, and further modeling work is
necessary to remove the device dependencies in the controller, and thus provide the maximum quality control with the greatest ease of use and flexibility.
In conclusion, we outlined how current techniques that fail to take into account the
device characteristics as a whole provide ineffective control strategies. These problems
can be overcome by incorporating an advanced process model into a controller that separates what is measured from what is controlled. We have provided a framework for device
independent control of dielectric chemical-mechanical polishing. This provides accurate
control of the true average thickness and effectively monitors the global uniformity of
multiple devices being processed on a single tool. In addition, the approach requires only a
small number of measurements in order to achieve this.
Further work is still necessary to make the model, and thus the control strategy,
completely device independent. Understanding the model dependencies on density, line space,
and pitch is critical to removing these device dependencies. Recent work suggests that
long range interactions across the device create a pad flexing limit; accounting for this
limit may remove such device dependencies in the model.
References
[1]
M. Martinez, "Chemical-mechanical polishing: Route to global planarization," Solid
State Tech., p. 26, May 1994.
[2]
T. Park, T. Tugbawa, J. Yoon, D. Boning, J. Chung, R. Muralidhar, S. Hymes, Y
Gotkis, S. Alamgir, R. Walesa, L. Shumway, G. Wu, F. Zhang, R. Kistler, J.
Hawkins, "Pattern and Process Dependencies in Copper Damascene Chemical
Mechanical Polishing Processes," VLSI Multilevel Interconnect Conference, Santa
Clara, CA, June 1998.
[3]
C. Yu, P C. Fazan, V K. Mathews, and T. T. Doan, "Dishing effects in a chemical
mechanical polishing planarization process for advanced trench isolation," Appl.
Phys. Lett., vol. 61 no. 11, Sep 1992.
[4]
A. Hu, X. Zhang, E. Sachs, and P. Renteln, "Application of Run by Run Controller to
the Chemical-Mechanical Planarization Process, Part I," IEEE Proc. of the 15th Int.
Elect. Manuf Tech. Symp., Oct. 1993.
[5]
A. Hu, H. Du, S. Wong, P. Renteln, and E. Sachs, "Application of Run by Run
Controller to the Chemical-Mechanical Planarization Process, Part II," IEEE Proc.
of the 16th Int. Elect. Manuf Tech. Symp., Oct. 1994.
[6]
R. Jairath and L. Markert, "Metrology and Process Control Issues in Chemical
Mechanical Polishing," NIST 1995 Semiconductor CharacterizationWorkshop, Jan.
1995.
[7]
A. Altman, "Applying Run by Run Process Control to Chemical-Mechanical
Polishing of Sub-Micron VLSI: A Technological and Economic Case Study," S.M.
Thesis, MIT EECS, May 1995.
[8]
A. Hu, H.-P. Dun, P Renteln, and E. Sachs, "Sensor Development and Process
Control for Chemical-Mechanical Planarization of Multilevel Interconnect
Devices," Electrochem. Soc. Meeting, June 1995.
[9]
J. Moyne, R. Telfeyan, A. Hurwitz and J. Taylor, "A Process-Independent Run-toRun Controller and Its Application to Chemical-Mechanical Planarization," to be
presented, Sixth Annual SEMI/IEEE ASMC, Boston, Nov. 1995.
[10] D. Boning, W. Moyne, T. Smith, J. Moyne, R. Telfeyan, A. Hurwitz, S. Shellman and
J. Taylor, "Run by Run Control of Chemical-Mechanical Polishing," IEEE Trans. on
Comp., Pack., and Manuf Technol. - PartC, Vol. 19, pp 307-314, 1996.
157
[11] T. Smith, D. Boning, J. Moyne, A. Hurwitz, and J. Curry, "Compensating for CMP
Pad Wear Using Run by Run Feedback Control," Proc. VLSI Mulitlevel Interconnect
Conf, Santa Clara, pp. 437, 1996.
[12] T.Smith, "Novel Techniques for the Run by Run Process Control of ChemicalMechanical Polishing," S. M. Thesis, MIT EECS, 1996.
[13] J. Moyne, J. Curry, D. Eylon, and R. Kipper, Proc. SEMATEC AEC/APCWorkshop
IX, pp. 374, 1997.
[14] J. Moyne and J. Curry, "A Fully Automated Chemical-Mechanical Polishing
Planarization Process," Proc. of 1998 VLSI Multilevel Interconnect Conf., pp. 515517, June 1998.
[15] G. Dishon, D. Eylon, M. Finarov, and A. Shulman, "Dielectric CMP Advanced
Process Control Based on Integrated Thickness Monitoring," Proc. of 1998 CMPMIC, 1998.
[16] A. Ingolfsson and E. Sachs, "Stability and Sensitivity of an EWMA Controller," J. of
Quality Technol., Vol. 25, No. 4, pp. 271-287, 1993.
[17] S. Butler and J. Stefani, "Supervisory Run-to-Run Control of Polysilicon Gate Etch
Using In Situ Ellipsometry," IEEE Trans. on Semi. Manuf, Vol. 7, No. 2, pp. 193201, 1994.
[18] J. Stefani, S. Poarch, S. Saxena and P. K. Mozumder, "Advanced Process Control of a
CVD Tungsten Reactor," IEEE Trans. on Semi. Manuf , Vol. 9, No. 3, 1996.
[19] E. Del Castillo and A. Hurwitz, "Run-to-Run Process Control: Literature Review and
Extensions," J.of Quality Technol., Vol. 29, No. 2, pp. 184-196, 1997.
[20] T. Smith, J. Stefani, D. Boning, and S. Butler, "Run By Run Advanced Process
Control of Metal Sputter Deposition," IEEE Trans. on Semi. Manuf, Vol. 11, No. 2,
pp. 276-284, May 1998.
[21] J. S. Hunter, "The Exponentially Weighted Moving Average," Journal of Quality
Tech., Vol. 18, No. 4, October 1986.
[22] G. E. P. Box and T. Kramer, "Statistical Process Control and Automated Process
Control - A Discussion," Technometrics, Vol. 34, No.3, pp. 251-267, 1992.
[23] S. Crowder, "Design of Exponentially Weighted Moving Average Schemes," Journal
of Quality Tech., Vol. 21, No. 3, July 1989.
[24] J. Lucas and M. Saccucci, "Exponentially Weighted Moving Average Control
Schemes: Properties and Enhancements," Technometrics, Vol. 32, No. 1, Feb. 1990.
158
[25] D. M. Koenig, Control and Analysis of Noisy Processes, Prentice-Hall, Englewood
Cliffs, NJ, 1991
[26] S. Crowder and M. Hamilton, "An EWMA for Monitoring a Process Standard
Deviation," Journalof Quality Tech., Vol. 24, No. 1, Jan. 1992.
[27] E. Sachs, R. Guo, S. Ha and A. Hu, "Tuning a Process While Performing SPC: An
Approach Based on the Sequential Design of Experiments," Proc. of IEEE/SEMI
ASMC, 1990.
[28] E. Sachs, R. Guo, S. Ha and A. Hu, "Process Control System for VLSI Fabrication",
IEEE Trans. on Semi. Manuf., Vol. 4, 1991.
[29] E. Sachs, A. Hu, and A. Ingolfsson, "Run by Run Process Control: Combining SPC
and Feedback Control," IEEE Trans. Semi. Manuf., vol. 8, no. 1, pp. 26-43, Feb.
1995.
[30] T. Smith and D. Boning, "A Self-Tuning EWMA Controller Utilizing Artificial
Neural Network Function Approximation Techniques," IEEE Trans. on Comp.,
Pack., and Manuf Technol. PartC, Vol. 20, No. 2, pp. 121-132, April 1997.
[31] T. Smith and D. Boning, "Artificial Neural Network Exponentially Weighted Moving
Average Controller for Semiconductor Processes," J. Vac. Sci. Technol. A, Vol. 15,
No. 3, pp. 1377-1384, May 1997.
[32] M. Le, T. Smith, D. Boning, and H. Sawin, "Run to Run Model Based Process
Control on a Dual Coil Transformer Coupled Plasm Etcher," 191st Meeting of the
ElectrochemicalSociety, pp. 332, May 1997.
[33] T. Smith and D. Boning, "Enabling Intermittent, Delayed, and Non-Periodic Data
Sampling with Predictor Corrector Control," J. of Vac. Sci. and Technol., in press.
[34] E. Del Castillo, "Long Run and Transient Analysis of a Double EWMA Feedback
Controller," IIE Trans., in press.
[35] R. Guldi, et al., "Process Optimization Tweaking Tool (POTT) and its Application in
Controlling Oxidation Thickness," IEEE Trans. on Semi. Manuf., Vol. 2, pp. 54-59,
1989.
[36] S. Leang and C. Spanos, "Statistically Based Feedback Control of Photoresist
Application," Proc. of IEEE/SEMIASMC, pp. 185-190, 1991.
[37]
P. Chatterjee and P. Mozumder, Eds., "Special Issue on Microelectronics
Manufacturing Science and Technology, Trans. on Semi. Manuf., Vol. 7, No. 2, May
1994.
159
[38] J. Baras and N. Patel, "Designing Response Surface Model Based Run by Run
Controllers: A New Approach," IEEE/CMPT Intl. Manuf Technol. Symp., pp. 210217, 1995.
[39] X. Wang and R. Mahajan, "Artificial Neural Network Model-Based Run-to-Run
Process Controller," IEEE Trans. on Comp., Pack., and Manuf Technol. - Part C,
Vol. 19, No. 1, pp. 19-26, 1995.
[40] E. Del Castillo, and J. Yeh, "An Adaptive Run-to-Run Optimizing Controller for
Linear and Nonlinear Semiconductor Processes," IEEE Trans. on Semi. Manuf., Vol.
11, No. 2, pp. 285-295, 1998.
[41] S. Leang, S. Ma, J. Thompson, B. Bombay, C. Spanos, "A Control System for
Photolithographic Sequences," Trans. on Semi. Manuf., Vol. 9, No. 2, pp. 191-207,
1996.
[42] N. Jakatdar, X. Niu, J. Musacchio, J. Boa, and C. Spanos, "DUV Lithography
Control," Proc. of 1998 SEMATECH AEC/APC Symp., pp. 137-148, 1998.
[43]
National Technology Roadmap for Semiconductors: Semiconductor Industry
Association Report, SEMATEC Inc., Austin, TX., 1997.
[44] A. Sethuraman, B. Koutny, and C. Kallingal, "A Novel Planarization Method for
CMP of Dielectric Layers for ILD and STI Using Slurry Free Process," 7th ISSM,
pp. 239-241, 1998.
[45] D.P. Goetz, "The Effect of Subpad Construction on Pattern Density Effects for
Slurry-Free CMP," CMP-MIC,pp. 234-241, Feb. 1999.
[46] E. Tseng, C. Yi, H.C. Chen, "A Mechanical Model for DRAM Dielectric Chemical
Mechanical Polishing Process," CMP-MIC, pp. 258-265, Feb.1997.
[47]
J. Grillaert, M. Meuris, N. Heyley, K. Devriendt, E. Vrancken, M. Heyns,
"Modelling Step Height Reduction and Local Removal Rates Based on PadSubstrate Interactions," CMP-MIC, pp. 79-86, Feb. 1998.
[48] T. Smith and D. Boning, "A Study of Within-wafer Non-uniformity Metrics," 4th
Intl. Workshop on StatisticalMetrology, Kyoto, Japan, Jun. 1999.
[49] D. Kim, S. Kim, Y Lee, S. Kim, and K. Suh, "Study of Micro-Scratch on Oxide Film
in VLSI Circuit," Proc. of 1999 VLSI-MIC, pp.283-287, 1999.
[50] E. Chang, B. Stine, T. Maung, R. Divecha, D. Boning, J. Chung, K. Chang, G. Ray,
D. Bradbury, S. Oh, D. Bartelink, "Using a Statistical Metrology Framework to
Identify Random and Systematic Sources of Intra-Die ILD Thickness Variation for
CMP Processes," IEDM Tech, Digest, pp. 499-502, 1995.
160
[51] D. Ouma, B. Stine, R. Divecha, D. Boning, J. Chung, I. Ali, and M. Islamraja,
"Using Variation Decomposition Analysis to Determine the Effects of Process on
Wafer and Die-Level Uniformity in CMP," First International Symposium on
Chemical Mechanical Planarization (CMP) in IC Device Manufacturing, 190th
Electrochemical Society Meeting, San Antonio, TX, 1996.
[52]
B. Stine, V. Mehrotra, D. Boning, J. Chung, D. Ciplickas, "A Simulation
Methodology for Assessing the Impact of Spatial/Pattern Dependent Interconnect
Parameter Variation on Circuit Performance," IEDM Tech, Digest, pp. 133-136,
1997.
[53] B. Stine, D. Ouma, R. Divecha, D. Boning, J. Chung, D. Hetherington, I. Ali, G.
Shinn, J. Clark 0. S. Nakagawa, S.-Y Oh, "A Closed-Form Analytic Model for ILD
Thickness Variation in CMP Processes," Proc. CMP-MIC Conf, Santa Clara, CA,
1997.
[54] G. Nanz and L. Camilletti, "Modeling of Chemical-Mechanical Polishing: A
Review," IEEE Trans. on Semi. Manuf., Vol 8, No. 4, pp. 382-389, 1995.
[55] B. Stine, D. Ouma, R. Divecha, D. Boning, J. Chung, "Rapid Characterization and
Modeling of Pattern Dependent Variation in Chemical Mechanical Polishing," IEEE
Trans. Semi. Manuf., Feb. 1998.
[56] D. Ouma, "Modeling of chemical-mechanical polishing for dielectric planarization,"
MIT Ph.D. Thesis, 1998.
[57]
V. Mehrotra, S. Nassif, D. Boning, and J. Chung, "Modeling the Effects of
Manufacturing High-Speed Microprocessor Interconnect Performance," 1998
InternationalElectron Devices Meeting, San Francisco. CA, Dec. 1998.
[58] H. Scheff6, The Analysis of Variance, John Wiley and Sons, New York, NY, 1959.
[59] B. Efron, The Jackknife, the Bootstrap, and Other Resampling Plans, Society for
Industrial and Applied Mathematics, Philadelphia, PA, 1982.
[60] P. Mozumder and L. Lowenstein, "Method for Semiconductor Process Optimization
Using Functional Representations of Spatial Variations and Selectivity," IEEE Trans.
on Comp., Hybrids, and Manuf Tech., vol. 15, no. 3, pp. 311, (1992).
[61] R. Guo and E. Sachs, "Modeling, Optimization, and Control of Spatial Uniformity in
Manufacturing Processes," IEEE Trans. on Semi. Manuf, vol. 6, no. 1, pp. 41-57,
(1993).
[62] D. Drain, Statistical Methods for Industrial Process Control, Chapman & Hall, New
York, NY, 1997.
161
[63] T. Smith, B. Goodlin, D. Boning, and H. Sawin, "A Statistical Analysis of Multiple
and Single Response Surface Modeling," IEEE Trans. on Semi. Manuf, 1999.
[64] F.W. Preston, "The Theory and Design of Plate Glass Polishing Machines," J Soc.
Glass Technol., Vol. 11, pp. 214-256, 1927.
[65] P. Renteln, et al., VLSI Multilevel Interconnect Conference, pp. 57-63, Santa Clara,
CA, 1990.
[66] Y Hayashide, M. Matsuura, M. Hirayama, T. Sasaki, S. Harada, H. Kotani, "A novel
optimization method of chemical mechanical polishing (CMP)", Proc. VLSI
Mulitlevel InterconnectConf, Santa Clara, CA, pp. 464-470, 1995.
162