MODELING OF CHEMICAL MECHANICAL POLISHING USING FIXED ABRASIVE TECHNOLOGY

by Radhika Dutt

Bachelor of Science, Massachusetts Institute of Technology, June 2000

Submitted to the Department of Electrical Engineering and Computer Science in Fulfillment of the Requirement for the Degree of Master of Engineering in Electrical Engineering at the Massachusetts Institute of Technology

October 25, 2000

© Massachusetts Institute of Technology, 2000. All Rights Reserved.

Author: Electrical Engineering and Computer Science, September, 2000

Certified by: Duane S. Boning, Associate Professor, Electrical Engineering and Computer Science

Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students, Electrical Engineering and Computer Science

ABSTRACT

Chemical Mechanical Polishing (CMP) is conventionally carried out using abrasive slurry and a polishing pad. An alternative process is the "fixed abrasive" polish. Existing models, which accurately predict oxide thickness variations across a chip for the conventional CMP process, have not previously been tested for the fixed abrasive process. In this study wafers were polished using the fixed abrasive pad and the data were fitted against the density and step-height models.
Results show that little down area polish occurs with the use of the fixed abrasive pad, except in areas of low density. The step-height model, which accounts for contact height, shows improved accuracy over the density model for areas of low density. The density model, however, is sufficiently accurate for areas of higher density. Modeling new variants of the fixed abrasive pad may require further study and model extensions related to additional effects and pad properties.

ACKNOWLEDGEMENTS

I would like to thank Professor Boning for this opportunity, for taking me up as an M. Eng student. I still remember having first spoken to him about the M. Eng thesis and my excitement about this project. I appreciate all his help and guidance over the course of the year and am sincerely grateful for his being supportive and understanding of my working on my thesis while concurrently working on my startup, LOBBY7. He has really been a great advisor!!

I would like also to thank Daniele, my fiancé, without whom this would not have been possible. He has been a source of strength, joy and support all through the months when I was writing the thesis. I realize that he has had to make many sacrifices during this time and would like to express my sincerest gratitude.

I am also very grateful to my parents for everything I am today, for teaching me to dream and supporting me through all my decisions. Even from a distance they have always cheered me on and been there for me.

I would also like to thank Doug Goetz at 3M for his help in this project and his advice. Thanks to Mike Oliver at Rodel for the discussions that provided insight into many aspects of this project.

This work has been supported in part by a DARPA subcontract with PDF Solutions, Inc.

TABLE OF CONTENTS

CHAPTER 1: INTRODUCTION
  DIELECTRIC PLANARIZATION
  OXIDE
  STI
  CHARACTERIZING PLANARIZATION
  PLANARIZATION LENGTH
  UNIFORMITY
  PLANARIZATION METHODS
  THESIS OUTLINE
CHAPTER 2: WAFER POLISHING
  CONVENTIONAL CMP
  SHORTCOMINGS
  SLURRY-FREE CMP
  KEY FACTORS IN POLISHING
  PROCESS PARAMETERS
  PATTERN LAYOUT
CHAPTER 3: MODELING METHODOLOGY
  PRESTON'S MODEL
  OTHER MODELS
  DENSITY MODEL
  STEP-HEIGHT MODEL
CHAPTER 4: EXPERIMENTAL WORK
  EXPERIMENTAL DESIGN
  RESULTS
  STEP-HEIGHT
  EXTRACTED CHARACTERISTICS
  MODEL FIT: DENSITY MODEL
  MODEL FIT: STEP-HEIGHT MODEL
CHAPTER 5: CONCLUSIONS
  FUTURE WORK
REFERENCES

CHAPTER 1: INTRODUCTION

In integrated circuit fabrication, transistor dimensions have decreased from 10 microns in the 1970s to sub-micron dimensions of 0.13 µm to 0.25 µm in 2000. Along with this increase in circuit density, the industry has also seen the number of levels of interconnect increase from three in 1980 to six in 2000. The SIA roadmap predicts nine levels of interconnect by the year 2012 [1]. An important limiting factor to increasing this number is the ability to planarize the interlayer dielectric. Non-uniformity in the interlayer dielectric leads to thickness variations in the interconnect, affecting circuit performance and yield.

Chemical Mechanical Polishing (CMP) is currently the most effective method of planarization used in industry. It is often used to planarize topography created during typical microfabrication through lithographic patterning, etching and deposition of thin films on the surface of the wafer.
The goal of this process is to preferentially remove the raised features and thus reduce the differences in height across the chip or die. CMP is also used to remove undesired material in order to fabricate in-laid features including shallow trench isolation and metal vias and lines. Due to the number of factors that affect the CMP process, the costs of process reproducibility are high. Indeed, some consider CMP an art more than a science [2]. Recent developments suggest that fixed abrasive pads used in a "Slurry Free CMP" process might be a better alternative.

Ideally CMP results in a perfectly flat thin film independent of the starting topography. However, in the case of interlevel dielectric (ILD) or oxide CMP, although local planarization can be achieved, differences in pattern densities across the chip result in the creation of global non-uniformities. Depending on the pattern density, local step heights may also be of significance. An accurate CMP modeling methodology allows one to predict and thus reduce such wafer/die level non-uniformities. An existing CMP model developed at MIT characterizes the process when conventional polyurethane polishing pads are used in CMP. In this thesis we explore how this model fits the fixed abrasive polishing process.

This chapter explains the need for dielectric planarization and summarizes the development of the planarization process in industry. After a few key criteria for characterizing planarization are defined, a comparison of the various methods of planarization is presented. The chapter then gives an overview of the shortcomings of the conventional CMP process that call for an improved CMP method, the fixed abrasive process.

DIELECTRIC PLANARIZATION

OXIDE

Planarization has become increasingly important with the increasing number of levels of metallization. The reason for interlevel dielectric planarization is threefold. First, variations in the height of the dielectric can lead to limited step coverage (Figure 1).
When a layer of metal is deposited over a dielectric with large steps, metal thickness varies over the die, leading to variations in resistance which can alter circuit functionality. Additionally, when the metal layer is very thin at certain spots, electromigration becomes a concern and such points can be the source of open circuits [3].

Figure 1: Step height coverage problem (labels: metal, dielectric)

A second reason for planarization is more uniform removal of metal during etch. Anisotropic plasma etching does not remove material effectively around steep slopes, and material that is not removed may cause short circuits.

The third reason is the concern for meeting lithographic depth of focus requirements. The depth of focus, DF, is given by:

$DF = \lambda / (NA)^2$

where $\lambda$ is the wavelength of the projection light and NA is the numerical aperture of the lens. For a small $\lambda$, which is required for deep sub-micron technologies, and a large enough NA, the local step height after several layers of metallization can easily exceed DF. Planarization is therefore exceedingly important for lithography in advanced technologies.

STI

One of the most important applications of CMP is in shallow trench isolation (STI) structures. Conventionally transistors were isolated using the local oxidation of silicon, or LOCOS, technique. This process is illustrated in Figure 2 and involves using a nitride layer to limit oxidation of the surface to a specific local area where oxide is to be grown. The main disadvantage of this process is the lateral diffusion of oxide underneath the nitride.

Figure 2: Process flow for the isolation technology based on LOCOS [8] (panels: a) patterning of the active area, b) etching of the nitride, c) growth of the field oxide, d) removal of the nitride; labels: photo resist, 150-200 nm nitride, pad oxide, field oxide)

This leads to the bird's beak effect, which limits the scaling down of this structure for use in the 0.25 µm and smaller technologies.
The development of CMP has introduced a new approach for trench isolation that overcomes the above shortcoming. This method is illustrated in Figure 3. Essentially, this approach involves etching a trench, filling it with oxide and removing the excess oxide through CMP. Assuming that planarization is ideal, the scaling limits of this technology are determined by lithography, etching of trenches and the filling capabilities of the deposition system. The elimination of bird's beak effects in the isolation technique allows scaling to smaller design rules.

Figure 3: Process flow for Shallow Trench Isolation (STI) [8] (panels: a) patterning of the active area, b) trench etching, c) trench filling with HDP-oxide, d) planarization with CMP, e) removal of the oxide; labels: nitride, pad)

Effects associated with STI-CMP that are often observed are dishing and erosion. Dishing refers to the level difference between the oxide in a wide trench and the surrounding nitride. Erosion is the level difference between small nitride lines in a dense structure and the surrounding large nitride areas. Additionally, these effects are dependent on the layout pattern density. Different densities therefore cause different amounts of dishing and erosion for various structures.

CHARACTERIZING PLANARIZATION

Before proceeding to compare methods of planarization, it is important to understand the parameters used for their comparison. Some of the key criteria are defined below.

PLANARIZATION LENGTH

CMP removes local steps, but due to the initial pattern density distribution there are global steps after CMP. This is because regions polish at different rates based on the pattern density in each region. The length scale over which there is a transition from one local step height to another is the planarization length. Because it is a function of process attributes, the planarization length is a comprehensive characterization parameter.
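One way to make the planarization length concrete, borrowed from pattern-density models of CMP rather than stated in this chapter, is to treat it as the width of a window over which the local layout density is averaged. The sketch below assumes a one-dimensional cut through a die, a square averaging window and illustrative density values; the function name `effective_density` and all numbers are hypothetical.

```python
import numpy as np

def effective_density(local_density, cell_mm, planarization_length_mm):
    """Average local pattern density over a square window that is
    one planarization length wide (an assumed window shape)."""
    n = max(1, int(round(planarization_length_mm / cell_mm)))
    kernel = np.ones(n) / n
    # Repeat the edge values so the average near the ends of the cut
    # is not pulled toward zero by missing neighbours.
    padded = np.pad(local_density, (n // 2, n - 1 - n // 2), mode="edge")
    return np.convolve(padded, kernel, mode="valid")

# Hypothetical cut: a 20% density block next to an 80% density block,
# sampled on 0.1 mm cells, with an assumed 3 mm planarization length.
cell_mm = 0.1
local = np.concatenate([np.full(100, 0.2), np.full(100, 0.8)])
rho = effective_density(local, cell_mm, planarization_length_mm=3.0)

# Cells whose averaged density lies strictly between the two block values
# form the transition region between the blocks.
in_transition = (rho > 0.2 + 1e-9) & (rho < 0.8 - 1e-9)
transition_mm = in_transition.sum() * cell_mm
```

Under these assumptions the averaged density ramps from one block value to the other over roughly one window width, which corresponds to the transition region that the planarization length describes.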
UNIFORMITY

Variation at the wafer level is a strong function of the process and does not necessarily follow the initial wafer profile. Die-level uniformity, however, depends on a combination of the pad and process as well as the layout pattern density. Typically pattern-dependent thickness variation is much larger than wafer-level variation. The Total Indicated Range (TIR) is defined as the difference between the maximum and minimum thicknesses on the die and is an indication of die-level uniformity.

PLANARIZATION METHODS

The current standard in industry to achieve planarization requirements is a CMP process that involves removal of material by abrasive slurry and a polishing pad. The typical CMP setup involves a wafer carrier that holds the wafer face down (Figure 4). Below the wafer carrier is a turning table, which is covered by a polyurethane polish pad that receives a constant flow of slurry. The pad is grooved to facilitate slurry flow and has a porous surface as shown in Figure 5. Pads differ in their elastic and surface properties: a pad such as Rodel's Suba IV is a relatively soft pad, which is used as a subpad below a stiffer pad such as the Rodel IC1000. Oxide slurries can have different particle concentrations and pH levels but are usually KOH-based and contain fumed silica. The pad and slurry properties are key factors in the polishing mechanism and will be explained in the description of the model.

Figure 4: Pad configuration (labels: slurry feed, carrier, platen with pad)

Figure 5: Pad structure for slurry-based CMP [8] (labels: grooves to aid slurry transport, high density polyurethane, coarser film with lower modulus)

With the use of the current CMP process both local and global planarity can be achieved. The planarization length, which lies in the range of a few millimeters, is also significantly longer than for the oxide reflow, spin-on-glass, or resist etch-back planarization methods of the past.
The superior planarization enables the damascene CMP processes used in applications such as STI, tungsten plugs and copper metallization. In a damascene process, patterns are etched into a planar dielectric surface and the in-laid material is then deposited. The wafer is then polished until the polish stop dielectric is exposed, leaving metal in vias or trenches (or oxide in STI trenches). The importance of these applications is noteworthy. STI enabled by damascene CMP overcomes shortcomings of LOCOS (Local Oxidation of Silicon) such as leakage currents due to "bird's beak" effects and allows for a higher circuit density. Tungsten plugs are an integral part of adding further layers of metallization. The use of copper instead of aluminum for interconnect lowers resistive losses and improves electromigration properties. Copper, however, is not easily etched chemically, and the damascene CMP process is essential for advanced copper interconnect.

However, the conventional CMP process suffers from several shortcomings such as high cost, high waste, drift of polish rate while processing and non-uniformities. CMP experiments conducted using fixed abrasive pads have shown improvement in several of these limitations over the conventional pad. The slurry-free process inherently produces less waste, the drift in polish rate is minimal and results from van der Velden show improved within-die and within-wafer uniformity [4]. Perhaps most importantly, the fixed abrasive process has the potential to improve two key patterned wafer polishing characteristics. First, the fixed abrasive, as opposed to abrasive particles in the slurry, appears to decrease the dishing or "down area" polish between features. Second, the dependence on pattern density may be decreased or modified so as to achieve improved within-die uniformity. While the conventional CMP process has been characterized and modeled, the slurry-free process is in its infancy.
A model that accurately characterizes the CMP process has several applications in industry. The ability to predict the dielectric thickness variation can help assess its impact on circuit performance. Further, knowing the amount of dielectric thickness required for a planar surface can reduce the amount of material wasted, and therefore the costs involved. It is therefore important to model this recently developed fixed abrasive process in addition to the conventional pad CMP process. Another major advantage of such a model is that it enables the prediction of areas of thick or thin dielectric layer on a product die. This allows one to correct these areas through the addition of dummy structures, and achieve optimal pattern density and dishing performance.

THESIS OUTLINE

The goal of this thesis is to characterize the fixed abrasive CMP process and compare it to the conventional process. For the conventional CMP process, models developed at the MIT Microsystems Technology Laboratories accurately predict oxide thickness variations across a wafer due to CMP. However, these models have not previously been tested for the fixed abrasive process. During the course of this project, wafers were polished using the fixed abrasive pad and the data fitted against the models developed by Ouma [5] and Smith [9]. Results are analyzed and compared for the two processes, and the applicability of the MIT CMP model is examined.

Chapter 2 presents a more detailed explanation of the CMP process and the key factors that affect polishing. In chapter 3 we examine different methods of modeling CMP that have been developed. Chapter 4 describes the design of experiments for the conventional and fixed abrasive processes. Following this, results from these experiments are presented, including analysis of the polish data (such as the evolution of step height over time) as well as comparison between data and model predictions.
Lastly, conclusions are presented in chapter 5 along with suggestions for the direction of future work.

CHAPTER 2: WAFER POLISHING

The first step in modeling CMP is to understand the polish mechanism and how process parameters contribute. CMP is strongly dependent on parameters such as consumables, process conditions, time, and the machine used. It also has a high variability per run, i.e. the removal rate and uniformity vary during the course of a run.

The conventional CMP process has emerged as the industry standard for dielectric planarization. It yields good local and global planarity, is reasonably well understood and has been successfully modeled. However, there are high costs associated with this process: reproducibility, maintenance and waste. Developments with the fixed abrasive pad have shown that several of these properties may be improved with its use in the CMP process.

This chapter describes the key factors affecting planarization and uniformity. We then examine the polish mechanism in the conventional CMP process and the limitations associated with it. Following this is a description of the fixed abrasive process (sometimes referred to as the slurry-free CMP process).

CONVENTIONAL CMP

The chemical interaction in the polish mechanism for oxides has not yet been thoroughly investigated and remains unclear. However, since glass polishing and oxide polishing are similar, one proposed model for glass and oxide polishing may be applicable [12]. Cook suggests that in glass polishing the surface is first hydrolyzed. The hydroxyl radicals break the O-Si-O bonds and form weaker Si-OH bonds; this weakened surface is then easily abraded by slurry particles. This chemical reaction is limited by the diffusion of hydroxyl ions away from the surface. The process is therefore dependent on parameters such as temperature of the system, slurry flow and slurry composition, to name a few.
The weakening of the surface is vital to the polishing process: it has been shown that little removal occurs in the absence of hydroxyl ions [12]. Slurry particles are often not physically harder than the material to be abraded, and the pad by itself hardly induces wear. There are also limited hydrodynamic effects of slurry trapped between the pad and the wafer, which is facilitated by the grooves in the pad. However, slurry supply is limited and the chemical and mechanical actions are more important factors in polishing than the hydrodynamic effect [5].

Pad properties also play a significant role in the polishing mechanism. The viscoelasticity of the pad affects the interaction between the wafer and particles. In the conventional pad, although the particles are present in the slurry, they can be modeled as being embedded in the pad. This is because the pad pores are larger than the particle size and thus particles are constantly trapped between the wafer and pad asperities and then released during CMP, as shown in Figure 6 [10]. This is also supported by the fact that the polish rate and the surface roughness of the polished surface are largely independent of the particle size, up to an upper limit on the size [10].

Figure 6: Embedding of particles into pad [10] (labels: wafer, pad, slurry particles; arrow: increasing polish pressure)

Slurries used in the oxide CMP process are generally alkaline with a pH of 10 or more. Pads for the conventional process used in industry include the IC1000, Suba IV, and IC1400 pads manufactured by Rodel, Inc. Of these the Suba IV is rarely used by itself. It is a soft pad that is generally used as a subpad with a stiffer pad like the IC1000. The IC1400 is a pre-stacked pad with a stiffer IC1000 surface layer and a softer foam Suba IV underlayer. Elastic properties of these pads are important in determining polish uniformity.
While a stiffer pad like the IC1000 gives better die-level uniformity and a smaller TIR, a combination of IC1000 and Suba IV yields better wafer-level uniformity. A stiff pad without any stack also generally results in the need for a large edge exclusion. Edge exclusion refers to the outside edge of the wafer where large polishing variations are present, and which is usually excluded from producing a good die.

SHORTCOMINGS

Although the above CMP process is currently the dominant method in industry for planarization, it suffers from several shortcomings. Firstly, the total quality, availability and costs associated with CMP consumables are estimated to be 2-7x that of other fab chemicals [3]. To prevent drying of the slurry on the pad, the pad needs to be kept constantly wet. As the pad is used, the pores on the surface accumulate pad material and pad porosity varies while processing, affecting both slurry distribution and particle interaction with the wafer surface. This contributes to the drift in polish rate while polishing. Figure 7 shows the difference between a used and a new pad.

Figure 7: New pad (left) vs. used pad (right) [5]

With use the pad undergoes plastic deformation. The pad therefore becomes smoother and the removal rate is lower, a phenomenon known as glazing. Non-uniform glazing across the pad also leads to non-uniformities in polishing. The pad requires frequent conditioning, the process of scratching the pad surface with a diamond tip to expose a fresh pad surface. However, over time pad conditioning alters the thickness of the pad, thus modifying its elastic properties and therefore the polish rate and uniformity.

FIXED ABRASIVE CMP

Figure 8: Fixed Abrasive Pad [4] (labels: basic solution or DI water, polishing head with wafer, coated abrasive, CMP tool platen)

An alternative approach is to use a fixed abrasive polishing pad with a particle-free slurry, as shown in Figure 8.
As the name suggests, this process does not involve the use of abrasive slurry, which inherently simplifies the process. The process is similar to conventional CMP in that it involves chemically weakening the wafer surface and removing the material by abrasion. The difference in the mechanism lies in that the pad used in this process is a fixed abrasive pad. It consists of a pyramidal or cylindrical resin layer containing abrasive particles placed on a rigid polycarbonate layer and a more elastic sublayer (Figure 8) [4]. An SEM of the pyramidal structure is shown in Figure 9.

Figure 9: SEM of the pyramidal surface of a 3M fixed abrasive pad [courtesy 3M]

Because the abrasive particles (generally cerium oxide, CeO2) are embedded in the pad, the process requires only DI water or basic solutions instead of particle-bearing slurry. Additionally, the distribution of abrasive particles across the wafer may be more uniform than the distribution of abrasive particles in the conventional process. During the polishing process, the removal rate drifts by only tens of Å/min during one process run. This is because the abrasive particles are embedded in the pyramids or cylinders and are continuously exposed during pad wear. Also, the composition of the newer models of fixed abrasive pads is such that the polish rate decreases dramatically when planarization is achieved, potentially reducing the concern of over-polishing.

Figure 10: AFM of a single pyramid structure of the 3M fixed abrasive pad

Unlike in the conventional pad, there is no need for pad porosity for the trapping and releasing of slurry and particles. During any single process run the surface structure of the fixed abrasive pad does not alter enough to substantially change the removal rate. Figure 10 shows the distribution of abrasive particles on the surface of the pyramidal structures both before and after use. The second picture is blurred due to contact between the AFM head and the pad, a measurement error.
However, the white spots are the exposed particles embedded in the pad material and are visible in spite of the blurred image, indicating that a large number of abrasive particles remain exposed even after substantial polishing.

Since there is no abrasive slurry in the system, the fixed abrasive pad may avoid problems of drying of slurry or glazing of the pad and therefore may not require constant wetting [4]. CMP with the fixed abrasive pad therefore wastes less water. On the other hand, abrasive particles released from the resin, as well as polish byproducts, may be a concern if allowed to dry on the pad. Also, since the slurry is a KOH solution and does not contain high concentrations of abrasive particles, the waste slurry is simple to dispose of [4]. Some of the concerns with the fixed abrasive pad include scratching, particle and resin wear as well as pad lifetime.

Experimental studies comparing the two procedures have shown the fixed abrasive CMP process to have both a higher planarization rate and improved die/wafer level uniformity [3]. This allows the deposition of a thinner layer of pre-CMP oxide to obtain a planarized wafer. Additionally, since less material needs to be removed, polishing time is significantly shorter. Overall the process offers the promise of reducing the cost of planarization.

From the comparison of conventional and slurry-free polishing, it is clear that the factors that affect planarization in both methods are similar. We now proceed to examine some key parameters that have been included in the experimental characterization of CMP.

KEY FACTORS IN POLISHING

Following the description of the process mechanism of both the conventional and the slurry-free process above, we proceed to investigate the macroscopic process parameters that significantly affect planarization. We then look at other factors that are not solely process related but instead also depend on the layout pattern to be polished.
While die-level variation is mostly pattern dependent, wafer-level variation is mostly brought about by process conditions such as relative speed, down force, and backpressure.

PROCESS PARAMETERS

Experiments by Ouma assess the importance of various model factors in oxide CMP [5]. Factor levels were coded to find the value of model coefficients and determine the relative importance of each factor. The factors included were down force, relative speed and backpressure. Results from Ouma's analysis verified that the relative speed and the down force, in keeping with Preston's equation, are the key factors that determine the removal rate. Although backpressure has no direct effect on the polish rate, it has an impact on wafer-level uniformity. As a side note, the subpad also affects wafer-level uniformity but not the polish rate.

The down force has a significant effect on wafer-level variation. This variation is brought about by these three factors:

1. The method by which force is applied on the wafer. This factor is strongly dependent on the machine and the carrier head. There has been significant innovation in head design to achieve either uniform or controllable pressure distributions.
2. Effect on slurry transport. The force applied squeezes the slurry out from between the pad and the wafer.
3. Edge effects. The higher the down force, the larger the edge exclusion.

To reduce edge effects, in some tool designs the wafer carrier head is equipped with an active retaining ring. This ring is separately pressurized or controlled to precompress the pad and enable uniform pressure distribution near the edge of the wafer.

A related issue is the pressure distribution across the wafer taking into account the initial wafer-level uniformity, e.g. wafer thickness, wafer warp and bow, thickness of thin films across the wafer surface, and uniformity of
stress in such thin films across the wafer. Studies on the effect of wafer warp and bow on polish performance have shown that the initial warp can have significant impact [21]. There have also been studies investigating the inherent variation due to Von Mises stress concentrations at the edge of the wafer [22]. This lateral stress build-up around the edge of the wafer is a result of the downward pressure on the wafer.

Figure 11: Relative velocity depends on carrier and table speeds [5] (panels: table speed = 32 rpm and table speed = 16 rpm; curves: carrier speed (cs) = 14 rpm and cs = 25 rpm; horizontal axis: radial distance in mm)

The other significant factor is the carrier and table rotational speeds. It is important to note that it is not the individual speeds but the relative velocity of the carrier and table that impacts the polish rate and uniformity. Because of the rotational motion, the speeds due to carrier and table rotation vary across the wafer as shown in Figure 11. But if the system is well synchronized then the relative velocity is constant across the entire wafer surface. This has significant implications for wafer-level uniformity and control of the polishing process. Typically wafer-level variation is relatively small, e.g. 1.2% for a 6 inch wafer and 2.2% for an 8 inch wafer. The large disparity between removal rates at the edges compared to the center is due to edge effects as mentioned earlier. At high table speeds, the accumulation of slurry at the edge of the pad may also contribute to these edge effects.

It is clear that wafer polish occurs side by side with pad wear; however, the dependence of removal rate on pad wear is little understood. Points radially across the pad come into contact with different areas of the wafer, leading to uneven wear of the pad.
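The statement that a synchronized system gives a constant relative velocity across the wafer can be checked with a short kinematic sketch. It assumes that table and carrier rotate in the same sense about parallel axes and that the wafer center sits a fixed distance from the table axis; the 150 mm offset, the function names and the Preston coefficient used in any call are hypothetical values chosen only for illustration. The removal rate follows Preston's equation in its usual form, rate = K_p * P * V.

```python
import math

def relative_speed(x_mm, y_mm, table_rpm, carrier_rpm, offset_mm):
    """Speed (mm/s) of the pad relative to a point on the wafer.

    (x_mm, y_mm) is the point measured from the wafer center, and the
    wafer center sits offset_mm from the table axis along x. Both bodies
    are assumed to rotate in the same sense about parallel axes.
    """
    w_t = 2.0 * math.pi * table_rpm / 60.0    # table angular speed, rad/s
    w_c = 2.0 * math.pi * carrier_rpm / 60.0  # carrier angular speed, rad/s
    # Pad velocity at the point is w_t x (d + r); wafer velocity is w_c x r.
    vx = -w_t * y_mm + w_c * y_mm
    vy = w_t * (offset_mm + x_mm) - w_c * x_mm
    return math.hypot(vx, vy)

def preston_rate(k_p, pressure, speed):
    """Preston's equation: removal rate = K_p * pressure * relative speed."""
    return k_p * pressure * speed

# Matched speeds: the position-dependent terms cancel, leaving w_t * offset.
matched = [relative_speed(x, y, 32.0, 32.0, 150.0)
           for x, y in [(0.0, 0.0), (50.0, 0.0), (-50.0, 30.0)]]

# Unmatched speeds (32 rpm table, 14 rpm carrier): speed varies with position.
unmatched = [relative_speed(x, 0.0, 32.0, 14.0, 150.0)
             for x in (-50.0, 0.0, 50.0)]
```

When the two rotation rates are equal the terms that depend on position within the wafer cancel, so under Preston's equation every point sees the same velocity contribution to the removal rate; with unequal rates the speed, and hence the predicted rate, varies with radial position.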
In addition to the effects of pad wear, several time-dependent effects of pad break-in and pad recovery, which is the process of soaking the pad overnight, have also been observed [24, 25].

PATTERN LAYOUT

Die-level variation is generally the largest percentage component of variation and is dependent on the interaction between layout and process parameters. It strongly depends on the layout pattern, a parameter that is considerably more difficult to modify than process conditions or consumables. By understanding how the layout pattern affects planarization, circuit layout can be developed from the start with these considerations to reduce the difficulty of adding layout modifications later.

The layout pattern effect has been investigated experimentally using wafers with various surface topographies [19, 20]. These topographies investigated factors including area, pitch, pattern density and perimeter to area ratio. The area mask consists mostly of lines in blocks, the sizes of which vary across the die. The pitch mask consists of lines of equal pattern density but varying line widths. On the other hand, the density mask keeps the pitch constant across the mask and varies density. Finally, the perimeter/area mask is used to study structures with the same area but different perimeters. Wafers patterned with these masks were polished for equal times using the same process conditions. Although the conventional CMP process was used for these experiments, the basic principles of how layout pattern affects variation are believed to be very similar for the fixed abrasive process.

Results showed that the area and density masks yielded the highest die-level component of variation as well as the largest variations. The effect of pattern density is key in determining the polish rate of patterned wafers. Where the pattern density is low, the region polishes at a higher rate.
Because the surface topography generated by the area and density masks has the largest variations in effective pattern density, the polish rate differs between blocks in the die and the die-level variation is high. Results from the pitch mask topography showed that the effect of pitch is relatively weak: different pitch patterns polish at comparable rates, so that wafer-scale variation is the dominant component on these wafers. The effects of perimeter/area were also found to be minimal. Wafer-level variation hardly changed between different surface topographies, indicating that layout pattern density does not affect wafer-level non-uniformity.

Figure 12: Total remaining thickness at different places in a die for slurry-based CMP (left) and fixed abrasive CMP (right). Features: 500 nm nitride + remaining oxide, valleys: remaining oxide [4]. (Curves for isolated and dense features and valleys are plotted against polish time (s).)

It is also interesting to note how within-die variance changes with polish time. Two types of variance can be measured. The first includes all points - both up areas (oxide over patterned features) and down areas (oxide between features). For this type of variation, the largest variance is at the start of polish. This can be seen from Figure 12, and is due to the initial step of approximately 6000 Å. As the polish time increases, the local step height decreases and so does the variance. The second type of variance includes only up areas. Initially this variance increases rapidly from a very small value and, contingent upon certain process settings, later falls off slightly as polish time increases. This is because low-density regions initially polish faster, but as local planarity is reached their removal rate is reduced to the blanket rate.
This can also be seen from Figure 12, where initially there is no global step height between the dense and isolated features, but as the low-density regions are polished faster, the global step height increases rapidly. The variation later decreases because high-density regions still contain raised features; these raised areas now polish faster than the planarized low-density areas, lowering the variation until local planarity is reached there as well. After long polish times the global step height remains constant. This is because local planarity is reached at different thicknesses in regions of different density, leaving various global step heights across the die.

This change in variation over time is shown for both conventional and fixed abrasive CMP in the results from van der Velden in Figure 12 [4]. For the conventional slurry-based process, the removal rate is higher in the low-density regions for approximately 40 s, after which the removal rates for low and high-density regions are similar. The graph for the fixed abrasive pad, on the other hand, shows that the low-density region (isolated feature) polishes significantly faster than the high-density region, but only for 20 s. As the low-density region is polished and local planarity is reached, the polish rate falls off significantly. In the conventional process, because the disparity between blanket and pattern polish rates is small, the polish rate drops as local planarity is reached in the low-density regions, but not by as much as in the fixed abrasive process. Finally, after 80 s of polish, the removal rate is much lower in the fixed abrasive process than in the conventional process. This is because the wafer is planarized and the blanket polish rate is much lower for the fixed abrasive pad than for the conventional pad. The impact of pattern density in fixed abrasive CMP is one of the two key pattern effects we wish to study.

The second key pattern effect in fixed abrasive CMP is the relatively small amount of down area polish that occurs.
As can be seen from Figure 12, the data for the fixed abrasive pad show that the oxide thickness in the down areas (valleys) of high-density regions hardly changes until approximately 40 s, implying that essentially no down area polish occurs there. The same does not hold for regions polished with the conventional pad or for low-density regions polished with the fixed abrasive pad. Grillaert's model [8], examined in Chapter 3, further investigates the difference between down area polish in high and low-density regions.

Understanding the causes of thickness variation is key to accurate process control and modeling of CMP. We now proceed to investigate different models for CMP that have previously been developed, and their limitations.

CHAPTER 3: MODELING METHODOLOGY

A model that can accurately predict thickness variations across the wafer and across each chip has several applications in industry. With this knowledge, layout practices can be improved to reduce thickness variations or to identify where dummy filling is needed. Dummy fill can then be added in the appropriate areas to minimize variations after the planarization process. The model can be used to determine the optimal film thickness to be deposited for a given planarization specification, which reduces both polish time and wasted material. For STI CMP the model can predict when the nitride capping layer is exposed. Apart from these process-related applications, the model can also be interfaced to analysis tools to determine the impact of dielectric thickness variations on circuit performance.

The MIT density model is built on the foundation of Preston's model. In this chapter we examine in detail Preston's model and other models that have been developed for CMP, as well as their limitations. We then investigate the density and step-height models that were used to fit results in this project.

PRESTON'S MODEL

Preston developed a model for glass polishing in 1927.
But because it is based on the fundamental mechanical motions in CMP, the model is also applicable to dielectric polishing and is widely accepted in industry [11]. Preston's model states that the removal rate, which is the rate of change in thickness, z, is proportional to the pressure, p, on the wafer surface and the relative velocity, v, and can be written as:

$$\frac{dz}{dt} = -k_p\,p\,v$$

In this equation the negative sign reflects the decrease in thickness during polish, and $k_p$ is Preston's coefficient, which captures process effects including the weakening of the wafer surface by the slurry, the surface roughness of the polishing pad and the type of abrasive particles used. However, the exact relationship between $k_p$ and these process conditions is not clear, and it is difficult to tell where all of these dependencies ultimately reside. Preston's model is not bound to a particular mechanism; it is a summary of an empirical observation.

Since the density and step-height models are based on Preston's model, they also assume that the removal rate is proportional to the applied pressure. By investigating how these two models fit the data for the fixed abrasive pad, we also examine whether Preston's model is applicable to the fixed abrasive case, i.e. whether Preston's equation requires three-body abrasion (conventional polish) or whether it also applies to two-body abrasion (fixed abrasive polish).

We now review some of the other models that have been developed for CMP. These models are neither physically substantiated nor significantly better than Preston's, however, and we will proceed with a pattern-dependent model based on Preston's equation.

OTHER MODELS

Cook developed an extension to Preston's model [12]. His model assumes that slurry particles are responsible for polishing, and their abrasion of the surface is treated as a Hertzian penetration problem. It is the same as Preston's model except that Preston's coefficient is replaced with 1/(2E), where E is the Young's modulus of the surface being polished. This model
This model 28 overestimates the polish rate by over an order of magnitude. His explanations for the anomaly are: 1. E for the surface differs from E for the remaining wafer. This is quite possible since the chemical structure of the surface is weakened by the slurry and differs from the bulk of the wafer. 2. Material in the indented volume may not have been removed or was removed and then redeposited. 3. The force applied causes inelastic deformation of the surface to be polished and the equation related to elasticity does not apply. The third reason is an important cause of the discrepancy in the model. The force applied during CMP is relieved by the viscoelasticity of the pad. It enables the particles to be embedded in the pad such that the force does not cause as much penetration of the particles into the wafer surface. Therefore an increase in pressure does not necessarily result in a proportionally increased indentation. Additionally, the concentration of the particles is low and the model cannot accurately evaluate the area of interaction between the particles and the polish surface. Runnels model, on the other hand, focuses on a different aspect of CMP. The model assumes the existence of a hydrodynamic fluid layer between the pad and the wafer [13]. The abrasion of the wafer surface is then dealt with as erosion due to hydrodynamic effects. Slurry thickness is a key parameter in this model and is a function of wafer curvature, slurry viscosity, and rotation speed. The model treats the slurry as a viscous Newtonian fluid while the pad and wafer are treated as rigid surfaces on which non-slip boundary conditions are applied. Preston's equation is modified to include the effects of normal and shear stress. 
This model is represented by the equation:

$$\frac{dz}{dt} = -C_p\,\sigma_n\,\tau$$

where $C_p$ is a coefficient similar to Preston's, $\sigma_n$ is the normal stress due to the applied force and $\tau$ is the shear stress due to slurry flow driven by the relative motion between the pad and the wafer. Fluid mechanics calculations are used to determine these stresses from the macroscopic flow of slurry as well as the pressure distribution across the wafer. The steady-state Navier-Stokes equations for incompressible Newtonian flow are solved:

$$\mathbf{u}\cdot\nabla\mathbf{u} = -\frac{1}{\rho}\nabla p + \frac{\mu}{\rho}\nabla^2\mathbf{u}$$

$$\nabla\cdot\mathbf{u} = 0$$

where $\rho$ is the density, $\mu$ is the dynamic viscosity, $p$ is the pressure and $\mathbf{u}$ is the velocity vector of the flow.

Although this model has not been tested, it is highly unlikely that it can accurately predict the CMP process, since some of its fundamental assumptions are flawed, e.g. the pad is not rigid and the hydrodynamic effect, if any, is very weak. However, important observations can be made from this model. Temperature dependence is embedded in the model, since temperature significantly affects the viscosity of the slurry and therefore the fluid film thickness and the polish rate. Such changes in temperature can vary fluid viscosity by as much as 30%. Since the stresses also depend on wafer curvature, this is another parameter that affects the model.

Tseng and Wang developed a model that is a combination of the Hertzian penetration model and Runnels' model [14]. This can be expressed as:

$$\frac{\Delta H}{\Delta t} = -M\,P^{5/6}\,V^{1/2}$$

where M is a coefficient determined by the process conditions, P is the pressure and V is the relative velocity.

Shi et al. proposed an improved penetration model [15]. Although it is based on particle penetration, it accounts for the viscoelasticity of the pad, which relieves the force applied on the wafer by the particles. This model is represented by the following equation:

$$\frac{\Delta H}{\Delta t} = -K\,P^{2/3}\,V$$

All of the above models incorporate mechanical aspects of CMP and only to a very small extent the chemical effects of the slurry.
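The practical difference between these rate laws is how strongly the removal rate responds to pressure. The sketch below compares the magnitude of the removal rate for the Preston, Tseng and Wang, and Shi et al. forms when the pressure is doubled at fixed velocity. All coefficients are set to 1 in arbitrary units purely for illustration; the function names are hypothetical.

```python
def preston_rate(k_p, p, v):
    """Removal-rate magnitude, Preston form: linear in pressure and velocity."""
    return k_p * p * v

def tseng_wang_rate(m, p, v):
    """Removal-rate magnitude, Tseng and Wang form: P^(5/6) * V^(1/2)."""
    return m * p ** (5.0 / 6.0) * v ** 0.5

def shi_rate(k, p, v):
    """Removal-rate magnitude, Shi et al. form: P^(2/3) * V."""
    return k * p ** (2.0 / 3.0) * v

# Ratio of rates when pressure doubles (coefficients and units are placeholders).
p, v = 1.0, 1.0
ratios = {
    "Preston": preston_rate(1.0, 2 * p, v) / preston_rate(1.0, p, v),
    "Tseng-Wang": tseng_wang_rate(1.0, 2 * p, v) / tseng_wang_rate(1.0, p, v),
    "Shi": shi_rate(1.0, 2 * p, v) / shi_rate(1.0, p, v),
}
for name, ratio in ratios.items():
    print(f"{name}: rate ratio for doubled pressure = {ratio:.3f}")
```

The sub-linear pressure exponents in the two penetration-based models mean that doubling the pressure yields less than double the rate, in line with the argument that pad viscoelasticity relieves part of the applied force.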
Additionally, these models are aimed at blanket wafer polishing. Since then, effective and practical models have been developed that enable solving for the wafer-level polish rate across the entire wafer as a function of measurable macroscopic process conditions.

DENSITY MODEL

A pattern density based model can be used to understand the interaction between the CMP process and the layout dependencies of the wafer being polished. An efficient characterization methodology that includes standard characterization masks as well as model calibration and verification procedures has been developed [5, 19]. The basic model and terms used in the pattern-dependent model were originally proposed by Stine et al. [20]. The definition of these terms is shown in Figure 13. The equation proposed by Preston is modified to include the blanket rate, K, and the effective pattern density, $\rho(x,y,z)$, which is defined as the effective or perceived density at a particular point. The equation is written as follows:

$$\frac{dz}{dt} = -\frac{K}{\rho(x,y,z)}$$

The equation is then solved for z under the assumption that down areas are not polished until the local step height, $z_1$, is removed. After that, the polish rate is the blanket polish rate. As will be examined in the step-height model, this assumption can be improved on. The effective density is written as:

$$\rho(x,y,z) = \begin{cases} \rho_0(x,y) & z > z_0 - z_1 \\ 1 & z < z_0 - z_1 \end{cases}$$

The final film thickness for any time t is then given by:

$$z = \begin{cases} z_0 - \dfrac{Kt}{\rho_0(x,y)} & t < \dfrac{\rho_0 z_1}{K} \\[2ex] z_0 - z_1 - Kt + \rho_0(x,y)\,z_1 & t > \dfrac{\rho_0 z_1}{K} \end{cases}$$

Figure 13: Definition of terms used in the model. (The figure labels the up areas, the down areas, the oxide, the metal, and the heights $z_0$ and $z_1$.)

The accuracy of this model depends heavily on the correct evaluation of $\rho_0(x,y)$. The effective local pattern density is calculated for specific areas of the mask by employing a planarization-length-sized weighting function, which takes into account the pad and process interaction with neighboring features.
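The piecewise thickness expression of the density model can be evaluated directly. The following is a minimal sketch, assuming the effective density $\rho_0$ at the point of interest is already known; the function name and the numerical values (other than the approximately 6000 Å initial step mentioned earlier) are hypothetical.

```python
import numpy as np

def density_model_thickness(t, z0, z1, K, rho0):
    """Up-area film thickness z(t) from the density model.

    t    : polish time (scalar or array), same time unit as K
    z0   : initial up-area thickness
    z1   : initial local step height
    K    : blanket removal rate
    rho0 : effective pattern density at this location (0 < rho0 <= 1)
    """
    t = np.asarray(t, dtype=float)
    t_planar = rho0 * z1 / K                 # time at which the local step is removed
    before = z0 - K * t / rho0               # only raised areas are polished
    after = z0 - z1 - K * t + rho0 * z1      # blanket-rate polish after planarization
    return np.where(t < t_planar, before, after)

# Hypothetical example: thicknesses in angstroms, rate in angstroms per minute.
z0, z1, K, rho0 = 20000.0, 6000.0, 3000.0, 0.5
times = np.array([0.0, 0.5, 1.0, 2.0])       # minutes
thickness = density_model_thickness(times, z0, z1, K, rho0)
print(thickness)
```

The two branches meet at $t = \rho_0 z_1 / K$, where both give $z_0 - z_1$, so the predicted thickness is continuous even though the removal rate changes abruptly from $K/\rho_0$ to $K$.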
Factors such as process conditions, machine type, and the complicated coupling effects of the pad, which is the most dominant effect, are incorporated into a single parameter, the planarization length. A stiffer pad resists deformation and therefore distributes pressure over a larger area, so the planarization length of such a pad is larger. This term is effectively the length scale over which the surrounding areas affect the local pressure at any given location on the wafer. In other words, it is the length over which the effective density for any given point should be calculated.

The effective density can be calculated for an arbitrary layout across the mask, given the planarization length and the weighting function. The planarization length defines the size of the area over which the effective density is calculated, while the weighting function describes the shape of this area. The density weighting function is in effect the planarization impulse response of the pad and process. Similarly, the planarization length can be considered the number that characterizes the impulse response function. In the simplest case the weighting function can be a square window. For the slurry-based process, the elliptical weighting function gave the best performance among the weighting functions considered by Ouma [5]. Figure 14 shows examples of alternative weighting functions that have spatial symmetry. The important difference between the weighting functions is how each one weighs nearby features relative to structures farther away.

Figure 14: Effective density weighting functions [5]. (Panels: Cylinder, Square, Elliptic, Gaussian.)

The effective density across the die is calculated using the following steps: (a) layout biasing, (b) evaluation of the local discretized density and (c) effective density calculation. Layout biasing refers to the adjustment of the drawn mask layout to more closely match the actual deposited film.
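Because the weighting function acts as an impulse response, the effective density amounts to a convolution of a gridded local density with a normalized kernel. The sketch below builds the four kernel shapes named in Figure 14 and applies one of them with a circular (wrap-around) convolution, which assumes that identical dies tile the wafer. The exact functional forms, the treatment of the planarization length as the full window width, the Gaussian width, and all names and values are illustrative assumptions, not the definitions used in [5].

```python
import numpy as np

def weighting_kernel(shape, planarization_length_mm, cell_mm, n_cells):
    """Normalized n_cells x n_cells kernel centered at index (0, 0) with wrap-around."""
    # Signed cell offsets 0, 1, ..., -2, -1 so the kernel is centered for FFT use.
    offsets = np.fft.fftfreq(n_cells, d=1.0 / n_cells) * cell_mm
    x, y = np.meshgrid(offsets, offsets, indexing="ij")
    half = planarization_length_mm / 2.0      # assumption: length = full window width
    r = np.hypot(x, y)
    if shape == "square":
        w = ((np.abs(x) <= half) & (np.abs(y) <= half)).astype(float)
    elif shape == "cylinder":
        w = (r <= half).astype(float)
    elif shape == "elliptic":
        w = np.sqrt(np.clip(1.0 - (r / half) ** 2, 0.0, None))
    elif shape == "gaussian":
        w = np.exp(-0.5 * (r / half) ** 2)    # assumption: sigma = half-length
    else:
        raise ValueError(f"unknown weighting shape: {shape}")
    return w / w.sum()

def effective_density(local_density, kernel):
    """Circular convolution of the local density grid with the kernel."""
    spectrum = np.fft.fft2(local_density) * np.fft.fft2(kernel)
    return np.real(np.fft.ifft2(spectrum))

# Hypothetical 10 mm die on a 0.25 mm grid: left half 20% dense, right half 80%.
n, cell = 40, 0.25
local = np.full((n, n), 0.2)
local[n // 2:, :] = 0.8
kernel = weighting_kernel("elliptic", 3.0, cell, n)
rho_eff = effective_density(local, kernel)
print(rho_eff.min(), rho_eff.max())
```

Since every kernel is non-negative and sums to one, the effective density is a weighted average of nearby local densities, so a sharp density boundary in the layout becomes a gradual transition whose width is set by the planarization length.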
The local discretized density function is calculated by first dividing the mask (Figure 15) into small square cells on a regular grid. The density in each cell is then the ratio of raised area to total area of the cell. Finally, the effective density (Figure 15, right) is given by the convolution sum of the discretized density function and the weighting function. The weighting function is effectively a moving average window that is applied to the layout.

Figure 15: Layout mask and effective density

STEP-HEIGHT MODEL

The density model described above provides thickness predictions to first order and falls short when predicting low-density features. Several alternative models have been proposed, but these lack a clear connection to pattern density. One such model, proposed by Burke [16], is an exponential decrease in step height with time. This model assumes that the pad is always in contact with both the raised and down areas and that the removal rate is determined by the distribution of pressure between the raised and down areas.

Results from experiments by Grillaert et al. [8] at IMEC showed that the above assumption is true only after a certain step height is reached. Before this step height is reached, the model predicts that there is no contact between the pad and the down areas. The removal rate of the raised areas is then given by the blanket rate divided by the density (Figure 16). During this phase, step height decreases linearly with time and the thickness variations are modeled well by the density model. After the transition step height is reached, the step height decays exponentially as per the model in [8]. At this stage, down areas are also polished. Smith's work combines the step-height and density models [9]. The difference between the IMEC model and the density model is shown in Figure 16.
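The two-phase behavior described above (linear reduction of step height while the pad touches only raised areas, then exponential decay once the down areas are contacted) can be sketched as follows. The time constant used here, $\tau = \rho h_c / K$, is an assumption chosen so that the rate of step-height reduction is continuous at the transition height $h_c$; it is not claimed to be the exact form used in [8] or [9]. All names and values are hypothetical.

```python
import numpy as np

def step_height(t, h0, hc, K, rho):
    """Step height versus polish time for a linear-then-exponential model.

    h0  : initial step height
    hc  : transition (contact) step height
    K   : blanket removal rate
    rho : effective pattern density (0 < rho <= 1)
    """
    t = np.asarray(t, dtype=float)
    if h0 > hc:
        t_c = (h0 - hc) * rho / K      # time needed to reach the contact height
        h_start = hc
    else:
        t_c = 0.0                      # pad already touches the down areas
        h_start = h0
    tau = rho * hc / K                 # assumed time constant (slope continuity)
    linear = h0 - (K / rho) * t        # raised areas polish at K / rho
    decay = h_start * np.exp(-np.maximum(t - t_c, 0.0) / tau)
    return np.where(t < t_c, linear, decay)

# Hypothetical example: heights in angstroms, rate in angstroms per minute.
h0, hc, K, rho = 6000.0, 2000.0, 3000.0, 0.5
t_c = (h0 - hc) * rho / K
tau = rho * hc / K
times = np.linspace(0.0, 3.0, 301)
h = step_height(times, h0, hc, K, rho)
print(f"contact time = {t_c:.3f} min, time constant = {tau:.3f} min")
```

In the linear phase the sketch reproduces the density model, and setting the contact height to a very small value makes the exponential phase negligible, which is one way to see why the two models agree where little down area polish occurs.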
Figure 16: MIT density model vs. IMEC model [9]. (Curves labeled MIT and IMEC are plotted against polish time in seconds; BR denotes the blanket rate.)

Results from Smith show a 50% reduction in the fitting errors of both raised and down area thickness when using the combined density and step-height model rather than the density model alone. Variations of this model significantly reduce the number of model parameters and provide the ability to predict post-polish thickness for arbitrary layouts.

In this chapter we examined various CMP models. The following chapter examines how these models fit data from the various CMP processes. We investigate which model, i.e. the density or the step-height model, best explains the behavior of the fixed abrasive pad. The implications of which model fits best will also be studied. Finally, we examine whether there are other effects, e.g. a different density dependence, that require modifications to the current model.

CHAPTER 4: EXPERIMENTAL WORK

In this chapter we describe the experimental methodology used to study the fixed abrasive CMP process, including the design of experiments, the characterization masks, and the pre- and post-CMP measurement procedures. We then present the results obtained and analyze them to examine the difference between the conventional and fixed abrasive processes.

EXPERIMENTAL DESIGN

The first experiment was conducted to investigate the polish characteristics of the conventional CMP process over time. The wafers used were first coated with 2 µm of oxide in the Novellus Concept One CVD deposition tool. Some of these wafers were left unpatterned for use as blanket wafers to monitor the drift in the polish rate during the course of the experiment. The rest of the wafers were then patterned with the MIT mask shown in Figure 17. The mask produces dies of 1 cm², with each density block being 2.5 × 2.5 mm.
It has four rows and the same number of columns. Rows 1 and 2 have gradually varying densities. Row 3 consists of blocks of densities that vary in steps, and row 4 has gradually varying pitch structures that maintain a constant density of 50%.

Oxide thickness on these wafers was measured using the UV1250 before CMP. In each density and pitch block, six points were measured (three peaks and three valleys), giving a total of 96 points per die. One central die per wafer was measured. The coordinates for the pre- and post-CMP measurements were kept the same.

Row 4    P20 (20/20)       P100 (100/100 µm)    P250 (62/250 µm)    P500 (500/500 µm)
Row 3    20% (20/80 µm)    80% (80/20 µm)       40% (40/60 µm)      60% (60/40 µm)
Row 2    37% (37/63 µm)    62% (62/38 µm)       87% (87/13 µm)      100% (100/0 µm)
Row 1    12% (12/88 µm)    25% (25/75 µm)       50% (50/50 µm)      75% (75/25 µm)

Figure 17: Mask used for experiments and corresponding pattern density

In order to study how thickness variations across the wafer change over time, the wafers were polished in steps, i.e. each wafer was polished to remove a certain percentage of the oxide. The oxide removed from each wafer was 20%, 40%, 60%, 80% or 100%, where 100% corresponds to complete removal of the deposited film on a "blanket" or unpatterned wafer. One wafer was also over-polished to remove 120%. The pad used for this process was the IC1000/Suba IV combination and the slurry used was Cabot SS-12. The pad was first broken in with dummy wafers before processing the actual lot.

If the wafers were polished in order, starting with 20% followed by 40% and so on until 120%, the effects of pad wear could be confused with the effects of polish time on thickness variations. To prevent this, the order of polishing was randomized. To investigate how repeatable a polish is, some wafers were duplicated; in this process lot, the 40% and 80% wafers were repeated. To monitor the drift in the polish rate over time, blanket wafers were polished for 60 s to check the rate.
Based on the new rate, the polish time for the next wafer was calculated.

After CMP, the wafers were cleaned in the SSEC Evergreen and then cleaned again in a piranha dip. The amount of oxide removed in this process is negligible. The HF dip that normally follows the piranha was omitted since it would have removed oxide. After the cleaning procedure, the wafers were measured again using the UV1250.

The second experiment, with the slurry-free process, was almost identical. The pad used was a proprietary fixed abrasive pad provided by 3M. The formulation of the pad is 0.090 inch closed-cell foam with 0.020 inch polycarbonate. Instead of slurry, a KOH solution of pH 11.5 was used. While other models of the fixed abrasive pad have an extremely small blanket rate, the fixed abrasive pad used in these experiments had a blanket rate that was very close to the polish rate for patterned wafers.

RESULTS

Some terminology should be understood before discussing the results. A certain percent polish (e.g. 40% polish) refers to the percentage of the oxide that was intended to be removed during CMP (i.e. that wafer was polished to remove 40% of the blanket oxide thickness). The amount of polish could also be referred to by the polish time (e.g. 75 s polish). However, the polish time for a 40% polish using a fixed abrasive pad (FAP) is much shorter than when using the conventional pad. This is because the rate of polish for the fixed abrasive pad was generally larger than with a conventional pad (e.g. 1364 µm/min vs. 3164 µm/min). In order to compare two wafers it is therefore best to refer to the % polish.

A certain percentage density (e.g. 12%) refers to the designed or layout pattern density, which depends on the position in the die as shown in Figure 17. In this figure the (0,0) corner is the 12% density block and the (10,10) corner is the P500 pitch block.
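The mapping from die coordinates to mask blocks can be made concrete with a small lookup. This sketch assumes that the columns of Figure 17 run left to right along x and that row 1 starts at y = 0, which is consistent with the (0,0) corner being the 12% block and the (10,10) corner being the P500 block; the names `MASK` and `block_at` are hypothetical.

```python
BLOCK_MM = 2.5   # each density or pitch block is 2.5 x 2.5 mm
DIE_MM = 10.0    # 4 x 4 blocks give a 10 mm die

# Rows listed from y = 0 upward; columns from x = 0 to the right (assumed order).
MASK = [
    ["12%", "25%", "50%", "75%"],       # Row 1
    ["37%", "62%", "87%", "100%"],      # Row 2
    ["20%", "80%", "40%", "60%"],       # Row 3
    ["P20", "P100", "P250", "P500"],    # Row 4 (pitch blocks, 50% density)
]

def block_at(x_mm, y_mm):
    """Return the Figure 17 block label containing die coordinate (x, y) in mm."""
    if not (0.0 <= x_mm <= DIE_MM and 0.0 <= y_mm <= DIE_MM):
        raise ValueError("coordinate lies outside the die")
    # Clamp so that points exactly on the far edge fall in the last block.
    col = min(int(x_mm // BLOCK_MM), 3)
    row = min(int(y_mm // BLOCK_MM), 3)
    return MASK[row][col]

points_per_die = len(MASK) * len(MASK[0]) * 6   # six measurements per block
print(block_at(0.0, 0.0), block_at(10.0, 10.0), points_per_die)
```

A lookup of this kind is convenient when grouping the 96 measured points by designed density before comparing them with model predictions.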
Figure 18: Evolution of up areas during CMP with the conventional pad: (Top Left) 20% polish, (Top Right) 40% polish, (Bottom Left) 60% polish, (Bottom Right) 80% polish. (Each panel is a 3D plot of measured thickness variation over the die, with both horizontal axes in mm.)

From Figure 18 the evolution of the die surface polished using a conventional pad can be observed. Plotting the data from CMP using the fixed abrasive pad yields graphs that are very similar to Figure 18. When the wafer is polished only 20%, the die looks relatively uniform. As the wafer is polished further, the non-uniformity first increases and then decreases. This is because low-density regions are initially polished faster, creating global step heights. Later, local planarity is achieved in these regions and high-density regions begin to polish faster. Thus after long polish times, global step heights are reduced and finally remain constant. The above explanation for the variance applies to up areas. We now proceed to investigate the evolution of down area polish.

Figure 19 shows the evolution of down areas during CMP using the fixed abrasive pad. As can be seen, there is minimal removal of down areas when polishing with the fixed abrasive pad. Only for long polish times can removal of down areas be observed, and then only in low-density regions. The fact that little removal occurs in valley regions is one of the key features of the fixed abrasive pad. On the other hand, it is clear from Figure 20 that down areas are polished even for short polish times using the conventional pad. The implications of this behavior will be discussed later in this chapter.
Figure 19: Evolution of down area polish during CMP using the fixed abrasive pad: Top Left - 20% polish, Top Right - 40% polish, Bottom Left - 60% polish, Bottom Right - 80% polish. (Each panel is a 3D plot of measured down areas over the die, with both horizontal axes in mm.)

Figure 20: Evolution of down area polish during CMP using the conventional pad: Left - 20% polish, Right - 40% polish. (Each panel is a 3D plot of measured down areas over the die, with both horizontal axes in mm.)

To validate the results discussed throughout this chapter, it is important to examine the repeatability of the data collected. Figure 21 shows in 3D the difference between two sets of data collected from two different wafers polished for the same length of time (40% polish). This difference is very small compared to the oxide thickness (an average difference of 200 Å). As can be seen, better repeatability is achieved using the fixed abrasive pad than the conventional pad.
Figure 21: Repeatability of data for 40% polish using a fixed abrasive pad (left) and a conventional pad (right). (Each panel is a 3D plot of the difference in replication over the die, with both horizontal axes in mm.)

The data obtained from the UV1250 measurements were used to extract the planarization length using the density model. The planarization length and the removal rate, which is known from the CMP procedure, were then used in the MIT density model to predict thickness variations after CMP. The measured and predicted data are plotted in 3D in Figure 22 to reveal the surface characteristics. The measured and predicted surfaces look similar in these graphs. The corresponding plots for polish with the conventional pad also look alike.

Figure 22: (a) Measured data after 80% polish for the fixed abrasive pad, (b) Predicted data from the density model. (Both panels are 3D plots of thickness variation over the die, with both horizontal axes in mm.)

It is interesting to examine how the residual difference between the measured and predicted surfaces evolves over time. This residual is plotted in 3D graphs for wafers of 40%, 60% and 80% polish to compare the residuals for the conventional and the fixed abrasive pad (Figure 23). For the fixed abrasive pad, the residual becomes smoother with increasing polish time. The conventional pad, on the other hand, does not show such a trend, indicating that the pure density model misses important aspects of the polish throughout the conventional process.
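The residual discussed above is simply the point-by-point difference between measured and predicted thickness, and summary statistics of that difference give a compact way to compare wafers. The sketch below uses a tiny synthetic grid purely to show the calculation; the numbers are not measurements from this work, and the function name is hypothetical.

```python
import numpy as np

def residual_stats(measured, predicted):
    """Return (residual grid, RMS residual, maximum absolute residual)."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = measured - predicted
    rms = float(np.sqrt(np.mean(residual ** 2)))
    worst = float(np.max(np.abs(residual)))
    return residual, rms, worst

# Synthetic thickness values in angstroms (illustrative only).
measured = [[9000.0, 9200.0], [8800.0, 9100.0]]
predicted = [[9100.0, 9100.0], [8900.0, 9000.0]]
residual, rms, worst = residual_stats(measured, predicted)
print(residual)
print(f"RMS residual = {rms:.1f}, max |residual| = {worst:.1f}")
```

Plotting the residual grid over die coordinates, as in Figure 23, shows where the model misses, while the RMS value allows wafers with different percent polish to be compared with a single number.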
Figure 23: Difference between measured and predicted values. Left top to bottom: 40%, 60% & 80% polish using fixed abrasive pad. Right top to bottom: 40%, 60% & 80% polish using conventional pad. (Each panel is a 3D plot over the die, with both horizontal axes in mm.)

To show the thickness variations in different sections of the die, Figure 24 compares the data from the conventional and the fixed abrasive pad. The model predictions are compared with the measured data, and the fit is good for both sets of data. From the change in thickness variation between the 40% data and the 80% data, it can be clearly seen that the lower density regions are polished at a higher rate. This is seen from a comparison of Row 3 in the 40% and 80% graphs of Figure 24: in the graph for 80% polish, more material has been removed from the regions corresponding to 40% and 20% density. While the 100% density region is relatively locally planar early in the CMP process, the graph for 80% polish shows that the 100% density block has a higher thickness at one edge of the region than at the other.
The thicker-oxide border of the 100% density block is adjacent to the 87% density block, while the thinner border is adjacent to the 37% density block of the next die. This shows the effect of the pattern density of adjacent features. It can also be seen from the graphs that results from the fixed abrasive pad are less uniform than the results from the conventional pad. This effect is related to the planarization length, which will be discussed in further detail. Although the effects of varying pitch are minimal, close observation shows that the blocks of smaller pitch are polished slightly faster. The difference is small, however, indicating that density has a much larger effect than pitch on thickness variations.

A plot of the measured data for the fixed abrasive pad versus the step-height model looks very similar to Figure 24. The accuracy of the step-height model is nevertheless better than that of the density model, as will be discussed later.

Figure 24: Predicted vs. measured values for conventional and fixed abrasive pad. (Top) 40% polish. (Bottom) 80% polish. (Each panel plots thickness (kÅ) against X coordinate (mm), with the model shown as a line and the measured values for the conventional pad and the FAP as markers.)

STEP-HEIGHT

Graphs of step height vs. pattern density for the fixed abrasive and the conventional pad show that step height depends strongly on the pattern density and varies substantially across the die (Figure 25). Note that pattern density here refers to the local layout pattern density and not the effective pattern density as averaged using the planarization length.
This is responsible for the discontinuities observed in Figure 25: a range of step heights is observed among the many step-height measurements in the 50% density areas. Blocks designed with the same layout density may have very different effective densities due to the proximity of nearby structures.

Figure 25: Step height versus pattern density for the conventional pad. [Figure: step height vs. % Layout Pattern Density, with one series each for 20%, 40%, 60%, 80% and 100% polish.]

As expected, the plot of step height vs. pattern density is approximately linear for both the conventional process in Figure 25 and the fixed abrasive process in Figure 26, with the step height approximately proportional to the pattern density. This is because features in low-density regions polish faster than those in higher density regions. The curve of step height vs. pattern density is steeper for the fixed abrasive pad than for the conventional pad, indicating that density has a greater impact on step height when polishing with the fixed abrasive pad.

Figure 26: Step height versus pattern density for the fixed abrasive pad. [Figure: step height vs. % Layout Pattern Density, with one series each for 20%, 40%, 60% and 80% polish.]

The step height plotted versus time behaves as expected. It has been shown that step height initially decreases linearly with time and then exponentially. It was also expected that for the fixed abrasive pad the curve would be linear for the most part. The linear and exponential portions of the curves can be explained by Grillaert's model [8]. During the polishing process using a pad with a stack, when the step height is large enough, the pad does not contact the valleys, although the pad can deform.
The pad can therefore be treated as incompressible, and the step height decreases linearly with time, the rate of decrease being proportional to the removal rate and inversely proportional to the pattern density. However, when the transition step height is reached, the pad comes in contact with the valleys and the scenario is then modeled as a compressible pad. This requires solving a differential equation whose solution is exponential. Beyond the transition step height, the curve is therefore exponential.

Figure 27: Step-height versus time: (Top) Conventional pad, (Bottom) Fixed abrasive pad. [Figure: step height vs. Time (s), with one series each for 12%, 37%, 60%, 75% and 100% density.]

The fact that the step height vs. time curve is linear for polish using a fixed abrasive pad indicates that the pad can be treated as incompressible to yield reasonably accurate results using the density model. However, since similar data for the conventional pad shows a more significant exponential tendency, the step-height model should be used there for better prediction of thickness variations. Further work is required to see how the predictions of the step-height model differ between the conventional and the fixed abrasive pad. In particular, longer polishes are needed to explore the contact or transition height in more detail.

Figure 27 shows that the curve of step height vs. time, for both conventional and fixed abrasive data, appears more exponential as the pattern density decreases. This is because as the density decreases, features are polished faster.
Thus the transition step-height is reached earlier and the pad comes in contact with the down areas. As per Grillaert's model, the step-height then decreases exponentially with time. In the case of the conventional pad, the exponential trend is more obvious. This is because much more down area polish occurs in the conventional process than in the fixed abrasive process.

EXTRACTED CHARACTERISTICS

Planarization length

A longer planarization length implies that local planarity can be achieved over a longer length scale. As shown in Figure 28, the planarization length extracted using the density model is longer for the conventional pad than for the fixed abrasive pad. This confirms the observations in Figure 24, where the variation in thickness across rows is slightly greater in wafers polished with the fixed abrasive pad than with the conventional pad.

While the planarization length is the same for wafers polished 60% and 80%, there is a discrepancy between the planarization lengths for the fixed abrasive pad and the conventional pad in the 40% polished wafers. The implications of a time-dependent planarization length need further consideration in the future.

Figure 28: A comparison of the extracted planarization lengths. [Figure: Planarization length vs. % Polish (40%, 60%, 80%) for the fixed abrasive pad and the conventional pad.]

Removal rate

An interesting aspect of the model is that although the model fits the data well, there is a large difference between the actual removal rate measured during the CMP process and the removal rate extracted by the density model. When the measured removal rate is used in the density model to predict thickness variations, the difference between the measured and predicted data is large.
On the other hand, when the density model uses the removal rate determined iteratively, by repeatedly calculating the planarization length from the removal rate (starting with the initial removal rate) and vice versa, it accurately predicts the thickness variations. A comparison of the removal rate extracted in this manner and the measured removal rate is shown in Figure 29. For the fixed abrasive pad, the model underestimates the rate by approximately 2X. For the conventional pad, however, it overestimates the polish rate by almost 2X. The reasons for this discrepancy are unclear and need to be investigated further.

Figure 29: A comparison of the removal rates using the density model. [Figure: two panels, "Measured vs. predicted removal rate: FAP" and "Measured vs. predicted removal rate: conventional pad", each plotting Rate (actual) and Rate (model) vs. % Polish.]

We now proceed to investigate the prediction errors using the various models. This is done by measuring the root mean squared (RMS) error between the measured and predicted values for different rows of the die. By measuring the error for various rows we can examine whether the model fits the measured data better for certain types of layout. Additionally, the RMS error has been calculated for different amounts of polish to see how the amount of polish affects the error.

MODEL FIT: DENSITY MODEL

We first consider fits using the density CMP model. For the conventional pad, Row 1, which has the lowest average density on the die, has the largest RMS error (Figure 30). This is in accordance with Smith's findings that density-based predictions for raised areas in the 12% and 25% density regions are poor [9]. Rows 2 and 3 are predicted better, while Row 4 is again predicted poorly.
Perhaps fine pitch causes faster polish, so that prediction for Row 4 is only slightly better than for the low-density regions in Row 1. As per Figure 24, the model fit for Row 4 shows that prediction is better for regions of larger pitch structures. Further studies are required to examine the effect of pitch.

Accurate prediction of thickness depends on contact between the pad and raised or down areas. As the pad polishes raised areas in low-density regions, the material is removed quickly and the pad comes in contact with the down areas. Pressure is then distributed over both the raised and the down areas. Prediction for low-density regions is therefore flawed. In higher density regions, by contrast, the pad does not come into contact with the down areas until most of the material is removed. This is perhaps why the RMS error usually increases with polish time in the graphs below.

The RMS errors for Rows 1-3 are very similar for the fixed abrasive and the conventional pad. The main difference between the two graphs is the larger error in thickness predictions for pitch structures (Row 4). The density model predicts approximately the same results across Row 4. But perhaps this indicates that accounting for the contact height when modeling the conventional pad can yield better prediction.

Figure 30: RMS error for the modeled data. [Figure: two bar charts, "RMS error for conventional pad" and "RMS error for fixed abrasive pad", showing the RMS error for Rows 1-4 at 40%, 60% and 80% polish.]

MODEL FIT: STEP-HEIGHT MODEL

Next we consider fits using the combined density and step-height CMP model. Plotting the measured fixed abrasive pad data vs. the thickness predicted using the step-height model yields a graph very similar to Figure 24.
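The per-row RMS error used in these comparisons can be illustrated with a short sketch. This is only a minimal illustration of the standard RMS definition, not the code used in this work; the function names `rms_error` and `rms_by_row`, the dictionary layout keyed by row label, and the sample numbers are all hypothetical.

```python
import math


def rms_error(measured, predicted):
    """Root mean squared difference between two equal-length sequences."""
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rms_by_row(measured_rows, predicted_rows):
    """RMS error for each die row; both arguments map row label -> thicknesses."""
    return {
        row: rms_error(measured_rows[row], predicted_rows[row])
        for row in measured_rows
    }


if __name__ == "__main__":
    # Hypothetical thickness values, for illustration only.
    measured = {"Row 1": [9500.0, 9100.0, 8800.0], "Row 4": [10200.0, 10100.0, 9900.0]}
    predicted = {"Row 1": [9900.0, 9600.0, 9500.0], "Row 4": [10300.0, 10050.0, 9950.0]}
    for row, err in rms_by_row(measured, predicted).items():
        print(f"{row}: RMS error = {err:.1f}")
```

Grouping the error by row, rather than computing one number for the whole die, is what allows the fit to be compared across layout types (low density, gradual density, step density, pitch).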
The RMS error for the difference between the step height model and the measured data for the fixed abrasive pads is shown in Figure 31. As can be seen, for the low-density regions (Row 1) the accuracy of prediction is considerably higher than with the density model. This can be explained as follows. As mentioned earlier, the low-density regions polish faster; when the transition step-height or contact height is reached, the pad comes in contact with the down areas of the wafer. Beyond this transition step height, the pad cannot be treated as incompressible. Since this stage is reached relatively early during the polish process for low-density regions, the step-height model predicts thickness variations with greater accuracy by eliminating the pure density model's assumption that no down area contact occurs before the step height is removed. In high-density regions, on the other hand, the transition step-height is reached much later in time and the pad can for the most part be treated as incompressible. Therefore the step-height model shows enhanced prediction for the low-density regions (Row 1) and approximately the same level of accuracy for the density structures.

The description above is also supported by Figure 19 and Figure 20, which show down area polish in the case of conventional and fixed abrasive CMP. In the case of the fixed abrasive pad, down areas are not polished except in low-density regions. Since this agrees with the density model, where no down area polish is assumed, the density model effectively predicts thickness variations after the fixed abrasive CMP process.

Figure 31: Prediction error using various models on fixed abrasive data. [Figure: bar chart, "RMS error: density vs. step-height model for 80% FAP polish", comparing the density model and the step-height model for Rows 1-4.]

SUMMARY OF RESULTS

In this chapter we have analyzed data from experiments involving polishing wafers patterned with a density mask, using the conventional or the fixed abrasive CMP process. Results confirm that hardly any down area polish occurs in the case of the fixed abrasive polish except in low-density regions. Therefore, as can be expected, the step-height versus time graph shows an exponential curve for very low-density regions and a linear curve for regions of higher density. For these regions of higher density, the pad can be treated as incompressible and thus be modeled to a reasonable degree of accuracy using the density model. The step-height model provides better results when modeling thickness variations in low-density regions.

CHAPTER 5: CONCLUSIONS

In this study the performance of the fixed abrasive pad was examined for oxide polishing through a series of experiments. A density mask was used in the analysis of the behavior of a 3M fixed abrasive pad and a Rodel conventional pad. The study examines the prediction of thickness variations after CMP with a fixed abrasive pad using the MIT density model and the step height model.

The study begins with a close examination of the fixed abrasive and the conventional polish process and an analysis of the shortcomings of various methods. While the conventional process requires considerable maintenance of the machine and pad, the slurry-free process is simpler. The planarization length depends strongly on the pad and the process.

In Chapter 3 we examine in detail several possible models for CMP, the assumptions on which they are based, and their shortcomings. The models described include Preston's model, the MIT density model, and the step height model. We concluded that the MIT density model gives a good first order approximation for thickness variations and that combining the step-height model with the density model further improves the prediction.
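The linear-then-exponential behaviour of step height summarized above can be illustrated with a short sketch. It is a minimal sketch under stated assumptions, not the implementation used in this work: it assumes the commonly used piecewise form in which the step height h falls at the rate K/rho while h is above a contact height h_c, and decays exponentially with time constant tau = rho * h_c / K once h_c is reached. The names `transition_time`, `step_height`, `K` (blanket removal rate), `rho` (effective pattern density) and `h_c`, and all numeric values, are illustrative assumptions rather than parameters extracted in this study.

```python
import math


def transition_time(h0, K, rho, h_c):
    """Time at which the step height first reaches the contact height h_c."""
    if h0 <= h_c:
        return 0.0  # the pad already touches the down areas at t = 0
    # Linear regime: the step height falls at K / rho until it equals h_c.
    return (h0 - h_c) * rho / K


def step_height(t, h0, K, rho, h_c):
    """Step height at time t: linear above h_c, exponential below (assumed form)."""
    t_c = transition_time(h0, K, rho, h_c)
    if t < t_c:
        # Incompressible-pad regime: no down area polish.
        return h0 - (K / rho) * t
    # Compressible-pad regime: dh/dt = -K * h / (rho * h_c).
    h_start = min(h0, h_c)
    tau = rho * h_c / K
    return h_start * math.exp(-(t - t_c) / tau)


if __name__ == "__main__":
    # Illustrative values only (e.g. angstroms and seconds).
    for rho in (0.12, 0.37, 0.75):
        t_c = transition_time(8000.0, 50.0, rho, 2000.0)
        h = step_height(60.0, 8000.0, 50.0, rho, 2000.0)
        print(f"density {rho:.2f}: transition at {t_c:.1f} s, step height at 60 s = {h:.1f}")
```

In this form the transition time scales with the pattern density, so low-density regions enter the exponential regime earlier. That matches the observation that the density model, which corresponds to the linear branch alone, is adequate for higher density regions but not for low-density ones.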
Chapter 4 describes the experiments conducted using the fixed abrasive and the conventional pad. A comparison of results from the different experiments was then presented. Measured results from the fixed abrasive and the conventional pad experiments were fitted against the MIT density model and compared. Aspects investigated in this work include the effect of layout pattern density, variation in step-height, and the error between the predicted and the measured data.

From the study, we have seen that the MIT density model predicts thickness variations to approximately the same degree of accuracy for both the fixed abrasive and the conventional pad. As can be predicted from the global step heights after CMP using the fixed abrasive pad, the planarization length for the fixed abrasive pad (4.3 mm) was indeed lower than for the conventional pad (4.7 mm).

Little down area polish occurs when polishing with the fixed abrasive pad, except in areas of low density. The step height vs. polish time for the fixed abrasive pad is therefore linear for the majority of the polish time in areas of higher density. In modeling thickness variations in areas of higher density, the pad can be treated as incompressible to yield good results. The density model is therefore sufficiently accurate in this case. In areas of low density, however, the density model does not accurately predict thickness variations, and contact height needs to be taken into account for better modeling. The step height model shows improved accuracy over the density model for areas of low density.

FUTURE WORK

In this work, various aspects of the fixed abrasive pad have been investigated. It has been shown that thickness variations after polish using the fixed abrasive pad fit the MIT density model as well as the step-height model. There are several issues that require further investigation to understand the fixed abrasive pad better:

1.
Pad properties

Newer 3M fixed abrasive pad models have a significantly smaller removal rate for blanket wafers than for patterned wafers. This property does not pertain to the fixed abrasive pad used in this study since it is an older model. Future studies should investigate the accuracy of modeling the newer pads using the density and step-height models.

The experiments in this study focused on fixed abrasive pads with a pyramidal surface structure. It would be interesting to investigate how the surface structure of the pad affects polish properties. Fixed abrasive pads with a cylindrical surface structure should be one aspect of future studies.

Secondly, the pads used had a softer subpad. The role of the subpad and its effect on uniformity is discussed in [4]. However, a fixed abrasive pad without a stack (e.g. a pad directly affixed to a hard platen) has not been fitted against the density or step height model.

The nature of the interaction between the abrasive particles and the wafer also needs to be understood. It is sometimes conjectured that the exposed particles embedded in the surface of the fixed abrasive pad disengage from the resin and behave like abrasive particles in the conventional slurry. There is, however, no conclusive evidence of this, and further investigation is required.

2. STI CMP and modeling

The effect of the fixed abrasive pad for planarization of STI structures needs to be investigated further. Results from Vo et al. from Rodel show improved results using the fixed abrasive matrix on STI structures compared to conventional ILD slurry and high-selectivity slurry processes [17]. (High selectivity to topography and 1:1 selectivity to nitride is required for this level of performance.) Studies to determine how measurements from the STI experiments fit a dual material model, such as the copper model developed by Tugbawa [23], are suggested.

3.
Pattern layout

It is not clear from the study whether the effect of pattern density is the same with the use of the fixed abrasive and the conventional pad. Further studies on this topic are suggested. The RMS error was calculated for the difference between the measured and predicted values for the two types of pads. As discussed in the previous chapter, the RMS error for Row 4, which contains pitch variations, was larger for the fixed abrasive pad than for the conventional pad. The prediction error for variations in density, however, is very similar for the two pads, which suggests that the gradual and step density structures have the same effect on polishing using the two pads.

4. Modeling artifacts

An interesting point to note from the modeling is the difference between the measured and extracted "blanket" removal rates for the conventional and slurry-free process. A key question is how the equivalent "blanket" rate for patterned wafers should relate to the observed removal rate on true blanket wafers, as this research shows these values to be very different. While the measured "blanket" removal rate was considerably higher for the fixed abrasive pad, the model extracts a much lower rate. On the other hand, for the conventional pad, the model extracts a higher blanket removal rate than observed on unpatterned wafers. In fact the model wrongly indicates that the blanket removal rate for the conventional pad is higher than that for the fixed abrasive pad. It is not clear how the extracted "blanket" removal rate is related to the physical process parameters and conditions, and further studies are required to examine whether more can be learned about the process from the extracted removal rate.

REFERENCES

[1] National Technology Roadmap for Semiconductors: Semiconductor Industry Association Report, SEMATECH Inc., Austin, TX, 1997.
[2] K. Perry, Chemical Mechanical Polishing: The impact of a new technology on an industry, Symposium on VLSI Technology Digest of Technical Papers, 1998.
[3] S.
Wolf, Silicon processing for the VLSI era: Vol. 2 - Process Integration, Lattice Press, Sunset Beach, CA, Chap. 4, 1990.
[4] P. van der Velden, Chemical-mechanical polishing with fixed abrasives using different subpads to optimize wafer uniformity, Semicon Europa, 1999.
[5] D. Ouma, Modeling of chemical mechanical polishing for dielectric planarization, Ph.D. Thesis, MIT, Nov. 1998.
[6] P. van der Velden and F. Weimar, Chemical-mechanical polishing with fixed abrasives, Semicon Europa, pp. 5-9, 1998.
[7] D. Ouma, D. Boning and J. Chung, An integrated Characterization and Modeling Methodology for CMP Dielectric Planarization, International Interconnect Technology Conference, San Francisco, CA, June 1998.
[8] J. Grillaert, A study of the planarization process during CMP for oxides and STI, Ph.D. Thesis, Katholieke Universiteit Leuven, May 1999.
[9] T. Smith, Device independent process control of dielectric CMP, Ph.D. Thesis, MIT, Sept. 1999.
[10] N. B. Kirk and J. V. Wood, Glass polishing, British Ceramic Trans., vol. 93, no. 1, pp. 25-30, 1994.
[11] F. W. Preston, The theory and design of plate glass polishing machines, J. Soc. Glass Technol., vol. 11, pp. 214-256, 1927.
[12] L. M. Cook, Chemical processes in glass polishing, J. Non-crystalline Solids, vol. 120, pp. 152-171, 1990.
[13] S. R. Runnels and L. M. Eyman, Tribology analysis of CMP, J. Electrochem. Soc., vol. 141, no. 6, pp. 1698-1701, June 1994.
[14] W-T. Tseng and Y-L. Wang, Re-examination of pressure and speed dependencies of removal rate during CMP processes, J. Electrochem. Soc., vol. 144, no. 2, pp. 14, Feb. 1997.
[15] F. G. Shi, B. Zhao and S-Q. Wang, A new theory for CMP with soft pads, International Interconnect Technology Conference, San Francisco, CA, pp. 73-75, June 1998.
[16] P. A. Burke, Semi-empirical modeling of SiO2 CMP planarization, in Proc. VMIC Conf., Santa Clara, CA, pp. 379-384, June 1991.
[17] T. Vo, T. Buley and J.
Gagliardi, Improved planarization for STI with fixed abrasive technology, Solid State Technology, pp. 123-128, June 2000.
[18] J. Gagliardi, 3M Fixed abrasive CMP polishing of semiconductor oxide films, 3M product literature, Sept. 1999.
[19] B. Stine, D. Ouma, R. Divecha, D. Boning and J. Chung, Rapid characterization and modeling of pattern dependent variation in CMP, IEEE Trans. Semi. Manuf., vol. 8, no. 4, pp. 382-389, 1995.
[20] B. Stine, D. Ouma, R. Divecha, D. Boning, J. Chung, D. Hetherington, I. Ali, G. Shinn, J. Clark, O. S. Nakagawa and S-Y. Oh, A closed form analytic model for ILD thickness variation in CMP processes, Proc. CMPMIC Conf., Santa Clara, CA, 1997.
[21] A. R. Baker, The origin of edge effect in CMP, Proc. of Electrochem. Society Meeting, vol. 96, no. 22, pp. 228, 1996.
[22] D. Wang, J. Lee, K. Holland, T. Bibby, S. Beaudoin and T. Cale, Von Mises stress in CMP processes, J. Electrochem. Soc., vol. 144, no. 3, pp. 1121-1127, March 1997.
[23] T. Tugbawa, T. Park, and D. Boning, Framework for Modeling of Pattern Dependencies in Multi-Step Cu CMP Processes, SEMICON West 2000, July 2000.
[24] K. Achuthan, J. Curry, M. Lacy, D. Campwell and S. V. Babu, Journal of Electronic Materials, vol. 25, p. 1628, 1996.
[25] D. Stein, D. Hetherington, M. Dugger and T. Stout, Journal of Electronic Materials, vol. 25, p. 1623, 1996.
[26] D. P. Goetz, Structured abrasive CMP: length scales, subpads and planarization, Materials Research Society Symposium on CMP: Fundamentals and Challenges, San Francisco, CA, April 1999.