IC Manufacturing and Yield
ECE/ChE 4752: Microelectronics Processing Laboratory
Gary S. May
April 15, 2004

Outline
- Introduction
- Statistical Process Control
- Statistical Experimental Design
- Yield

Motivation
IC manufacturing processes must be stable, repeatable, and of high quality to yield products with acceptable performance. All persons involved in manufacturing an IC (including operators, engineers, and management) must continuously seek to improve manufacturing process output and reduce variability. Variability reduction is accomplished by strict process control.

Production Efficiency
- Determined by actions both on and off the manufacturing floor
- Design for manufacturability (DFM): intended to improve production efficiency
- (Diagram: process design and circuit design take place off the manufacturing floor; high-volume manufacturing takes place on the floor.)

Variability
- The most significant challenge in IC production
- Types of variability: human error, equipment failure, material non-uniformity, substrate inhomogeneity, lithography spots

Deformations
Variability leads to deformations. Types of deformations:
1) Geometric: lateral (across wafer), vertical (into substrate), spot defects, crystal defects (vacancies, interstitials)
2) Electrical: local (per die), global (per wafer)

Outline
- Introduction
- Statistical Process Control
- Statistical Experimental Design
- Yield

Statistical Process Control
SPC is a powerful collection of problem-solving tools used to achieve process stability and reduce variability. The primary tool is the control chart, developed by Dr. Walter Shewhart of Bell Laboratories in the 1920s.

Control Charts
- A quality characteristic measured from a sample is plotted versus sample number or time
- Control limits are typically set at ±3σ from the center line (σ = standard deviation)

Control Chart for Attributes
Some quality characteristics cannot be easily represented numerically (e.g., whether or not a wire bond is defective).
In this case, the characteristic is classified as either "conforming" or "nonconforming", and there is no numerical value associated with the quality of the bond. Quality characteristics of this type are referred to as attributes.

Defect Chart
- Also called a "c-chart": a control chart for the total number of defects
- Assumes that the presence of defects in samples of constant size is modeled by the Poisson distribution, in which the probability of x defects is

  P(x) = (c^x e^(-c)) / x!

  where x is the number of defects and c > 0

Control Limits for C-Chart
A c-chart with ±3σ control limits (assuming c is known) is given by:

  UCL = c + 3√c
  Center line = c
  LCL = c - 3√c

If c is unknown, it can be estimated from the average number of defects in a sample, c̄. In this case, the control chart becomes:

  UCL = c̄ + 3√c̄
  Center line = c̄
  LCL = c̄ - 3√c̄

Example
Suppose the inspection of 25 silicon wafers yields 37 defects. Set up a c-chart.
Solution: Estimate c using c̄ = 37/25 = 1.48. This is the center line. The UCL and LCL can be found as follows:

  UCL = c̄ + 3√c̄ = 5.13
  LCL = c̄ - 3√c̄ = -2.17

Since -2.17 < 0, we set the LCL = 0.

Defect Density Chart
- Also called a "u-chart": a control chart for the average number of defects over a sample of size n products
- If there are c total defects among the n samples, the average number of defects per sample is u = c/n

Control Limits for U-Chart
A u-chart with ±3σ control limits is given by:

  UCL = ū + 3√(ū/n)
  Center line = ū
  LCL = ū - 3√(ū/n)

where ū is the average number of defects over m groups of size n.

Example
Suppose an IC manufacturer wants to establish a defect density chart. Twenty different samples of size n = 5 wafers are inspected, and a total of 183 defects are found. Set up the u-chart.
Solution: Estimate u using ū = c/(mn) = 183/((20)(5)) = 1.83. This is the center line. The UCL and LCL can be found as follows:

  UCL = ū + 3√(ū/n) = 3.64
  LCL = ū - 3√(ū/n) = 0.02

Control Charts for Variables
In many cases, quality characteristics are expressed as specific numerical measurements.
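Before moving on to variables charts, the c-chart and u-chart limits above can be checked with a minimal sketch (plain Python, standard library only) that reproduces the two worked examples:

```python
import math

def c_chart_limits(c_bar):
    """3-sigma limits for a c-chart (Poisson-distributed defect counts)."""
    half_width = 3 * math.sqrt(c_bar)
    ucl = c_bar + half_width
    lcl = max(0.0, c_bar - half_width)  # defect counts cannot be negative
    return ucl, lcl

def u_chart_limits(u_bar, n):
    """3-sigma limits for a u-chart (average defects per unit, samples of size n)."""
    half_width = 3 * math.sqrt(u_bar / n)
    ucl = u_bar + half_width
    lcl = max(0.0, u_bar - half_width)
    return ucl, lcl

# c-chart example: 37 defects found on 25 wafers
ucl, lcl = c_chart_limits(37 / 25)
print(round(ucl, 2), round(lcl, 2))   # 5.13 0.0

# u-chart example: 183 defects in 20 samples of n = 5 wafers
ucl, lcl = u_chart_limits(183 / (20 * 5), 5)
print(round(ucl, 2), round(lcl, 2))   # 3.64 0.02
```

Note that the negative c-chart LCL (-2.17) is clamped to zero in code exactly as in the worked example.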
Example: the thickness of a film. In these cases, control charts for variables can provide more information regarding manufacturing process performance.

Control of Mean and Variance
Control of the mean is achieved using an x̄-chart, where:

  x̄ = (x1 + x2 + ... + xn)/n = (1/n) Σ xi   (i = 1 to n)

Variance can be monitored using the s-chart, where:

  s = √[ (1/(n-1)) Σ (xi - x̄)² ]

Control Limits for Mean

  UCL = x̿ + 3√(s̄²/n)
  Center line = x̿
  LCL = x̿ - 3√(s̄²/n)

where the grand average is:

  x̿ = (x̄1 + x̄2 + ... + x̄m)/m

Control Limits for Variance

  UCL = s̄ + 3(s̄/c4)√(1 - c4²)
  Center line = s̄
  LCL = s̄ - 3(s̄/c4)√(1 - c4²)

where s̄ = (1/m) Σ si and c4 is a constant that depends on the sample size.

Modified Control Limits for Mean
The limits for the x̄-chart can also be written as:

  UCL = x̿ + 3s̄/(c4√n)
  LCL = x̿ - 3s̄/(c4√n)

Example
Suppose x̄- and s-charts are to be established to control linewidth in a lithography process, and 25 samples of size n = 5 are measured. The grand average for the 125 lines is 4.01 µm. If s̄ = 0.09 µm, what are the control limits for the charts?
Solution:
For the x̄-chart:

  UCL = x̿ + 3s̄/(c4√n) = 4.14 µm
  LCL = x̿ - 3s̄/(c4√n) = 3.88 µm

For the s-chart:

  UCL = s̄ + 3(s̄/c4)√(1 - c4²) = 0.19 µm
  LCL = s̄ - 3(s̄/c4)√(1 - c4²) = 0 (the computed value is negative)

Outline
- Introduction
- Statistical Process Control
- Statistical Experimental Design
- Yield

Background
Experiments allow us to determine the effects of several variables on a given process. A designed experiment is a test or series of tests involving purposeful changes to variables to observe the effect of the changes on the process. Statistical experimental design is an efficient approach for systematically varying these process variables and determining their impact on process quality. Application of this technique can lead to improved yield, reduced variability, reduced development time, and reduced cost.

Comparing Distributions
Consider the following yield data (in %): Is Method B better than Method A?
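The linewidth example above can be reproduced with a short sketch. The unbiasing constant c4 is computed from its standard definition, c4(n) = √(2/(n-1))·Γ(n/2)/Γ((n-1)/2), which gives c4 ≈ 0.94 for n = 5:

```python
import math

def c4(n):
    """Unbiasing constant for the sample standard deviation (c4 ~ 0.94 at n = 5)."""
    return math.sqrt(2 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

def xbar_limits(grand_avg, s_bar, n):
    """Modified x-bar chart limits: grand_avg +/- 3*s_bar/(c4*sqrt(n))."""
    hw = 3 * s_bar / (c4(n) * math.sqrt(n))
    return grand_avg + hw, grand_avg - hw

def s_chart_limits(s_bar, n):
    """s-chart limits: s_bar +/- 3*(s_bar/c4)*sqrt(1 - c4^2), clamped at 0."""
    hw = 3 * (s_bar / c4(n)) * math.sqrt(1 - c4(n) ** 2)
    return s_bar + hw, max(0.0, s_bar - hw)

# Linewidth example: grand average 4.01 um, s_bar = 0.09 um, n = 5
ucl, lcl = xbar_limits(4.01, 0.09, 5)
print(round(ucl, 2), round(lcl, 2))   # 4.14 3.88
ucl, lcl = s_chart_limits(0.09, 5)
print(round(ucl, 2), round(lcl, 2))   # 0.19 0.0
```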
  Wafer   Method A   Method B
  1       89.7       84.7
  2       81.4       86.1
  3       84.5       83.2
  4       84.8       91.9
  5       87.3       86.3
  6       79.7       79.3
  7       85.1       82.6
  8       81.7       89.1
  9       83.7       83.7
  10      84.5       88.5
  Avg     84.24      85.54

Hypothesis Testing
We test the hypothesis that B is better than A using the null hypothesis:

  H0: μA = μB

Test statistic:

  t0 = (ȳB - ȳA) / (sp √(1/nA + 1/nB))

where ȳi are the sample means of the yields, ni are the numbers of trials for each sample, and the pooled variance is:

  sp² = [(nA - 1)sA² + (nB - 1)sB²] / (nA + nB - 2)

Results
Calculations give sA = 2.90, sB = 3.65, sp = 3.30, and t0 = 0.88. Use Appendix K to determine the probability of computing a given t-statistic with a certain number of degrees of freedom. We find that the likelihood of computing a t-statistic of 0.88 with nA + nB - 2 = 18 degrees of freedom is 0.195. This means that there is only a 19.5% chance that the observed difference between the mean yields is due to pure chance. We can be 80.5% confident that Method B is really superior to Method A.

Analysis of Variance
The previous example shows how to use hypothesis testing to compare two distributions. It is often important in IC manufacturing to compare several distributions. We might also be interested in determining which process conditions in particular have a significant impact on process quality. Analysis of variance (ANOVA) is a powerful technique for accomplishing these objectives.
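The two-sample comparison above can be sketched in a few lines of plain Python; this reproduces the quoted statistics (sA, sB, sp, t0) from the raw yield data, leaving the p-value lookup to a t-table as in the slides:

```python
import math

# Yield data (%) from the comparison table
method_a = [89.7, 81.4, 84.5, 84.8, 87.3, 79.7, 85.1, 81.7, 83.7, 84.5]
method_b = [84.7, 86.1, 83.2, 91.9, 86.3, 79.3, 82.6, 89.1, 83.7, 88.5]

def mean(x):
    return sum(x) / len(x)

def sample_std(x):
    """Sample standard deviation with the n-1 divisor."""
    m = mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / (len(x) - 1))

na, nb = len(method_a), len(method_b)
sa, sb = sample_std(method_a), sample_std(method_b)

# Pooled standard deviation and t-statistic for H0: mu_A = mu_B
sp = math.sqrt(((na - 1) * sa**2 + (nb - 1) * sb**2) / (na + nb - 2))
t0 = (mean(method_b) - mean(method_a)) / (sp * math.sqrt(1 / na + 1 / nb))

print(round(sa, 2), round(sb, 2), round(sp, 2), round(t0, 2))  # 2.9 3.65 3.3 0.88
```

With t0 = 0.88 and 18 degrees of freedom, the tail probability (0.195 in Appendix K) then gives the 80.5% confidence figure.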
ANOVA Example
Defect densities (cm^-2) for 4 process recipes (k = 4 treatments; n1 = 4, n2 = n3 = 6, n4 = 8; N = 24):

  Treatment 1: 62, 60, 63, 59
  Treatment 2: 63, 67, 71, 64, 65, 66
  Treatment 3: 68, 66, 71, 67, 68, 68
  Treatment 4: 56, 62, 60, 61, 63, 64, 63, 59

Treatment means: ȳ1 = 61, ȳ2 = 66, ȳ3 = 68, ȳ4 = 61. Grand average: ȳ = 64.

Sums of Squares
Within treatments:

  S_R = Σt Σi (yti - ȳt)²   (t = 1 to k, i = 1 to nt)

Between treatments:

  S_T = Σt nt (ȳt - ȳ)²

Total:

  S_D = Σt Σi (yti - ȳ)²

Degrees of Freedom
Within treatments: νR = N - k
Between treatments: νT = k - 1
Total: νD = N - 1

Mean Squares
Within treatments: sR² = S_R/νR
Between treatments: sT² = S_T/νT
Total: sD² = S_D/νD

ANOVA Table for Defect Density

  Source               Sum of Squares   Degrees of Freedom   Mean Square   F-ratio
  Between treatments   S_T = 228        νT = 3               sT² = 76.0    sT²/sR² = 13.6
  Within treatments    S_R = 112        νR = 20              sR² = 5.6
  Total                S_D = 340        νD = 23              sD² = 14.8

Conclusions
If the null hypothesis were true, sT²/sR² would follow the F distribution with νT and νR degrees of freedom. From Appendix L, the significance level for the F-ratio of 13.6 with 3 and 20 degrees of freedom is 0.000046. This means that there is only a 0.0046% chance that the means are equal. In other words, we can be 99.9954% sure that real differences exist among the four different processes in our example.

Factorial Designs
- Experimental design: an organized method of conducting experiments to extract maximum information from limited experiments
- Goal: systematically explore the effects of input variables, or factors (such as processing temperature), on responses (such as yield)
- All factors are varied simultaneously, as opposed to "one-variable-at-a-time"
- Factorial designs consist of a fixed number of levels for each of a number of factors, with experiments at all possible combinations of the levels

2-Level Factorials
- Ranges of factors are discretized into minimum, maximum, and "center" levels
- In a 2-level factorial, the minimum and maximum levels are used together in every possible combination
- A full 2-level factorial with n factors requires 2^n runs
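The one-way ANOVA sums of squares and F-ratio above follow directly from their definitions; a minimal sketch over the defect-density data reproduces the table:

```python
# Defect-density data for the four process recipes in the ANOVA example
treatments = [
    [62, 60, 63, 59],
    [63, 67, 71, 64, 65, 66],
    [68, 66, 71, 67, 68, 68],
    [56, 62, 60, 61, 63, 64, 63, 59],
]

k = len(treatments)                      # number of treatments
N = sum(len(t) for t in treatments)      # total observations
grand = sum(sum(t) for t in treatments) / N
means = [sum(t) / len(t) for t in treatments]

s_t = sum(len(t) * (m - grand) ** 2 for t, m in zip(treatments, means))  # between
s_r = sum((y - m) ** 2 for t, m in zip(treatments, means) for y in t)    # within
s_d = sum((y - grand) ** 2 for t in treatments for y in t)               # total

f_ratio = (s_t / (k - 1)) / (s_r / (N - k))
print(s_t, s_r, s_d, round(f_ratio, 1))  # 228.0 112.0 340.0 13.6
```

Note that S_T + S_R = S_D (228 + 112 = 340), which is a useful consistency check on any ANOVA table.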
The combinations of a 3-factor experiment can be represented as the vertices of a cube: (-1,-1,-1), (1,-1,-1), (-1,1,-1), (1,1,-1), (-1,-1,1), (1,-1,1), (-1,1,1), (1,1,1).

2^3 Factorial CVD Experiment
Factors: temperature (T), pressure (P), flow rate (F). Response: deposition rate (D).

  Run   P   T   F   D (Å/min)
  1     -   -   -   d1 = 94.80
  2     +   -   -   d2 = 110.96
  3     -   +   -   d3 = 214.12
  4     +   +   -   d4 = 255.82
  5     -   -   +   d5 = 94.14
  6     +   -   +   d6 = 145.92
  7     -   +   +   d7 = 286.71
  8     +   +   +   d8 = 340.52

Main Effects
A main effect is the effect of any single variable on the response. Computation method: find the difference between the average deposition rate when pressure is high and the average rate when pressure is low:

  P = d̄p+ - d̄p- = (1/4)[(d2 + d4 + d6 + d8) - (d1 + d3 + d5 + d7)] = 40.86

where P = pressure effect, d̄p+ = average deposition rate when pressure is high, and d̄p- = average rate when pressure is low.
Interpretation: the average effect of increasing pressure from its lowest to its highest level is to increase the deposition rate by 40.86 Å/min. The other main effects (temperature and flow rate) are computed in a similar manner. In general: main effect = ȳ+ - ȳ-.

Interaction Effects
Example: the pressure-by-temperature interaction (P × T). This is half the difference between the average temperature effects at the two levels of pressure:

  P × T = d̄PT+ - d̄PT- = (1/4)[(d1 + d4 + d5 + d8) - (d2 + d3 + d6 + d7)] = 6.89

The P × F and T × F interactions are obtained similarly. The interaction of all three factors (P × T × F) is the average difference between any two-factor interaction at the high and low levels of the third factor:

  P × T × F = d̄PTF+ - d̄PTF- = -5.88

Yates Algorithm
Calculating effects and interactions for factorial experiments using the method described above can be tedious. The Yates algorithm provides a quicker method of computation that is relatively easy to program. Although the Yates algorithm is relatively straightforward, modern analysis of statistical experiments is done with commercially available statistical software packages.
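The main-effect and interaction computations above amount to applying a +/- contrast to the eight responses and dividing by 4 (half the number of runs); a minimal sketch over the CVD data:

```python
# 2^3 factorial CVD runs in standard order: (P, T, F, deposition rate in A/min)
runs = [
    (-1, -1, -1, 94.80), (1, -1, -1, 110.96),
    (-1, 1, -1, 214.12), (1, 1, -1, 255.82),
    (-1, -1, 1, 94.14),  (1, -1, 1, 145.92),
    (-1, 1, 1, 286.71),  (1, 1, 1, 340.52),
]

def effect(contrast):
    """Apply a +/-1 contrast to the responses and divide by 2^(n-1) = 4."""
    return sum(contrast(p, t, f) * d for p, t, f, d in runs) / 4

p_effect = effect(lambda p, t, f: p)                 # pressure main effect
pt_interaction = effect(lambda p, t, f: p * t)       # P x T interaction
ptf_interaction = effect(lambda p, t, f: p * t * f)  # P x T x F interaction
print(round(p_effect, 2), round(pt_interaction, 2), round(ptf_interaction, 2))
# 40.86 6.89 -5.88
```

The same `effect` function with `lambda p, t, f: t` or `lambda p, t, f: f` gives the temperature and flow-rate main effects.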
A few of the more common packages: RS/1, SAS, and Minitab.

Yates Procedure
- The design matrix is arranged in standard order (the 1st column has alternating - and + signs, the 2nd column has successive pairs of - and + signs, the 3rd column has four - signs followed by four + signs, etc.)
- Column y contains the response for each run
- The first four entries in column (1) are obtained by adding pairs together; the next four are obtained by subtracting the top number from the bottom number of each pair
- Column (2) is obtained from column (1) in the same way, and column (3) from column (2)
- To get the effects, divide the column (3) entries by the divisor
- The first element in the identification (ID) column is the grand average of all observations; the remaining identifications are derived by locating the plus signs in the design matrix

Yates Algorithm Illustration

  P   T   F   y        (1)      (2)      (3)       Div   Effect   ID
  -   -   -   94.80    205.76   675.70   1543.0    8     192.87   Avg
  +   -   -   110.96   469.94   867.29   163.45    4     40.86    P
  -   +   -   214.12   240.06   57.86    651.35    4     162.84   T
  +   +   -   255.82   627.23   105.59   27.57     4     6.89     PT
  -   -   +   94.14    16.16    264.18   191.59    4     47.90    F
  +   -   +   145.92   41.70    387.17   47.73     4     11.93    PF
  -   +   +   286.71   51.78    25.54    122.99    4     30.75    TF
  +   +   +   340.52   53.81    2.03     -23.51    4     -5.88    PTF

Fractional Factorial Designs
A disadvantage of 2-level factorials is that the number of experimental runs increases exponentially with the number of factors. Fractional factorial designs are constructed to eliminate some of the runs needed in a full factorial design. For example, a half-fractional design with n factors requires only 2^(n-1) runs. The trade-off is that some higher-order effects or interactions may not be estimable.

Fractional Factorial Example
A 2^(3-1) fractional factorial design for the CVD experiment: the new design is generated by writing the full 2^2 design for P and T, then multiplying those columns to obtain F. Drawback: since we used PT to define F, we can't distinguish between the P × T interaction and the F main effect. The two effects are confounded.
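The column-by-column procedure above is easy to program; a minimal sketch of the Yates algorithm, applied to the CVD deposition rates, reproduces the Effect column of the table:

```python
def yates(y):
    """Yates algorithm for a 2^n full factorial with responses in standard order."""
    n = len(y)
    col = list(y)
    passes = n.bit_length() - 1  # log2(n) passes
    for _ in range(passes):
        sums = [col[i] + col[i + 1] for i in range(0, n, 2)]    # pair sums
        diffs = [col[i + 1] - col[i] for i in range(0, n, 2)]   # pair differences
        col = sums + diffs
    # First entry / n is the grand average; the rest / (n/2) are the effects
    return [col[0] / n] + [c / (n // 2) for c in col[1:]]

d = [94.80, 110.96, 214.12, 255.82, 94.14, 145.92, 286.71, 340.52]
effects = yates(d)
# Order: Avg, P, T, PT, F, PF, TF, PTF
print([round(e, 2) for e in effects])
# [192.87, 40.86, 162.84, 6.89, 47.9, 11.93, 30.75, -5.88]
```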
  Run   P   T   F
  1     -   -   +
  2     +   -   -
  3     -   +   -
  4     +   +   +

Outline
- Introduction
- Statistical Process Control
- Statistical Experimental Design
- Yield

Definitions
- Yield: the percentage of devices or circuits that meet a nominal performance specification. Yield can be categorized as functional or parametric.
- Functional yield: also referred to as "hard yield"; characterized by open or short circuits caused by defects (such as particles).
- Parametric yield: the proportion of functional product that fails to meet performance specifications for one or more parameters (such as speed, noise level, or power consumption); also called "soft yield".

Functional Yield
Y = f(Ac, D0), where:
- Ac = critical area (the area where a defect has a high probability of causing a fault)
- D0 = defect density (number of defects per unit area)

Poisson Model
Let C = the number of chips on a wafer and M = the number of defect types. Then C^M is the number of unique ways in which M defects can be distributed on C chips.
Example: If there are 3 chips and 3 defect types (such as metal open, metal short, and metal 1 to metal 2 short), then there are C^M = 3^3 = 27 possible ways in which these 3 defects can be distributed over the 3 chips.

Unique Fault Combinations
(Table: the 27 unique assignments of defect types M1, M2, M3 to chips C1, C2, C3. Each defect independently lands on one of the three chips, e.g., all three on C1; M1 and M2 on C1 with M3 on C2; and so on.)

Poisson Derivation
If one chip contains no defects, the number of ways to distribute M defects among the remaining chips is (C - 1)^M. Thus, the probability that a chip will have no defects of any type is:

  (C - 1)^M / C^M = (1 - 1/C)^M

Substituting M = C·Ac·D0, the yield is the probability of a chip having zero defects:

  Y = lim(C→∞) (1 - 1/C)^(C·Ac·D0) = exp(-Ac·D0)

For N chips to have zero defects, this becomes:

  Y = [exp(-Ac·D0)]^N = exp(-N·Ac·D0)

Murphy's Yield Integral
Murphy proposed that the defect density should not be constant.
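The Poisson yield derivation above can be checked numerically: the limit (1 - 1/C)^(C·Ac·D0) converges to exp(-Ac·D0) for large C. A minimal sketch, with hypothetical values Ac = 0.5 cm² and D0 = 0.4 defects/cm² (not taken from the slides):

```python
import math

def poisson_yield(a_c, d0, n_chips=1):
    """Poisson functional yield: probability that n_chips chips have zero defects."""
    return math.exp(-n_chips * a_c * d0)

# Hypothetical example: 0.5 cm^2 critical area, 0.4 defects/cm^2
y = poisson_yield(0.5, 0.4)

# Check the limiting argument with a large but finite chip count C
C = 10**6
limit_approx = (1 - 1 / C) ** (C * 0.5 * 0.4)
print(round(y, 3), round(limit_approx, 3))  # 0.819 0.819
```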
Instead, D should be summed over all circuits and substrates using a normalized probability density function f(D). The yield can then be calculated using the integral:

  Y = ∫0→∞ e^(-Ac·D) f(D) dD

Various forms of f(D) exist and form the basis for many analytical yield models.

Probability Density Functions: Poisson Model
The Poisson model assumes f(D) is a delta function, f(D) = δ(D - D0), where D0 is the average defect density. Using this density function, the yield is:

  Y = ∫0→∞ e^(-Ac·D) δ(D - D0) dD = exp(-Ac·D0)

Uniform Density Function
Murphy initially investigated a uniform density function. Evaluation of the yield integral for the uniform density function gives:

  Y_uniform = (1 - e^(-2·D0·Ac)) / (2·D0·Ac)

Triangular Density Function
Murphy later believed that a Gaussian distribution would be a better reflection of the true defect density function. He approximated a Gaussian function with the triangular function, resulting in the yield expression:

  Y_triangular = [(1 - e^(-D0·Ac)) / (D0·Ac)]²

The triangular model is widely used today in industry to determine the effect of manufacturing process defect density.

Seeds Model
Seeds theorized that high yields were caused by a large population of low defect densities and a small proportion of high defect densities. He proposed an exponential density function:

  f(D) = (1/D0) exp(-D/D0)

This implies that the probability of observing a low defect density is higher than that of observing a high defect density. Substituting this function into the Murphy integral yields:

  Y_exponential = 1 / (1 + D0·Ac)

Although the Seeds model is simple, its yield predictions for large-area substrates are too optimistic.

Negative Binomial Model
Uses the gamma distribution as f(D). Density function:

  f(D) = [Γ(α)·β^α]^(-1) · D^(α-1) · e^(-D/β)

The average defect density is D0 = αβ.
(Figure: f(D) versus D/D0 for α = 1, 2, 3.)

Negative Binomial (cont.)
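The closed-form yield models above are easy to compare numerically. A minimal sketch, evaluating each at an assumed Ac·D0 = 1 (an arbitrary illustrative value); the negative binomial yield Y = (1 + Ac·D0/α)^(-α) is included to show that it recovers the Seeds model at α = 1 and approaches the Poisson model as α grows large:

```python
import math

def y_poisson(ad):     return math.exp(-ad)
def y_uniform(ad):     return (1 - math.exp(-2 * ad)) / (2 * ad)
def y_triangular(ad):  return ((1 - math.exp(-ad)) / ad) ** 2
def y_seeds(ad):       return 1 / (1 + ad)
def y_neg_binomial(ad, alpha): return (1 + ad / alpha) ** -alpha

ad = 1.0  # assumed Ac*D0 product
for name, y in [("Poisson", y_poisson(ad)), ("triangular", y_triangular(ad)),
                ("uniform", y_uniform(ad)), ("Seeds", y_seeds(ad))]:
    print(f"{name:10s} {y:.3f}")
```

At any fixed Ac·D0, the models order as Poisson < triangular < uniform < Seeds, consistent with the remark that the Seeds model is the most optimistic.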
Yield:

  Y_gamma = (1 + Ac·D0/α)^(-α)

where α is a "cluster" parameter that must be empirically determined:
- α high: the variability of defects is low (little clustering); the gamma density approaches a delta function, and the negative binomial model reduces to the Poisson model
- α low: the variability of defects is significant (much clustering); the gamma model reduces to the Seeds exponential model
If Ac and D0 are known (or can be measured), the negative binomial model is an excellent general-purpose yield predictor.

Parametric Yield
- Evaluated using "Monte Carlo" simulation
- Let all parameters vary at random according to a known distribution (usually normal)
- Measure the resulting distribution in performance
Recall the saturation drain current:

  I_Dnsat = (μn·Cox/2)(W/L)(VGS - VTn)²

or, since Cox depends on the oxide thickness tox: I_Dnsat = f(tox, VTn).

Input Distributions
Assume the mean (μ) and standard deviation (σ) are known for tox and VTn. Calculate I_Dnsat for each sampled combination of (tox, VTn).

Output Distribution
(Figure: the resulting distribution f(x) of I_Dnsat, partitioned by limits a, b, and c into bad, moderate, good, and best devices.) The parametric yield for each category is the area under f(x) between the corresponding limits, e.g.:

  Yield = ∫ f(x) dx

evaluated between the limits that bound that category.
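The Monte Carlo procedure above can be sketched in plain Python. All numeric values below (oxide thickness, threshold voltage, mobility, W/L, VGS, and the ±10% spec window) are illustrative assumptions, not values from the slides; only the I_Dnsat formula and the sampling procedure come from the text:

```python
import random

random.seed(0)

# Assumed process parameters (illustrative values only)
MU_TOX, SIGMA_TOX = 10e-9, 0.3e-9   # oxide thickness mean/sd (m)
MU_VT, SIGMA_VT = 0.5, 0.03         # threshold voltage mean/sd (V)
EPS_OX = 3.45e-11                   # permittivity of SiO2 (F/m)
MU_N = 0.05                         # electron mobility (m^2/V-s), illustrative
W_OVER_L = 10                       # device aspect ratio
VGS = 1.8                           # gate-source voltage (V)

def id_sat(tox, vtn):
    """I_Dnsat = (mu_n * Cox / 2) * (W/L) * (VGS - VTn)^2, with Cox = eps_ox/tox."""
    cox = EPS_OX / tox
    return 0.5 * MU_N * cox * W_OVER_L * (VGS - vtn) ** 2

# Monte Carlo: sample tox and VTn from normal distributions, propagate to I_Dnsat
samples = [id_sat(random.gauss(MU_TOX, SIGMA_TOX), random.gauss(MU_VT, SIGMA_VT))
           for _ in range(100_000)]

# Parametric yield: fraction of devices whose I_Dnsat lies inside the spec window
nominal = id_sat(MU_TOX, MU_VT)
lo, hi = 0.9 * nominal, 1.1 * nominal
yield_frac = sum(lo <= s <= hi for s in samples) / len(samples)
print(f"parametric yield within +/-10% of nominal: {yield_frac:.1%}")
```

Counting the fraction of samples inside the limits is the discrete equivalent of integrating f(x) between those limits, as on the output-distribution slide.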