MIT OpenCourseWare http://ocw.mit.edu _________ ___ 2.830J / 6.780J / ESD.63J Control of Manufacturing Processes (SMA 6303) Spring 2008 For information about citing these materials or our Terms of Use, visit: ________________ http://ocw.mit.edu/terms. Control of Manufacturing Processes Subject 2.830/6.780/ESD.63 Spring 2008 Lecture #5 Probability Models, Parameter Estimation, and Sampling February 21, 2008 Manufacturing 1 The Normal Distribution 1 p(x) = e σ 2π 1 ⎛ x− μ⎞ − ⎜ ⎟ 2⎝ σ ⎠ 2 0.4 “Standard normal” z= 0.3 x−μ σ 0.2 μz=0 σz=1 0.1 0 -4 -3 -2 -1 0 1 2 3 4 z Manufacturing 2 Properties of the Normal pdf • Symmetric about mean • Only two parameters: μ and σ 1 p(x) = e σ 2π 1 ⎛ x− μ⎞ − ⎜ ⎟ ⎝ 2 σ ⎠ 2 • Mean (μ) and Variance ( σ2 ) have well known “estimators” (average and sample variance) Manufacturing 3 Testing the Model: e.g. Is the Process “Normal” ? • Is the underlying distribution really normal? – Look at histogram – Look at curve fit to histogram – Look at % of data in 1, 2 and 3σ bands • Confidence Intervals – Look at “kurtosis” • Measure of deviation from normal – Probability (or qq) plots (see Mont. 3-3.7 or MATLAB stats toolbox) Manufacturing 4 Kurtosis: Deviation from Normal k = 0 - normal k > 0 - more “peaked” k < 0 - more “flat” For sampled data: 4 ⎡ n(n + 1) 3(n − 1)2 ⎛ xi − x ⎞ ⎤ k=⎢ ⎟ ⎥− ⎜ ∑ ⎢⎣ (n − 1)(n − 2)(n − 3) ⎝ s ⎠ ⎥⎦ (n − 2)(n − 3) Manufacturing 5 Kurtosis for Some Common Distributions D: Laplace (k = 3) L: logistic (k = 1.2) N: normal (k = 0) U: uniform (k = -1.2) Source: Wikimedia Commons, http://commons.wikimedia.org Manufacturing 6 Quantile-Quantile (qq) Plots • Plot – normalized (mean centered and scaled to s) vs. – theoretical position of unit normal distribution for ordered data • Normal distribution: data should fall along line Manufacturing Source: Wikimedia Commons, http://commons.wikimedia.org 7 Guaranteeing “Normality” The Central Limit Theorem – If x1, x2 ,x3 ...xN … are N independent observations of a random variable with “moments” μx and σ2x, – The distribution of the sum of all the samples will tend toward normal. Manufacturing 8 Example: Uniformly Distributed Data 70 60 x1 50 40 Sum of 100 sets of 1000 points each 30 100 y = ∑ xi 20 10 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 + i =1 150 70 60 50 x2 40 30 20 100 10 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 +... 50 70 60 50 x100 40 30 20 0 35 40 45 50 55 60 65 10 0 0 0.1 0.2 0.3 0.4 0.5 0.6 Manufacturing 0.7 0.8 0.9 1 9 Sampling: Using Measurements (Data) to Model the Random Process • • In general p(x) is unknown Data can suggest form of p(x) – e.g.. uniform, normal, weibull, etc. • Data can be used to estimate parameters of distributions – e.g. μ and σ for normal distribution: p(x) = N(μ ,σ2) • How to estimate – Sample Statistics • Uncertainty in estimates 1 p(x) = e σ 2π 1 ⎛ x− μ⎞ − ⎜ ⎟ 2⎝ σ ⎠ 2 – Sample Statistic pdf’s Manufacturing 10 Sample Statistics Average or sample mean Sample variance Sample standard deviation Manufacturing 11 Sample Mean Uncertainty • If all xi come from a distribution with μx and σ2x, and we divide the sum by n: 1 n x = ∑ xi n i =1 Then: x = c1 x1 + c2 x2 + c3 x 3 + K cn xn 1 ci = n μx = μx Manufacturing and 1 2 1 σ = σ x or σ x = σx n n 2 x 12 Manufacturing as Random Processes • All physical processes have a degree of natural randomness • We can model this behavior using probability distribution functions • We can calibrate and evaluate the quality of this model from measurement data Manufacturing 13 Formal Use of Statistical Models • Discrete Variable Distributions and Uses – Attribute Modeling • Sampling: Key distributions arising in sampling • Chi-square, t, and F distributions • Estimation: – Reasoning about the population based on a sample • Some basic confidence intervals • Estimate of mean with variance known • Estimate of mean with variance not known • Estimate of variance • Hypothesis tests Manufacturing 14 Discrete Distribution: Bernoulli Bernoulli trial: an experiment with two outcomes Probability density function (pdf): f(x) ¾ ¼ p 1-p 0 Manufacturing 1 x 15 Discrete Distribution: Binomial Repeated random Bernoulli trials • n is the number of trials • p is the probability of “success” on any one trial • x is the number of successes in n trials Manufacturing 16 Binomial Distribution Binomial Distribution 1.2 1 0.8 0.25 Series1 0.6 0.4 0.2 29 27 25 23 21 19 17 15 9 13 7 11 5 3 1 0 0.2 e.g. expected number of “rejects” 0.15 0.1 0.05 29 27 25 23 21 19 17 15 13 11 9 7 5 3 1 0 Number of "Successes" Manufacturing 17 Discrete Distribution: Poisson Mean: Variance: Example applications: # misprints on page(s) of a book # defects on a wafer • Poisson is a good approximation to Binomial when n is large and p is small (< 0.1) Manufacturing 18 Poisson Distributions Poisson Distribution 0.2 λ=5 0.18 0.16 0.14 0.12 0.1 c 0.08 Poisson Distribution 0.06 0.1 0.04 λ=20 0.09 0.02 0.08 0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 Events per unit 0.07 0.06 Poisson Distribution 0.05 c 0.04 0.08 0.03 λ=30 0.07 0.06 0.02 0.01 0.05 0 1 0.04 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 Events per unit 0.03 0.02 e.g. defects/device 0.01 0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 Events per unit Manufacturing 19 Back to Continuous Distributions • Uniform Distribution • Normal Distribution – Unit (Standard) Normal Distribution Manufacturing 20 Continuous Distribution: Uniform 1 cdf a b x a b x pdf Manufacturing 21 Standard Questions For a Known cdf or pdf • Probability x less than or equal to some value • Probability x sits within some range Manufacturing 1 a b x a b x 22 Continuous Distribution: Normal or Gaussian 1 0.99865 cdf 0.977 0.84 0.5 0.00135 0.0227 0.16 0 pdf Manufacturing 23 Continuous Distribution: Unit Normal • Normalization Mean Variance pdf cdf Manufacturing 24 Using the Unit Normal pdf and cdf • We often want to talk about “percentage points” of the distribution – portion in the tails 1 0.9 0.5 0.1 Manufacturing 25 Use of the pdf: Location of Data • How likely are certain values of the random variable? • For a “Standard Normal” Distribution: z= (x − μ ) σ μ=0 N(0,1) σ=1 0.4 z = 1 ⇒ x = 1σ z = 2 ⇒ x = 2σ z = 3 ⇒ x = 3σ 0.3 0.2 0.1 0 -4 Manufacturing -3 -2 -1 0 1 2 3 4 26 Location of Data P(-1≤ z ≤ 1) = P(z ≤1) - P(z ≤-1) = Φ(1) - Φ(-1) (+ 1σ) = 0.841 - (1-0.841) = 0.682 P(-2≤ z ≤ 2) = P(z ≤2) - P(z ≤-2) = 0.977 - (1-0.977) = 0.954 (+ 2σ) P(-3≤ z ≤ 3) = P(z ≤3) - P(z ≤-3) = 0.998 - (1-0.998) = 0.997 (+ 3σ) Φ(z) tabulated (e.g. p. 752 of Montgomery) Manufacturing 27 Statistics The field of statistics is about reasoning in the face of uncertainty, based on evidence from observed data • Beliefs: – Probability distribution or probabilistic model form – Distribution/model parameters • Evidence: – Finite set of observations or data drawn from a population (experimental measurements or observations) • Models: – Seek to explain data wrt a model of their probability Manufacturing 29 Sampling to Determine Parameters of the Parent Probability Distribution • Assume Process Under Study has a Parent Distribution p(x) • Take “n” Samples From the Process Output (xi) • Look at Sample Statistics (e.g. sample mean and sample variance) • Relationship to Parent • Both are Random Variables • Both Have Their Own Probability Distributions • Inferences about the process (the parent distribution) via Inferences about the derived sampling distribution Manufacturing 30 Moments of the Population vs. Sample Statistics Underlying model or Population Probability Sample Statistics • Mean • Variance • Standard Deviation • Covariance • Correlation Coefficient Manufacturing 31 Sampling and Estimation • Sampling: act of making observations from populations • Random sampling: when each observation is identically and independently distributed (IID) • Statistic: a function of sample data; a value that can be computed from data (contains no unknowns) – Average, median, standard deviation – Statistics are by definition also random variables Manufacturing 32 Population vs. Sampling Distribution Population (“true”)probability density function) n = 20 Sample Mean (statistic) Sample Mean Distribution (sampling distribution) n = 10 n=2 n=1 Manufacturing 33 Sampling and Estimation, cont. • Sampling • Random sampling • Statistic • A statistic is a random variable, which itself has a sampling (probability) distribution – I.e., if we take multiple random samples, the value for the statistic will be different for each set of samples, but will be governed by the same sampling distribution • If we know the appropriate sampling distribution, we can reason about the underlying population based on the observed value of a statistic – E.g. we calculate a sample mean from a random sample; in what range do we think the actual (population) mean sits? Manufacturing 34 Sampling and Estimation – An Example • Suppose we know that the thickness of a part is normally distributed with std. dev. of 10: • We sample n = 50 random parts and compute the mean part thickness: • First question: What is distribution of the mean of T = • Second question: can we use knowledge of distribution to reason about the actual (population) mean μ given observed (sample) mean? Manufacturing 35 Estimation and Confidence Intervals • Point Estimation: – Find best values for parameters of a distribution – Should be • Unbiased: expected value of estimate should be true value • Minimum variance: should be estimator with smallest variance • Interval Estimation: – Give bounds that contain actual value with a given probability – Must know sampling distribution! Manufacturing 36 Confidence Intervals: Variance Known • We know σ, e.g. from historical data • Estimate mean in some interval to (1-α)100% confidence Remember the unit normal percentage points 1 Apply to the sampling distribution for the sample mean Manufacturing 0.9 0.5 0.1 37 Example, Cont’d • Second question: can we use knowledge of distribution to reason about the actual (population) mean μ given observed (sample) mean? n = 50 95% confidence interval, α = 0.05 ~95% of distribution lies within +/- 2σ of mean Manufacturing 38 Summary • Process as Random Variable – Histograms to pdf’s • Different Distributions for Different Processes – Discrete or Binary (e.g. Defects) – Continuous (e.g. Dimensional Variation) • Parent Distributions and Sampling – Estimating the Parent from Data • Use of Distributions to establish “Confidence” on Parameter Estimates Manufacturing 39