3 Modeling Process Quality

3.1 Introduction

• Section 3.1 contains basic numerical and graphical methods. It is assumed the student is familiar with these methods.
• Goal: Review several discrete and continuous distributions commonly used to model physical phenomena. In general, these distributions depend on one or two parameters. Once parameter values are assigned, the distribution is completely defined.
• In statistics, we study techniques for estimating these parameters. In probability, we study properties of these distributions and how they can be used to assign probabilities to various situations.

3.2 Important Discrete Distributions

Bernoulli Distribution

• Situation:
  – A single experiment has two possible outcomes of interest, say 0 and 1, where 1 is generally referred to as a success and 0 as a failure.
  – The random variable $X$ assumes the value $X = 0$ or $X = 1$ such that $p = P(X = 1)$ and $q = 1 - p = P(X = 0)$.
• Properties:
  – The pdf $f(x)$ of $X$ is given by
    \[ f(x) = p^x (1-p)^{1-x} = p^x q^{1-x}, \qquad x = 0, 1. \]
  – The mean $E(X) = p$.
  – The variance $\mathrm{Var}(X) = p(1-p) = pq$.
• $X$ is known as a Bernoulli random variable and it follows a Bernoulli distribution. A single performance of this type of experiment is known as a Bernoulli trial.
• $p$ is the parameter of the Bernoulli distribution.

Binomial Distribution

• Situation:
  – The experiment consists of a sequence of $n$ independent Bernoulli trials. The random variable of interest $X$ is the number of successes observed in the $n$ Bernoulli trials. Thus, $X$ can assume the values $x = 0, 1, 2, \ldots, n$.
• Properties:
  – The pdf $b(x; n, p)$ of $X$ is given by
    \[ b(x; n, p) = \binom{n}{x} p^x (1-p)^{n-x} = \binom{n}{x} p^x q^{n-x}, \qquad x = 0, 1, \ldots, n. \]
  – $X$ is known as a binomial random variable which follows a binomial distribution.
  – The CDF of a binomial random variable is given by
    \[ B(x; n, p) = \sum_{k=0}^{x} \binom{n}{k} p^k (1-p)^{n-k} = \sum_{k=0}^{x} \binom{n}{k} p^k q^{n-k}, \qquad x = 0, 1, \ldots, n. \]
  – The mean $E(X) = np$ and the variance $\mathrm{Var}(X) = npq = np(1-p)$.
  – Notation: To state that $X$ follows a binomial distribution, we write $X \sim B(n, p)$ or $X \sim \mathrm{BIN}(n, p)$.
  – $p$ and $n$ are the parameters of the binomial distribution.
• Application: Quality Control Sampling
  – Items are produced in a production line process. The probability that any item has a defect is $p$. During production, $n$ items are selected according to some sampling plan.
  – If $X$ is the number of defective items in the sample, then $X \sim B(n, p)$ (assuming the items are produced independently).

Hypergeometric Distribution

• Situation:
  – There are a total of $N$ items in the experiment: $M$ of Type 1 and $N - M$ of Type 2. $n$ items are selected randomly from the $N$ items without replacement.
  – Let $X$ be the number of Type 1 items in the sample of $n$ items.
  – Let $x_{\min} = \max(0, n - N + M)$ and $x_{\max} = \min(n, M)$.
  – The random variable $X$ can assume integer values $x$ where $x_{\min} \le x \le x_{\max}$.
• Properties:
  – The pdf $h(x; n, M, N)$ of $X$ is given by
    \[ h(x; n, M, N) = \frac{\binom{M}{x} \binom{N-M}{n-x}}{\binom{N}{n}}. \]
  – $X$ is known as a hypergeometric random variable which follows a hypergeometric distribution.
  – The CDF of a hypergeometric random variable is given by
    \[ H(x; n, M, N) = \sum_{k = x_{\min}}^{\lfloor x \rfloor} \frac{\binom{M}{k} \binom{N-M}{n-k}}{\binom{N}{n}}, \]
    where $\lfloor \cdot \rfloor$ is the floor function and $x_{\min} \le x \le x_{\max}$.
  – The mean $E(X) = nM/N$ and the variance
    \[ \mathrm{Var}(X) = \frac{nM}{N} \left( 1 - \frac{M}{N} \right) \frac{N - n}{N - 1}. \]
  – Notation: To state that $X$ follows a hypergeometric distribution, we write $X \sim \mathrm{HYP}(n, M, N)$.
  – $n$, $M$, and $N$ are the parameters of the hypergeometric distribution.
• Application: Sampling from a Finite Population
  – A population contains $N$ items. $M$ possess a certain characteristic of interest (e.g., an item is defective) and the remaining $N - M$ do not. In general, it is too costly to take a census of the entire population, so a random sample is taken.
  – Let $n$ be the sample size. Let $X$ be the number of items in the sample that possess the characteristic of interest.
  – $X$ follows a hypergeometric distribution, i.e., $X \sim \mathrm{HYP}(n, M, N)$.
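The binomial and hypergeometric formulas above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names (`binom_pmf`, `hyper_pmf`, `mean_of`) and the parameter values are illustrative assumptions, not part of the notes. It computes each mean directly as $\sum x \, P(X = x)$ and compares it with $np$ and $nM/N$.

```python
from math import comb


def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) p^x (1 - p)^(n - x), for x = 0, 1, ..., n."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def hyper_pmf(x, n, M, N):
    """h(x; n, M, N) = C(M, x) C(N - M, n - x) / C(N, n) on its support."""
    x_min, x_max = max(0, n - N + M), min(n, M)
    if x < x_min or x > x_max:
        return 0.0  # outside x_min <= x <= x_max the probability is zero
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


def mean_of(pmf, support):
    """E(X) computed directly as the sum of x * P(X = x)."""
    return sum(x * pmf(x) for x in support)


# Illustrative values only (assumed for this sketch, not taken from the notes).
n, p = 20, 0.05  # binomial: sample size and defect probability
N, M = 100, 8    # hypergeometric: population size and number of Type 1 items

binom_mean = mean_of(lambda x: binom_pmf(x, n, p), range(n + 1))
hyper_mean = mean_of(lambda x: hyper_pmf(x, n, M, N), range(n + 1))

print(f"binomial mean:       {binom_mean:.4f}  (np   = {n * p:.4f})")
print(f"hypergeometric mean: {hyper_mean:.4f}  (nM/N = {n * M / N:.4f})")
```

The same pattern (sum over the support) could be used to check the variance formulas.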
Negative Binomial Distribution

• Situation:
  – The experiment consists of a sequence of independent Bernoulli trials (with probability of success $p$) until the $r$th success appears.
  – Let the random variable $X$ be the number of Bernoulli trials until the $r$th success.
  – Note that the set of values $X$ can assume is countably infinite: $r, r + 1, r + 2, \ldots$
• Properties:
  – The pdf $f(x; r, p)$ of $X$ is given by
    \[ f(x; r, p) = \binom{x-1}{r-1} p^r (1-p)^{x-r} = \binom{x-1}{r-1} p^r q^{x-r}, \qquad x = r, r + 1, r + 2, \ldots \]
  – $X$ is known as a negative binomial random variable which follows a negative binomial distribution.
  – The CDF of a negative binomial random variable is given by
    \[ F(x; r, p) = \sum_{k=r}^{x} \binom{k-1}{r-1} p^r (1-p)^{k-r} = \sum_{k=r}^{x} \binom{k-1}{r-1} p^r q^{k-r}, \qquad x = r, r + 1, r + 2, \ldots \]
  – The mean $E(X) = r/p$ and the variance $\mathrm{Var}(X) = r(1-p)/p^2 = rq/p^2$.
  – Notation: If $X$ follows a negative binomial distribution, we write $X \sim \mathrm{NB}(r, p)$.
  – $p$ and $r$ are the parameters of the negative binomial distribution.
• Application: Quality Control Sampling
  – Items are produced in a production line process. The probability that any item has a defect is $p$. If $r$ defects are found before $R$ units are sampled, the process is shut down in search of a cause.
  – You are interested in the average run length (ARL) of the process between shutdowns for various values of $p$ and $r$. You want to develop a sampling plan (choosing $r$ and $R$) based on the cost of shutting down the process versus sending defective items to a customer.
  – If $X$ is the number of items sampled until the $r$th defect is found, then $X \sim \mathrm{NB}(r, p)$ and the ARL $= E(X)$.

Geometric Distribution (Special Case of Negative Binomial with r = 1)

• Situation: Let the random variable $X$ be the number of Bernoulli trials until the first success.
• Properties:
  – The pdf $g(x; p)$ of $X$ is given by
    \[ g(x; p) = p(1-p)^{x-1} = p q^{x-1}, \qquad x = 1, 2, 3, \ldots \]
  – $X$ is known as a geometric random variable which follows a geometric distribution.
  – The CDF of a geometric random variable is given by
    \[ G(x; p) = \sum_{k=1}^{x} p(1-p)^{k-1} = 1 - (1-p)^x = 1 - q^x, \qquad x = 1, 2, 3, \ldots \]
  – The mean $E(X) = 1/p$ and the variance $\mathrm{Var}(X) = (1-p)/p^2 = q/p^2$.
  – Notation: To state that $X$ follows a geometric distribution, we write $X \sim \mathrm{GEO}(p)$.
  – $p$ is the parameter of the geometric distribution.
• Application: Run Length in Quality Control Sampling
  – If $X$ is the number of items sampled until a defect is found, then $X \sim \mathrm{GEO}(p)$ and the ARL $= E(X)$.

Poisson Distribution

• Situation (Poisson Processes):
  – Consider a physical situation in which a certain type of event is recurring. Let $X(t)$ denote the number of such events that occur in a given interval $[0, t]$.
  – Assume the following conditions hold:
    1. The probability that an event will occur in a given short interval $[t, t + \Delta t]$ is approximately proportional to its length $\Delta t$, and this probability does not depend on the position of the interval.
    2. The occurrences of events in nonoverlapping intervals are independent.
    3. The probability of two or more events in a short interval $[t, t + \Delta t]$ is negligible.
  – If these conditions hold as $\Delta t \to 0$, then the distribution of $X(t)$ is Poisson: for all $t > 0$,
    \[ P[X(t) = n] = \frac{e^{-\lambda t} (\lambda t)^n}{n!}. \]
  – Note that the set of values $X$ can assume is countably infinite: $0, 1, 2, \ldots$
  – $\lambda$ is called the rate of occurrence or intensity of the Poisson process. Because $\lambda$ is assumed constant, the Poisson process is homogeneous.
• Properties:
  – Suppose $t$ is fixed, and let $\mu = \lambda t$. Then the pdf $f(x; \mu)$ of $X$ is given by
    \[ f(x; \mu) = \frac{e^{-\mu} \mu^x}{x!}, \qquad x = 0, 1, 2, \ldots \]
  – $X$ is known as a Poisson random variable which follows a Poisson distribution.
  – The CDF of a Poisson random variable is given by
    \[ F(x; \mu) = \sum_{k=0}^{x} \frac{e^{-\mu} \mu^k}{k!}, \qquad x = 0, 1, 2, \ldots \]
  – The mean $E(X) = \mu$ and the variance $\mathrm{Var}(X) = \mu$.
  – Notation: To state that $X$ follows a Poisson distribution, we write $X \sim \mathrm{POI}(\mu)$.
  – $\mu$ is the parameter of the Poisson distribution.
• Application 1: Queueing Theory
• Application 2: Defect Occurrence in Quality Control

3.3 Important Continuous Distributions

Gamma Distribution

• The gamma function is
  \[ \Gamma(\kappa) = \int_0^\infty t^{\kappa - 1} e^{-t} \, dt \qquad \text{for all } \kappa > 0. \]
• Recall:
  (i) $\Gamma(\kappa) = (\kappa - 1)\Gamma(\kappa - 1)$ for $\kappa > 1$
  (ii) $\Gamma(n) = (n - 1)!$ for $n = 1, 2, 3, \ldots$
  (iii) $\Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}$
• Situation: The random variable $X$ can assume values $x > 0$.
• Properties:
  – For $\kappa > 0$ and $\theta > 0$, the pdf $f(x; \theta, \kappa)$ of $X$ is given by
    \[ f(x; \theta, \kappa) = \begin{cases} \dfrac{1}{\theta^\kappa \Gamma(\kappa)} \, x^{\kappa - 1} e^{-x/\theta} & \text{for } x > 0 \\ 0 & \text{otherwise} \end{cases} \]
  – The mean $E(X) = \kappa\theta$ and the variance $\mathrm{Var}(X) = \kappa\theta^2$.
• $X$ is known as a gamma random variable and it follows a gamma distribution.
• Notation: To state that $X$ follows a gamma distribution, we write $X \sim \mathrm{GAM}(\theta, \kappa)$.
• The CDF of a gamma random variable is given by
  \[ F(x; \theta, \kappa) = \begin{cases} 0 & \text{for } x \le 0 \\ \displaystyle\int_0^x \frac{t^{\kappa - 1} e^{-t/\theta}}{\theta^\kappa \Gamma(\kappa)} \, dt & \text{for } x > 0 \end{cases} \]
• Case 1: For $\kappa = n$ ($n = 1, 2, \ldots$), the CDF of a gamma random variable reduces to
  \[ F(x; \theta, n) = \begin{cases} 0 & \text{for } x \le 0 \\ 1 - \displaystyle\sum_{i=0}^{n-1} \frac{(x/\theta)^i}{i!} \, e^{-x/\theta} & \text{for } x > 0 \end{cases} \]
• $\theta$ and $\kappa$ are the parameters of the gamma distribution. $\kappa$ is the shape parameter and $\theta$ is the scale parameter.
• By varying $\kappa$ and $\theta$, the gamma pdf can assume many possible shapes. Therefore, by choosing appropriate $\kappa$ and $\theta$ values, we can find a density that reasonably approximates many real-world situations.
• Case 2: When $\theta = 2$ and $\kappa = \nu/2$, this gamma distribution is referred to as a chi-square distribution with $\nu$ degrees of freedom.
• Case 3: When $\kappa = 1$, this gamma distribution is known as an exponential distribution.
• Application 1: Distribution of Lifetimes
• Application 2: Waiting Times (in queueing problems)

Exponential Distribution

• Properties:
  – The pdf $f(x; \theta)$ of $X$ is given by
    \[ f(x; \theta) = \begin{cases} \dfrac{1}{\theta} \, e^{-x/\theta} & \text{for } x > 0 \\ 0 & \text{otherwise} \end{cases} \]
  – The mean $E(X) = \theta$ and the variance $\mathrm{Var}(X) = \theta^2$.
• $X$ is an exponential random variable and it follows an exponential distribution.
• Notation: To state that $X$ follows an exponential distribution, we write $X \sim \mathrm{EXP}(\theta)$.
• The CDF of an exponential random variable is given by
  \[ F(x; \theta) = \begin{cases} 0 & \text{for } x \le 0 \\ 1 - e^{-x/\theta} & \text{for } x > 0 \end{cases} \]
• $\theta$ is the parameter of the exponential distribution and is a scale parameter.
• Application 1: Distribution of Lifetimes of Electronic Components
• Application 2: Interarrival Times (in queueing problems)

Weibull Distribution

• Situation: The random variable $X$ assumes the values $x > 0$.
• Properties:
  – For $\beta > 0$ and $\theta > 0$, the pdf $f(x; \theta, \beta)$ of $X$ is given by
    \[ f(x; \theta, \beta) = \begin{cases} \dfrac{\beta}{\theta^\beta} \, x^{\beta - 1} e^{-(x/\theta)^\beta} & \text{for } x > 0 \\ 0 & \text{otherwise} \end{cases} \]
  – By varying $\beta$ and $\theta$, the Weibull pdf can assume many possible shapes. Therefore, by choosing appropriate $\beta$ and $\theta$, we can find a density that reasonably approximates many real-world situations.
  – The mean $E(X) = \theta \, \Gamma\left(1 + \dfrac{1}{\beta}\right)$.
  – The variance $\mathrm{Var}(X) = \theta^2 \left[ \Gamma\left(1 + \dfrac{2}{\beta}\right) - \Gamma^2\left(1 + \dfrac{1}{\beta}\right) \right]$.
• $X$ is known as a Weibull random variable and it follows a Weibull distribution.
• Notation: To state that $X$ follows a Weibull distribution, we write $X \sim \mathrm{WEI}(\theta, \beta)$.
• The CDF of a Weibull random variable is given by
  \[ F(x; \theta, \beta) = \begin{cases} 0 & \text{for } x \le 0 \\ 1 - e^{-(x/\theta)^\beta} & \text{for } x > 0 \end{cases} \]
• By solving $F(x_p; \theta, \beta) = p$ for $x_p$, we obtain the $100p$th percentile of the Weibull distribution:
  \[ x_p = \theta \left[ -\ln(1 - p) \right]^{1/\beta}. \]
• $\theta$ and $\beta$ are the parameters of the Weibull distribution. $\beta$ is the shape parameter and $\theta$ is the scale parameter.
• Application: Lifetime distributions of products.

Normal Distribution

• Situation:
  – The distribution of the random variable $X$ is symmetric, and the parameters $\mu$ and $\sigma$ can be varied so that the distribution is centered at any value and its variability is as small or as large as needed. The normal pdf is bell-shaped.
  – Although the normal pdf is positive over the entire real number line, it can provide a reasonable and reliable model for variables that take on values only in some interval. This is because negligible probability is associated with the tails of the normal distribution.
• Properties:
  – For any real-valued $\mu$ and $\sigma > 0$, the pdf $f(x; \mu, \sigma)$ of $X$ is given by
    \[ f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-[(x - \mu)/\sigma]^2 / 2}, \qquad -\infty < x < \infty. \]
  – The mean $E(X) = \mu$ and the variance $\mathrm{Var}(X) = \sigma^2$.
• $X$ is known as a normal random variable and it follows a normal distribution.
• Notation: To state that $X$ follows a normal distribution, we write $X \sim N(\mu, \sigma^2)$.
• Warning: Some texts use $\sigma$ rather than $\sigma^2$ as the second argument. That is, $X \sim N(\mu, \sigma)$.
• $\mu$ and $\sigma$ are the parameters of the normal distribution. $\mu$ is the location parameter and $\sigma$ is the scale parameter. Some disciplines, e.g., engineering, refer to the normal distribution as the Gaussian distribution.

The Standard Normal Distribution

• Special Case: When $\mu = 0$ and $\sigma^2 = 1$, this normal distribution is referred to as the standard normal distribution. The pdf $\phi(z)$ and CDF $\Phi(z)$ of a standard normal random variable are denoted
  \[ \phi(z) = \frac{1}{\sqrt{2\pi}} \, e^{-z^2/2}, \qquad \Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2} \, dt. \]
• Theorem: Let $X \sim N(\mu, \sigma^2)$. If we let $Z = \dfrac{X - \mu}{\sigma}$, then $Z \sim N(0, 1)$, i.e., $Z$ is a standard normal random variable, and
  \[ F(x; \mu, \sigma) = \Phi\left( \frac{x - \mu}{\sigma} \right). \]

Distribution Examples

Example 1: An assembly line produces machine parts. Assume parts are produced independently and that there is a constant probability of producing a defective part.

1. Suppose a random sample of $n$ parts is to be taken. Let the random variable $X$ = the number of defective parts in the random sample and $p$ = the probability of producing a defective part. If $X \ge 2$, then the process is shut down to look for an assignable cause for the defects. For $p$ = .01, .05, and .10, what is the probability the process is shut down if $n = 15$?
2. Suppose parts from this assembly line production process are tested until a defective part is found. Let $X$ = the number of parts sampled when the first defective part is found. What is the distribution of $X$ for $p$ = .01, .10? Plot $E(X)$ vs $p$ for $0 < p \le 1$.
3. Suppose parts from this assembly line production process are tested until two defective parts are found.
   Let $X$ = the number of parts sampled when the second defective part is found. What is the distribution of $X$ for $p$ = .01, .10? Plot $E(X)$ vs $p$ for $0 < p \le 1$.

Example 2: Suppose a metal substrate is partitioned into an 8 × 8 grid of 64 squares. A colony of bacteria occupies the 7 bulleted squares. Suppose a random sample of $n$ squares is to be taken. We say the colony is “detected” if at least one bulleted square is in the sample. Let $X$ = the number of bulleted squares contained in the sample. What is the distribution of $X$? For $n$ = 5, 10, what is the probability the colony is detected?

[Figure: 8 × 8 grid of squares; the 7 squares occupied by the colony are marked with bullets.]
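One possible way to set up the calculations for these examples is sketched below, using only the Python standard library. It assumes the readings suggested by the application sections earlier in the notes: $X \sim \mathrm{BIN}(15, p)$ in Example 1.1, $X \sim \mathrm{GEO}(p)$ in Example 1.2, $X \sim \mathrm{NB}(2, p)$ in Example 1.3, and $X \sim \mathrm{HYP}(n, 7, 64)$ in Example 2. The function names are hypothetical, and the requested plots of $E(X)$ vs $p$ are not produced here.

```python
from math import comb


def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def shutdown_prob(n, p):
    """Example 1.1 (assumed X ~ BIN(n, p)): P(X >= 2) = 1 - P(X = 0) - P(X = 1)."""
    return 1.0 - binom_pmf(0, n, p) - binom_pmf(1, n, p)


def mean_trials(r, p):
    """Examples 1.2 and 1.3: E(X) = r / p for X ~ NB(r, p); r = 1 gives GEO(p)."""
    return r / p


def detection_prob(n, M=7, N=64):
    """Example 2 (assumed X ~ HYP(n, M, N)): P(X >= 1) = 1 - h(0; n, M, N)."""
    return 1.0 - comb(N - M, n) / comb(N, n)


for p in (0.01, 0.05, 0.10):
    print(f"p = {p:.2f}: P(shutdown | n = 15) = {shutdown_prob(15, p):.4f}")

for p in (0.01, 0.10):
    print(
        f"p = {p:.2f}: E(X) = {mean_trials(1, p):.0f} (GEO), "
        f"{mean_trials(2, p):.0f} (NB, r = 2)"
    )

for n in (5, 10):
    print(f"n = {n}: P(colony detected) = {detection_prob(n):.4f}")
```

Evaluating `mean_trials(r, p)` over a grid of $p$ values in $(0, 1]$ would give the points needed for the plots in parts 2 and 3.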