Supplemental Material for Chapter 2

S2-1. Independent Random Variables

Preliminary Remarks

Readers encounter random variables throughout the textbook, where an informal definition of and notation for random variables is used. A random variable may be thought of informally as any variable for which the measured or observed value depends on a random or chance mechanism; that is, the value of a random variable cannot be known in advance of actual observation of the phenomenon. Formally, of course, a random variable is a function that assigns a real number to each outcome in the sample space of the observed phenomenon. Furthermore, it is customary to distinguish between the random variable and its observed value or realization by using an upper-case letter (say X) to denote the random variable and the lower-case letter x to denote the actual numerical value that results from an observation or measurement. This formal notation is not used in the book because (1) it is not widely employed in the statistical quality control field and (2) it is usually quite clear from the context whether we are discussing the random variable or its realization.

Independent Random Variables

In the textbook, we make frequent use of the concept of independent random variables. Most readers have been exposed to this in a basic statistics course, but here a brief review of the concept is given. For convenience, we consider only the case of continuous random variables. For the case of discrete random variables, refer to Montgomery and Runger (2003).

Often there will be two or more random variables that jointly define some physical phenomenon of interest. For example, suppose we consider injection-molded components used to assemble a connector for an automotive application. To adequately describe the connector, we might need to study both the hole interior diameter and the wall thickness of the component. Let x1 represent the hole interior diameter and x2 represent the wall thickness. The joint probability distribution (or density function) of these two continuous random variables can be specified by providing a method for calculating the probability that x1 and x2 assume a value in any region R of two-dimensional space, where the region R is often called the range space of the random variables. This is analogous to the probability density function for a single random variable. Let this joint probability density function be denoted by f(x_1, x_2). The double integral of this joint probability density function over a specified region R provides the probability that x1 and x2 assume values in R. A joint probability density function has the following properties:

a. f(x_1, x_2) \ge 0 for all x_1, x_2

b. \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x_1, x_2)\, dx_1\, dx_2 = 1

c. For any region R of two-dimensional space,
P\{(x_1, x_2) \in R\} = \iint_R f(x_1, x_2)\, dx_1\, dx_2

The two random variables x1 and x2 are independent if

f(x_1, x_2) = f_1(x_1) f_2(x_2)

where f_1(x_1) and f_2(x_2) are the marginal probability distributions of x1 and x2, respectively, defined as

f_1(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\, dx_2   and   f_2(x_2) = \int_{-\infty}^{\infty} f(x_1, x_2)\, dx_1

In general, if there are p random variables x_1, x_2, \ldots, x_p, then the joint probability density function is f(x_1, x_2, \ldots, x_p), with the properties:

a. f(x_1, x_2, \ldots, x_p) \ge 0 for all x_1, x_2, \ldots, x_p

b. \int \cdots \int f(x_1, x_2, \ldots, x_p)\, dx_1\, dx_2 \cdots dx_p = 1

c. For any region R of p-dimensional space,
P\{(x_1, x_2, \ldots, x_p) \in R\} = \int \cdots \int_R f(x_1, x_2, \ldots, x_p)\, dx_1\, dx_2 \cdots dx_p

The random variables x_1, x_2, \ldots, x_p are independent if

f(x_1, x_2, \ldots, x_p) = f_1(x_1) f_2(x_2) \cdots f_p(x_p)

where the f_i(x_i) are the marginal probability distributions of x_1, x_2, \ldots, x_p, respectively, defined as

f_i(x_i) = \int \cdots \int f(x_1, x_2, \ldots, x_p)\, dx_1\, dx_2 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_p

where the integral is taken over the range space of all the variables other than x_i.
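As a quick numerical illustration of the independence condition, the following Python sketch recovers the two marginals by numerical integration and checks that the joint density equals their product at a test point. The particular joint density (a product of two exponential densities standing in for hole diameter and wall thickness) is a hypothetical choice made only for illustration; it is not a model from the text.

import numpy as np
from scipy import integrate

def f_joint(x1, x2):
    # Hypothetical joint density: a product of two exponential densities,
    # so x1 and x2 are independent by construction.
    return 2.0 * np.exp(-2.0 * x1) * 0.5 * np.exp(-0.5 * x2)

def f1(x1):
    # Marginal density of x1: integrate the joint density over all values of x2.
    return integrate.quad(lambda x2: f_joint(x1, x2), 0, np.inf)[0]

def f2(x2):
    # Marginal density of x2: integrate the joint density over all values of x1.
    return integrate.quad(lambda x1: f_joint(x1, x2), 0, np.inf)[0]

# Independence requires f(x1, x2) = f1(x1) * f2(x2) at every point.
x1, x2 = 0.7, 3.2
print(f_joint(x1, x2), f1(x1) * f2(x2))   # the two printed values agree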
S2-2. Development of the Poisson Distribution

The Poisson distribution is widely used in statistical quality control and improvement, frequently as the underlying probability model for count data. As noted in Section 2-2.3 of the text, the Poisson distribution can be derived as a limiting form of the binomial distribution, and it can also be developed from a probability argument based on the birth and death process. We now give a summary of both developments.

The Poisson Distribution as a Limiting Form of the Binomial Distribution

Consider the binomial distribution

p(x) = \binom{n}{x} p^x (1 - p)^{n - x} = \frac{n!}{x!(n - x)!} p^x (1 - p)^{n - x}, \quad x = 0, 1, 2, \ldots, n

Let \lambda = np so that p = \lambda / n. We may now write the binomial distribution as

p(x) = \frac{n(n-1)(n-2) \cdots (n - x + 1)}{x!} \left(\frac{\lambda}{n}\right)^x \left(1 - \frac{\lambda}{n}\right)^{n - x}
     = \frac{\lambda^x}{x!} \left[(1)\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{x-1}{n}\right)\right] \left(1 - \frac{\lambda}{n}\right)^{n} \left(1 - \frac{\lambda}{n}\right)^{-x}

Let n \to \infty and p \to 0 so that \lambda = np remains constant. The terms

\left(1 - \frac{1}{n}\right), \left(1 - \frac{2}{n}\right), \ldots, \left(1 - \frac{x-1}{n}\right) and \left(1 - \frac{\lambda}{n}\right)^{-x}

all approach unity. Furthermore,

\left(1 - \frac{\lambda}{n}\right)^{n} \to e^{-\lambda} as n \to \infty

Thus, upon substitution we see that the limiting form of the binomial distribution is

p(x) = \frac{\lambda^x e^{-\lambda}}{x!}

which is the Poisson distribution.

Development of the Poisson Distribution from the Poisson Process

Consider a collection of time-oriented events, arbitrarily called "arrivals" or "births". Let x_t be the number of these "arrivals" or "births" that occur in the interval [0, t). Note that the range space of x_t is R = {0, 1, ...}. Assume that the numbers of births during nonoverlapping time intervals are independent random variables, and that there is a positive constant \lambda such that for any small time interval \Delta t, the following statements are true:

1. The probability that exactly one birth will occur in an interval of length \Delta t is \lambda \Delta t.
2. The probability that zero births will occur in the interval is 1 - \lambda \Delta t.
3. The probability that more than one birth will occur in the interval is zero.

The parameter \lambda is often called the mean arrival rate or the mean birth rate. This type of process, in which the probability of observing exactly one event in a small interval of time is constant (or the probability of occurrence of an event is directly proportional to the length of the time interval), and in which the occurrence of events in nonoverlapping time intervals is independent, is called a Poisson process. In the following, let

P\{x_t = x\} = p(x) = p_x(t), \quad x = 0, 1, 2, \ldots

Suppose that there have been no births up to time t. The probability that there are no births at the end of time t + \Delta t is

p_0(t + \Delta t) = (1 - \lambda \Delta t)\, p_0(t)

Note that

\frac{p_0(t + \Delta t) - p_0(t)}{\Delta t} = -\lambda p_0(t)

so consequently

\lim_{\Delta t \to 0} \frac{p_0(t + \Delta t) - p_0(t)}{\Delta t} = p_0'(t) = -\lambda p_0(t)

For x > 0 births at the end of time t + \Delta t we have

p_x(t + \Delta t) = p_{x-1}(t)\, \lambda \Delta t + (1 - \lambda \Delta t)\, p_x(t)

and

p_x'(t) = \lim_{\Delta t \to 0} \frac{p_x(t + \Delta t) - p_x(t)}{\Delta t} = \lambda p_{x-1}(t) - \lambda p_x(t)

Thus we have a system of differential equations that describes the arrivals or births:

p_0'(t) = -\lambda p_0(t) for x = 0
p_x'(t) = \lambda p_{x-1}(t) - \lambda p_x(t) for x = 1, 2, \ldots

The solution to this set of equations is

p_x(t) = \frac{(\lambda t)^x e^{-\lambda t}}{x!}, \quad x = 0, 1, 2, \ldots

Obviously, for a fixed value of t this is the Poisson distribution (with parameter \lambda t).
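The limiting argument can also be checked numerically. The Python sketch below (with an arbitrary choice of \lambda = 4, not a value from the text) compares the binomial pmf with p = \lambda/n to the Poisson pmf as n grows, holding \lambda = np fixed.

from scipy import stats

lam = 4.0                        # lam = n*p is held constant as n grows
for n in (10, 100, 1000, 10000):
    p = lam / n
    # largest discrepancy between the two pmfs over x = 0, 1, ..., 20
    diff = max(abs(stats.binom.pmf(x, n, p) - stats.poisson.pmf(x, lam))
               for x in range(21))
    print(n, diff)               # the discrepancy shrinks toward zero as n increases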
S2-3. The Mean and Variance of the Normal Distribution

In Section 2-3.1 we introduce the normal distribution, with probability density function

f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}, \quad -\infty < x < \infty

and we state that \mu and \sigma^2 are the mean and variance, respectively, of the distribution. We now show that this claim is correct. Note that f(x) \ge 0. We first evaluate the integral I = \int_{-\infty}^{\infty} f(x)\, dx, showing that it is equal to 1. In the integral, change the variable of integration to z = (x - \mu)/\sigma. Then

I = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-z^2/2}\, dz

Since I \ge 0, if I^2 = 1, then I = 1. Now we may write

I^2 = \left(\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2}\, dx\right) \left(\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-y^2/2}\, dy\right) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(x^2 + y^2)/2}\, dx\, dy

If we switch to polar coordinates, then x = r\cos\theta, y = r\sin\theta and

I^2 = \frac{1}{2\pi} \int_0^{2\pi} \int_0^{\infty} e^{-r^2/2}\, r\, dr\, d\theta = \frac{1}{2\pi} \int_0^{2\pi} 1\, d\theta = \frac{1}{2\pi}(2\pi) = 1

So we have shown that f(x) has the properties of a probability density function. The integrand obtained by the substitution z = (x - \mu)/\sigma is, of course, the standard normal distribution, an important special case of the more general normal distribution. The standard normal probability density function has a special notation, namely

\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \quad -\infty < z < \infty

and the cumulative standard normal distribution is

\Phi(z) = \int_{-\infty}^{z} \phi(t)\, dt

Several useful properties of the standard normal distribution can be found by basic calculus:

1. \phi(-z) = \phi(z) for all real z, so \phi(z) is an even function of z (symmetric about 0)
2. \phi'(z) = -z\, \phi(z)
3. \phi''(z) = (z^2 - 1)\, \phi(z)

Consequently, \phi(z) has a unique maximum at z = 0, inflection points at z = \pm 1, and both \phi(z) \to 0 and \phi'(z) \to 0 as z \to \pm\infty.

The mean and variance of the standard normal distribution are found as follows:

E(z) = \int_{-\infty}^{\infty} z\, \phi(z)\, dz = -\int_{-\infty}^{\infty} \phi'(z)\, dz = -\phi(z) \Big|_{-\infty}^{\infty} = 0

and

E(z^2) = \int_{-\infty}^{\infty} z^2\, \phi(z)\, dz = \int_{-\infty}^{\infty} [\phi''(z) + \phi(z)]\, dz = \phi'(z) \Big|_{-\infty}^{\infty} + \int_{-\infty}^{\infty} \phi(z)\, dz = 0 + 1 = 1

Because the variance of a random variable can be expressed in terms of expectation as \sigma^2 = E(z - \mu)^2 = E(z^2) - \mu^2, we have shown that the mean and variance of the standard normal distribution are 0 and 1, respectively.

Now consider the case where x follows the more general normal distribution. Based on the substitution z = (x - \mu)/\sigma, so that x = \mu + \sigma z, we have

E(x) = \int_{-\infty}^{\infty} x\, \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\, dx = \int_{-\infty}^{\infty} (\mu + \sigma z)\, \phi(z)\, dz = \mu \int_{-\infty}^{\infty} \phi(z)\, dz + \sigma \int_{-\infty}^{\infty} z\, \phi(z)\, dz = \mu(1) + \sigma(0) = \mu

and

E(x^2) = \int_{-\infty}^{\infty} x^2\, \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\, dx = \int_{-\infty}^{\infty} (\mu + \sigma z)^2\, \phi(z)\, dz = \mu^2 \int_{-\infty}^{\infty} \phi(z)\, dz + 2\mu\sigma \int_{-\infty}^{\infty} z\, \phi(z)\, dz + \sigma^2 \int_{-\infty}^{\infty} z^2\, \phi(z)\, dz = \mu^2 + \sigma^2

Therefore, it follows that V(x) = E(x^2) - \mu^2 = (\mu^2 + \sigma^2) - \mu^2 = \sigma^2.

S2-4. More about the Lognormal Distribution

The lognormal distribution is a general distribution of wide applicability. The lognormal distribution is defined only for positive values of the random variable x, and the probability density function is

f(x) = \frac{1}{x \sigma \sqrt{2\pi}} e^{-\frac{1}{2\sigma^2}(\ln x - \mu)^2}, \quad x > 0

The parameters of the lognormal distribution are -\infty < \mu < \infty and \sigma^2 > 0. The lognormal random variable is related to the normal random variable in that y = \ln x is normally distributed with mean \mu and variance \sigma^2. The mean and variance of the lognormal distribution are

E(x) = \mu_x = e^{\mu + \sigma^2/2}
V(x) = \sigma_x^2 = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)

The median and mode of the lognormal distribution are

\tilde{x} = e^{\mu} and x_{mo} = e^{\mu - \sigma^2}

In general, the kth origin moment of the lognormal random variable is

E(x^k) = e^{k\mu + \frac{1}{2}k^2\sigma^2}

Like the gamma and Weibull distributions, the lognormal finds application in reliability engineering, often as a model for the survival time of components or systems. Some important properties of the lognormal distribution are:

1. If x_1 and x_2 are independent lognormal random variables with parameters (\mu_1, \sigma_1^2) and (\mu_2, \sigma_2^2), respectively, then y = x_1 x_2 is a lognormal random variable with parameters \mu_1 + \mu_2 and \sigma_1^2 + \sigma_2^2.

2. If x_1, x_2, \ldots, x_k are independently and identically distributed lognormal random variables with parameters \mu and \sigma^2, then the geometric mean of the x_i, or \left(\prod_{i=1}^{k} x_i\right)^{1/k}, has a lognormal distribution with parameters \mu and \sigma^2/k.

3. If x is a lognormal random variable with parameters \mu and \sigma^2, and if a, b, and c are constants such that b = e^c, then the random variable y = b x^a has a lognormal distribution with parameters c + a\mu and a^2\sigma^2.
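Because y = ln x is normal, a lognormal sample can be generated by exponentiating normal draws, which gives a quick way to check the moment formulas above. The Python sketch below uses arbitrary parameter values (\mu = 1, \sigma = 0.5), chosen only for illustration.

import numpy as np

# Simulation check of the lognormal mean, variance, and median formulas.
rng = np.random.default_rng(1)
mu, sigma = 1.0, 0.5

x = np.exp(rng.normal(mu, sigma, size=1_000_000))    # x = e^y with y ~ N(mu, sigma^2)

print(x.mean(),     np.exp(mu + sigma**2 / 2))                           # E(x)
print(x.var(),      np.exp(2*mu + sigma**2) * (np.exp(sigma**2) - 1))    # V(x)
print(np.median(x), np.exp(mu))                                          # median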
S2-5. More about the Gamma Distribution

The gamma distribution is introduced in Section 3-3.3. The gamma probability density function is

f(x) = \frac{\lambda}{\Gamma(r)} (\lambda x)^{r-1} e^{-\lambda x}, \quad x \ge 0

where r > 0 is a shape parameter and \lambda > 0 sets the scale of the distribution (the quantity 1/\lambda plays the role of a scale parameter). The parameter r is called a shape parameter because it determines the basic shape of the graph of the density function. For example, if r = 1, the gamma distribution reduces to an exponential distribution. There are actually three basic shapes: r < 1 (hyperexponential), r = 1 (exponential), and r > 1 (unimodal with right skew).

The cumulative distribution function of the gamma is

F(x; r, \lambda) = \int_0^x \frac{\lambda}{\Gamma(r)} (\lambda t)^{r-1} e^{-\lambda t}\, dt

The substitution u = \lambda t in this integral results in F(x; r, \lambda) = F(\lambda x; r, 1), which depends on \lambda only through the product \lambda x, that is, only through the ratio of x to 1/\lambda. We typically call such a quantity a scale parameter. It can be important to have a scale parameter in a probability distribution so that the results do not depend on the scale of measurement actually used. For example, suppose that we are measuring time in months and 1/\lambda = 6 months. The probability that x is less than or equal to 12 months is F(12/6; r, 1) = F(2; r, 1). If we wish to measure time in weeks instead, then 1/\lambda = 24 weeks, and the probability that x is less than or equal to 48 weeks is just F(48/24; r, 1) = F(2; r, 1). Therefore, different scales of measurement can be accommodated by changing the scale parameter without having to change to a more general form of the distribution.

When r is an integer, the gamma distribution is sometimes called the Erlang distribution. Another special case of the gamma distribution arises when we let r = 1/2, 1, 3/2, 2, \ldots and \lambda = 1/2; this is the chi-square distribution with degrees of freedom r/\lambda = 2r = 1, 2, 3, \ldots. The chi-square distribution is very important in statistical inference.

S2-6. The Failure Rate for the Exponential Distribution

The exponential distribution

f(x) = \lambda e^{-\lambda x}, \quad x \ge 0

was introduced in Section 2-3.2 of the text. The exponential distribution is frequently used in reliability engineering as a model for the lifetime or time to failure of a component or system. Generally, we define the reliability function of the unit as

R(t) = P\{x > t\} = 1 - \int_0^t f(x)\, dx = 1 - F(t)

where, of course, F(t) is the cumulative distribution function. In biomedical applications, the reliability function is usually called the survival function. For the exponential distribution, the reliability function is

R(t) = e^{-\lambda t}

The Hazard Function

The mean and variance of a distribution are quite important in reliability applications, but an additional property called the hazard function or the instantaneous failure rate is also useful. The hazard function is the conditional density function of failure at time x, given that the unit has survived until time x. Therefore, letting X denote the random variable and x denote the realization,

h(x) = f(x \mid X > x)
     = \lim_{\Delta x \to 0} \frac{F(x + \Delta x \mid X > x) - F(x \mid X > x)}{\Delta x}
     = \lim_{\Delta x \to 0} \frac{P\{x < X \le x + \Delta x \mid X > x\}}{\Delta x}
     = \lim_{\Delta x \to 0} \frac{P\{x < X \le x + \Delta x,\, X > x\}}{\Delta x\, P\{X > x\}}
     = \lim_{\Delta x \to 0} \frac{P\{x < X \le x + \Delta x\}}{\Delta x\, [1 - F(x)]}
     = \frac{f(x)}{1 - F(x)}

It turns out that specifying a hazard function completely determines the cumulative distribution function (and vice versa).

The Hazard Function for the Exponential Distribution

For the exponential distribution, the hazard function is

h(x) = \frac{f(x)}{1 - F(x)} = \frac{\lambda e^{-\lambda x}}{e^{-\lambda x}} = \lambda

That is, the hazard function for the exponential distribution is constant, and the failure rate is just the reciprocal of the mean time to failure. A constant failure rate implies that the reliability of the unit at time t does not depend on its age. This may be a reasonable assumption for some types of units, such as electrical components, but it is probably unreasonable for mechanical components. It is probably not a good assumption for many types of system-level products that are made up of many components (such as an automobile). Generally, an increasing hazard function indicates that the unit is more likely to fail in the next increment of time than it would have been in an earlier increment of time of the same length; this is typically due to aging or wear. Despite the apparent simplicity of its hazard function, the exponential distribution has been an important distribution in reliability engineering, partly because the constant failure rate assumption is probably not unreasonable over some region of the unit's life.
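The constant hazard is easy to verify numerically. The Python sketch below (the value \lambda = 0.25 is an arbitrary choice for illustration) evaluates f(x)/[1 - F(x)] at several ages and shows that it does not change; note that scipy parameterizes the exponential by scale = 1/\lambda.

import numpy as np
from scipy import stats

lam = 0.25
dist = stats.expon(scale=1.0 / lam)

x = np.array([0.5, 2.0, 10.0, 40.0])
hazard = dist.pdf(x) / dist.sf(x)   # sf(x) is the survival (reliability) function 1 - F(x)
print(hazard)                       # every entry equals lam = 0.25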
S2-7. The Failure Rate for the Weibull Distribution

The instantaneous failure rate or hazard function was defined in Section S2-6 above. For the Weibull distribution, the hazard function is

h(x) = \frac{f(x)}{1 - F(x)} = \frac{(\beta/\theta)(x/\theta)^{\beta - 1} e^{-(x/\theta)^{\beta}}}{e^{-(x/\theta)^{\beta}}} = \frac{\beta}{\theta}\left(\frac{x}{\theta}\right)^{\beta - 1}

Note that if \beta = 1 the Weibull hazard function is constant. This should be no surprise, since for \beta = 1 the Weibull distribution reduces to the exponential. When \beta > 1, the Weibull hazard function increases, approaching infinity as x \to \infty. Consequently, the Weibull is a fairly common choice as a model for components or systems that experience deterioration due to wear-out or fatigue. For the case where \beta < 1, the Weibull hazard function decreases, approaching 0 as x \to \infty.

For comparison purposes, note that the hazard function for the gamma distribution with parameters r and \lambda is also constant for the case r = 1 (the gamma also reduces to the exponential when r = 1). Also, when r > 1 the hazard function increases, and when r < 1 the hazard function decreases. However, when r > 1 the gamma hazard function approaches \lambda from below as x \to \infty, while if r < 1 it approaches \lambda from above. Therefore, even though the graphs of the gamma and Weibull densities can look very similar, and both distributions can produce reasonable fits to the same sample of data, they clearly have very different characteristics in terms of describing survival or reliability data.
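The contrast described above is easy to see numerically. The Python sketch below (unit scale parameters and arbitrary shape values, chosen only for illustration) evaluates the Weibull and gamma hazards at a few points: the Weibull hazard runs off to infinity or down to zero depending on \beta, while the gamma hazard settles toward \lambda.

import numpy as np
from scipy import stats

def hazard(dist, x):
    return dist.pdf(x) / dist.sf(x)          # h(x) = f(x) / (1 - F(x))

x = np.array([0.5, 1.0, 2.0, 5.0, 20.0])
theta, lam = 1.0, 1.0

# Weibull: increasing hazard for beta > 1, constant for beta = 1, decreasing for beta < 1
for beta in (0.5, 1.0, 2.0):
    print("Weibull beta =", beta, hazard(stats.weibull_min(beta, scale=theta), x))

# Gamma: hazard tends to lam as x grows, from above (r < 1) or from below (r > 1)
for r in (0.5, 1.0, 2.0):
    print("gamma   r =", r, hazard(stats.gamma(r, scale=1.0 / lam), x))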