4 Continuous Random Variables and Probability Distributions

Copyright © Cengage Learning. All rights reserved.

Probability Density Functions

Recall from Chapter 3 that a random variable X is continuous if (1) its possible values comprise either a single interval on the number line (for some A < B, any number x between A and B is a possible value) or a union of disjoint intervals, and (2) P(X = c) = 0 for any number c that is a possible value of X.

Probability Distributions for Continuous Variables

Suppose the variable X of interest is the depth of a lake at a randomly chosen point on the surface. Let M = the maximum depth (in meters), so that any number in the interval [0, M] is a possible value of X. If we "discretize" X by measuring depth to the nearest meter, then the possible values are the nonnegative integers less than or equal to M. The resulting discrete distribution of depth can be pictured using a probability histogram.

If we draw the histogram so that the area of the rectangle above any possible integer k is the proportion of the lake whose depth is (to the nearest meter) k, then the total area of all rectangles is 1. A possible histogram appears in Figure 4.1(a).

Figure 4.1(a): Probability histogram of depth measured to the nearest meter

If depth is measured much more accurately and the same measurement axis as in Figure 4.1(a) is used, each rectangle in the resulting probability histogram is much narrower, though the total area of all rectangles is still 1. A possible histogram is pictured in Figure 4.1(b); it has a much smoother appearance than the histogram in Figure 4.1(a).

Figure 4.1(b): Probability histogram of depth measured to the nearest centimeter
If we continue in this way to measure depth more and more finely, the resulting sequence of histograms approaches a smooth curve, such as the one pictured in Figure 4.1(c).

Figure 4.1(c): A limit of a sequence of discrete histograms

Because for each histogram the total area of all rectangles equals 1, the total area under the smooth curve is also 1. The probability that the depth at a randomly chosen point is between a and b is just the area under the smooth curve between a and b. It is exactly a smooth curve of the type pictured in Figure 4.1(c) that specifies a continuous probability distribution.

Definition
Let X be a continuous rv. Then a probability distribution or probability density function (pdf) of X is a function f(x) such that for any two numbers a and b with a ≤ b,

P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx

That is, the probability that X takes on a value in the interval [a, b] is the area above this interval and under the graph of the density function, as illustrated in Figure 4.2.

Figure 4.2: P(a ≤ X ≤ b) = the area under the density curve between a and b

The graph of f(x) is often referred to as the density curve.

For f(x) to be a legitimate pdf, it must satisfy the following two conditions:
1. f(x) ≥ 0 for all x
2. ∫₋∞^∞ f(x) dx
= area under the entire graph of f(x) = 1

Definition
A continuous rv X is said to have a uniform distribution on the interval [A, B] if the pdf of X is

f(x; A, B) = 1/(B − A) for A ≤ x ≤ B, and f(x; A, B) = 0 otherwise

The fact that P(X = c) = 0 when X is continuous has an important practical consequence: the probability that X lies in some interval between a and b does not depend on whether the lower limit a or the upper limit b is included in the probability calculation:

P(a ≤ X ≤ b) = P(a < X < b) = P(a < X ≤ b) = P(a ≤ X < b)   (4.1)

If X is discrete and both a and b are possible values (e.g., X is binomial with n = 20 and a = 5, b = 10), then all four of the probabilities in (4.1) are different.

Unlike discrete distributions such as the binomial, hypergeometric, and negative binomial, the distribution of any given continuous rv cannot usually be derived using simple probabilistic arguments. Just as in the discrete case, it is often helpful to think of the population of interest as consisting of X values rather than individuals or objects. The pdf is then a model for the distribution of values in this numerical population, and from this model various population characteristics (such as the mean) can be calculated.

The Cumulative Distribution Function

The cumulative distribution function (cdf) F(x) for a discrete rv X gives, for any specified number x, the probability P(X ≤ x). It is obtained by summing the pmf p(y) over all possible values y satisfying y ≤ x. The cdf of a continuous rv gives the same probabilities P(X ≤ x) and is obtained by integrating the pdf f(y) between the limits −∞ and x.

Definition
The cumulative distribution function F(x) for a continuous rv X is defined for every number x by

F(x) = P(X ≤ x) = ∫₋∞ˣ f(y) dy

For each x, F(x) is the area under the density curve to the left of x.
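As a quick numerical check of the cdf definition, the following sketch integrates a uniform pdf with the midpoint rule. The interval [2, 5] and all numbers here are purely illustrative, not from the text:

```python
# Illustrative uniform pdf on [A, B] = [2, 5]; values are made up for the demo.
A, B = 2.0, 5.0

def f(x):
    """Uniform pdf: 1/(B - A) on [A, B], 0 elsewhere."""
    return 1.0 / (B - A) if A <= x <= B else 0.0

def cdf(x, n=10_000):
    """F(x) = integral of f from -infinity to x, via the midpoint rule.
    Since f is 0 below A, integrating from A suffices."""
    if x <= A:
        return 0.0
    h = (min(x, B) - A) / n
    return sum(f(A + (i + 0.5) * h) for i in range(n)) * h

# Legitimacy condition 2: total area under f equals 1.
print(round(cdf(B + 1), 6))                       # → 1.0
# F(x) matches the closed form (x - A)/(B - A) on [A, B].
print(round(cdf(3.5), 6), (3.5 - A) / (B - A))    # → 0.5 0.5
```

The same numerical approach works for any pdf whose closed-form cdf is unavailable.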
This is illustrated in Figure 4.5, where F(x) increases smoothly as x increases.

Figure 4.5: A pdf and associated cdf

Let X have a uniform distribution on [A, B]. The density function is shown in Figure 4.6.

Figure 4.6: The pdf for a uniform distribution

For x < A, F(x) = 0, since there is no area under the graph of the density function to the left of such an x. For x ≥ B, F(x) = 1, since all the area is accumulated to the left of such an x. Finally, for A ≤ x ≤ B,

F(x) = ∫_A^x 1/(B − A) dy = (x − A)/(B − A)

The entire cdf is

F(x) = 0 for x < A
F(x) = (x − A)/(B − A) for A ≤ x < B
F(x) = 1 for x ≥ B

The graph of this cdf appears in Figure 4.7.

Figure 4.7: The cdf for a uniform distribution

Using F(x) to Compute Probabilities

The importance of the cdf here, just as for discrete rv's, is that probabilities of various intervals can be computed from a formula for or table of F(x).

Proposition
Let X be a continuous rv with pdf f(x) and cdf F(x). Then for any number a,

P(X > a) = 1 − F(a)

and for any two numbers a and b with a < b,

P(a ≤ X ≤ b) = F(b) − F(a)

Obtaining f(x) from F(x)

For X discrete, the pmf is obtained from the cdf by taking the difference between two F(x) values. The continuous analog of a difference is a derivative. The following result is a consequence of the Fundamental Theorem of Calculus.

Proposition
If X is a continuous rv with pdf f(x) and cdf F(x), then at every x at which the derivative F′(x) exists, F′(x) = f(x).

When X has a uniform distribution, F(x) is differentiable except at x = A and x = B, where the graph of F(x) has sharp corners. Since F(x) = 0 for x < A and F(x) = 1 for x > B, F′(x) = 0 = f(x) for such x. For A < x < B,

F′(x) = d/dx [(x − A)/(B − A)] = 1/(B − A) = f(x)

Percentiles of a Continuous Distribution

When we say that an individual's test score was at the 85th percentile of the population, we mean that 85% of all population scores were below that score and 15% were above. Similarly, the 40th percentile is the score that exceeds 40% of all scores and is exceeded by 60% of all scores.

Proposition
Let p be a number between 0 and 1.
The (100p)th percentile of the distribution of a continuous rv X, denoted by η(p), is defined by

p = F(η(p)) = ∫₋∞^η(p) f(y) dy   (4.2)

According to Expression (4.2), η(p) is that value on the measurement axis such that 100p% of the area under the graph of f(x) lies to the left of η(p) and 100(1 − p)% lies to the right.

Thus η(.75), the 75th percentile, is such that the area under the graph of f(x) to the left of η(.75) is .75. Figure 4.10 illustrates the definition.

Figure 4.10: The (100p)th percentile of a continuous distribution

Definition
The median of a continuous distribution, denoted by μ̃, is the 50th percentile, so μ̃ satisfies .5 = F(μ̃). That is, half the area under the density curve is to the left of μ̃ and half is to the right of μ̃.

A continuous distribution whose pdf is symmetric (the graph of the pdf to the left of some point is a mirror image of the graph to the right of that point) has median μ̃ equal to the point of symmetry, since half the area under the curve lies to either side of this point.

Figure 4.12 gives several examples. The error in a measurement of a physical quantity is often assumed to have a symmetric distribution.

Figure 4.12: Medians of symmetric distributions

Expected Values

For a discrete random variable X, E(X) was obtained by summing x · p(x) over possible X values. Here we replace summation by integration and the pmf by the pdf to get a continuous weighted average.

Definition
The expected or mean value of a continuous rv X with pdf f(x) is

μ_X = E(X) = ∫₋∞^∞ x · f(x) dx

When the pdf f(x) specifies a model for the distribution of values in a numerical population, then μ is the population mean, which is the most frequently used measure of population location or center.

Often we wish to compute the expected value of some function h(X) of the rv X.
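The integral defining E(X) can be checked numerically for a simple pdf. The sketch below uses the same illustrative uniform pdf on [2, 5] (values made up), whose closed-form mean is (A + B)/2:

```python
# Numerical check of E(X) = ∫ x·f(x) dx for an illustrative uniform pdf
# on [A, B] = [2, 5]; the closed-form answer is (A + B)/2 = 3.5.
A, B = 2.0, 5.0
n = 100_000
h = (B - A) / n

def f(x):
    return 1.0 / (B - A) if A <= x <= B else 0.0

mean = 0.0
for i in range(n):
    x = A + (i + 0.5) * h      # midpoint of the ith subinterval
    mean += x * f(x) * h

print(round(mean, 6))          # → 3.5
```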
If we think of h(X) as a new rv Y, techniques from mathematical statistics can be used to derive the pdf of Y, and E(Y) can then be computed from the definition. Fortunately, as in the discrete case, there is an easier way to compute E[h(X)].

Proposition
If X is a continuous rv with pdf f(x) and h(X) is any function of X, then

E[h(X)] = μ_h(X) = ∫₋∞^∞ h(x) · f(x) dx

For h(X) a linear function, E[h(X)] = E(aX + b) = aE(X) + b.

In the discrete case, the variance of X was defined as the expected squared deviation from μ and was calculated by summation. Here again integration replaces summation.

Definition
The variance of a continuous random variable X with pdf f(x) and mean value μ is

σ²_X = V(X) = ∫₋∞^∞ (x − μ)² · f(x) dx = E[(X − μ)²]

The standard deviation (SD) of X is σ_X = √V(X).

The variance and standard deviation give quantitative measures of how much spread there is in the distribution or population of x values. Again σ is roughly the size of a typical deviation from μ. Computation of σ² is facilitated by using the same shortcut formula employed in the discrete case.

Proposition
V(X) = E(X²) − [E(X)]²

When h(X) = aX + b, the expected value and variance of h(X) satisfy the same properties as in the discrete case: E[h(X)] = aμ + b and V[h(X)] = a²σ².

The Normal Distribution

The normal distribution is the most important one in all of probability and statistics. Many numerical populations have distributions that can be fit very closely by an appropriate normal curve. Examples include heights, weights, and other physical characteristics (the famous 1903 Biometrika article "On the Laws of Inheritance in Man" discussed many examples of this sort), measurement errors in scientific experiments, anthropometric measurements on fossils, reaction times in psychological experiments, measurements of intelligence and aptitude, scores on various tests, and numerous economic measures and indicators.
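The shortcut formula V(X) = E(X²) − [E(X)]² above can be verified against the direct definition E[(X − μ)²] by numerical integration. A sketch, again using the illustrative uniform pdf on [2, 5], whose closed-form variance is (B − A)²/12 = 0.75:

```python
# Compare the shortcut V(X) = E(X²) − [E(X)]² with the direct definition,
# for an illustrative uniform pdf on [2, 5] (closed form: (B − A)²/12 = 0.75).
A, B = 2.0, 5.0
n = 100_000
h = (B - A) / n
xs = [A + (i + 0.5) * h for i in range(n)]   # midpoints
w = h / (B - A)                              # f(x)·dx on the support

mu  = sum(x * w for x in xs)                 # E(X)
ex2 = sum(x * x * w for x in xs)             # E(X²)
var_shortcut = ex2 - mu ** 2
var_direct   = sum((x - mu) ** 2 * w for x in xs)

print(round(var_shortcut, 6), round(var_direct, 6))   # both ≈ 0.75
```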
Definition
A continuous rv X is said to have a normal distribution with parameters μ and σ (or μ and σ²), where −∞ < μ < ∞ and 0 < σ, if the pdf of X is

f(x; μ, σ) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)),  −∞ < x < ∞   (4.3)

Again e denotes the base of the natural logarithm system and equals approximately 2.71828, and π represents the familiar mathematical constant with approximate value 3.14159.

The statement that X is normally distributed with parameters μ and σ² is often abbreviated X ~ N(μ, σ²).

Clearly f(x; μ, σ) ≥ 0, but a somewhat complicated calculus argument must be used to verify that ∫₋∞^∞ f(x; μ, σ) dx = 1. It can be shown that E(X) = μ and V(X) = σ², so the parameters are the mean and the standard deviation of X.

Figure 4.13 presents graphs of f(x; μ, σ) for several different (μ, σ) pairs.

Figure 4.13(a): Two different normal density curves
Figure 4.13(b): Visualizing μ and σ for a normal distribution

Each density curve is symmetric about μ and bell-shaped, so the center of the bell (point of symmetry) is both the mean of the distribution and the median. The value of σ is the distance from μ to the inflection points of the curve (the points at which the curve changes from turning downward to turning upward).

Large values of σ yield graphs that are quite spread out about μ, whereas small values of σ yield graphs with a high peak above μ and most of the area under the graph quite close to μ. Thus a large σ implies that a value of X far from μ may well be observed, whereas such a value is quite unlikely when σ is small.

The Standard Normal Distribution

The computation of P(a ≤ X ≤ b) when X is a normal rv with parameters μ and σ requires evaluating

∫ₐᵇ (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)) dx   (4.4)

None of the standard integration techniques can be used to accomplish this. Instead, for μ = 0 and σ = 1, Expression (4.4) has been calculated using numerical techniques and tabulated for certain values of a and b.
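In software, the same tabulated areas are obtained from the error function rather than from a table; Python's standard library exposes `math.erf`, and the standard normal cdf follows from the identity Φ(z) = ½[1 + erf(z/√2)]. A sketch (the specific z values are illustrative):

```python
import math

def phi_cdf(z):
    """Standard normal cdf Φ(z), via the error function:
    Φ(z) = ½·[1 + erf(z/√2)] — the quantity Table A.3 tabulates."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(round(phi_cdf(0.0), 4))     # → 0.5   (half the area lies left of 0)
print(round(phi_cdf(1.96), 4))    # ≈ .9750
print(round(phi_cdf(-1.96), 4))   # ≈ .0250, by symmetry
```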
This table can also be used to compute probabilities for any other values of μ and σ under consideration.

Definition
The normal distribution with parameter values μ = 0 and σ = 1 is called the standard normal distribution. A random variable having a standard normal distribution is called a standard normal random variable and will be denoted by Z. The pdf of Z is

f(z; 0, 1) = (1/√(2π)) e^(−z²/2),  −∞ < z < ∞

The graph of f(z; 0, 1) is called the standard normal (or z) curve. Its inflection points are at 1 and −1. The cdf of Z is

P(Z ≤ z) = ∫₋∞^z f(y; 0, 1) dy

which we will denote by Φ(z).

The standard normal distribution almost never serves as a model for a naturally arising population. Instead, it is a reference distribution from which information about other normal distributions can be obtained. Appendix Table A.3 gives Φ(z) = P(Z ≤ z), the area under the standard normal density curve to the left of z, for z = −3.49, −3.48, ..., 3.48, 3.49.

Figure 4.14 illustrates the type of cumulative area (probability) tabulated in Table A.3. From this table, various other probabilities involving Z can be calculated.

Figure 4.14: Standard normal cumulative areas tabulated in Appendix Table A.3

Percentiles of the Standard Normal Distribution

For any p between 0 and 1, Appendix Table A.3 can be used to obtain the (100p)th percentile of the standard normal distribution.

The 99th percentile of the standard normal distribution is that value on the horizontal axis such that the area under the z curve to the left of the value is .9900. Appendix Table A.3 gives for fixed z the area under the standard normal curve to the left of z, whereas here we have the area and want the value of z. This is the "inverse" problem to P(Z ≤ z) = ?, so the table is used in an inverse fashion: find .9900 in the middle of the table; the row and column in which it lies identify the 99th z percentile.
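This inverse table lookup can be mimicked in code by inverting Φ numerically. The sketch below uses simple bisection (a deliberately minimal method; a real library would use a dedicated inverse-cdf routine):

```python
import math

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_percentile(p, lo=-10.0, hi=10.0):
    """Invert Φ by bisection: find z with Φ(z) = p (the 'inverse' lookup)."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(z_percentile(0.99), 2))    # 99th percentile → 2.33
print(round(z_percentile(0.95), 3))    # 95th percentile → 1.645
```

Bisection works here because Φ is strictly increasing, so Φ(z) = p has exactly one solution.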
Here .9901 lies at the intersection of the row marked 2.3 and the column marked .03, so the 99th percentile is (approximately) z = 2.33. (See Figure 4.17.)

Figure 4.17: Finding the 99th percentile

By symmetry, the first percentile is as far below 0 as the 99th is above 0, so it equals −2.33 (1% lies below the first percentile and also above the 99th). (See Figure 4.18.)

Figure 4.18: The relationship between the 1st and 99th percentiles

In general, the (100p)th percentile is identified by the row and column of Appendix Table A.3 in which the entry p is found (e.g., the 67th percentile is obtained by finding .6700 in the body of the table, which gives z = .44). If p does not appear, the number closest to it is often used, although linear interpolation gives a more accurate answer.

For example, to find the 95th percentile, we look for .9500 inside the table. Although .9500 does not appear, both .9495 and .9505 do, corresponding to z = 1.64 and 1.65, respectively. Since .9500 is halfway between the two probabilities that do appear, we will use 1.645 as the 95th percentile and −1.645 as the 5th percentile.

z_α Notation for z Critical Values

In statistical inference, we will need the values on the horizontal z axis that capture certain small tail areas under the standard normal curve.

Notation
z_α will denote the value on the z axis for which α of the area under the z curve lies to the right of z_α. (See Figure 4.19.)

Figure 4.19: z_α notation illustrated

Table 4.1 lists the most useful z percentiles and z_α values.

Table 4.1: Standard Normal Percentiles and Critical Values

Nonstandard Normal Distributions

When X ~ N(μ, σ²), probabilities involving X are computed by "standardizing." The standardized variable is (X − μ)/σ.
Subtracting μ shifts the mean from μ to zero, and then dividing by σ scales the variable so that the standard deviation is 1 rather than σ.

Proposition
If X has a normal distribution with mean μ and standard deviation σ, then

Z = (X − μ)/σ

has a standard normal distribution. Thus

P(a ≤ X ≤ b) = P((a − μ)/σ ≤ Z ≤ (b − μ)/σ) = Φ((b − μ)/σ) − Φ((a − μ)/σ)

P(X ≤ a) = Φ((a − μ)/σ)   P(X ≥ b) = 1 − Φ((b − μ)/σ)

The key idea of the proposition is that by standardizing, any probability involving X can be expressed as a probability involving a standard normal rv Z, so that Appendix Table A.3 can be used. This is illustrated in Figure 4.21.

Figure 4.21: Equality of nonstandard and standard normal curve areas

The proposition can be proved by writing the cdf of Z = (X − μ)/σ as

P(Z ≤ z) = P(X ≤ σz + μ) = ∫₋∞^(σz+μ) f(x; μ, σ) dx

Using a result from calculus, this integral can be differentiated with respect to z to yield the desired pdf f(z; 0, 1).

Percentiles of an Arbitrary Normal Distribution

The (100p)th percentile of a normal distribution with mean μ and standard deviation σ is easily related to the (100p)th percentile of the standard normal distribution.

Proposition
(100p)th percentile for normal (μ, σ) = μ + [(100p)th percentile for standard normal] · σ

Another way of saying this is that if z is the desired percentile for the standard normal distribution, then the desired percentile for the normal (μ, σ) distribution is z standard deviations from μ.

The Exponential Distributions

The family of exponential distributions provides probability models that are very widely used in engineering and science disciplines.

Definition
X is said to have an exponential distribution with parameter λ (λ > 0) if the pdf of X is

f(x; λ) = λe^(−λx) for x ≥ 0, and f(x; λ) = 0 otherwise   (4.5)

Some sources write the exponential pdf in the form (1/μ)e^(−x/μ), so that μ = 1/λ. The expected value of an exponentially distributed random variable X is

E(X) = ∫₀^∞ x · λe^(−λx) dx

Obtaining this expected value necessitates doing an integration by parts. The variance of X can be computed using the fact that V(X) = E(X²) − [E(X)]². The determination of E(X²) requires integrating by parts twice in succession.
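The integration-by-parts result E(X) = 1/λ can be confirmed numerically by truncating the integral where the exponential tail is negligible. A sketch; λ = 2 is an arbitrary illustrative value:

```python
import math

# Numerical check that E(X) = 1/λ for the exponential pdf f(x) = λ·e^(−λx), x ≥ 0.
# λ = 2 is illustrative; the upper limit truncates where the tail is negligible.
lam = 2.0
n, upper = 200_000, 40.0 / lam     # e^(−40) is far below rounding error
h = upper / n

mean = 0.0
for i in range(n):
    x = (i + 0.5) * h              # midpoint rule
    mean += x * lam * math.exp(-lam * x) * h

print(round(mean, 6))              # ≈ 1/λ = 0.5
```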
The results of these integrations are as follows:

μ = 1/λ   σ² = 1/λ²

Both the mean and standard deviation of the exponential distribution equal 1/λ. Graphs of several exponential pdf's are illustrated in Figure 4.26.

Figure 4.26: Exponential density curves

The exponential pdf is easily integrated to obtain the cdf:

F(x; λ) = 0 for x < 0, and F(x; λ) = 1 − e^(−λx) for x ≥ 0

The exponential distribution is frequently used as a model for the distribution of times between the occurrence of successive events, such as customers arriving at a service facility or calls coming in to a switchboard.

Another important application of the exponential distribution is to model the distribution of component lifetime. A partial reason for the popularity of such applications is the "memoryless" property of the exponential distribution.

Suppose component lifetime is exponentially distributed with parameter λ. After putting the component into service, we leave for a period of t₀ hours and then return to find the component still working; what now is the probability that it lasts at least an additional t hours? In symbols, we wish P(X ≥ t + t₀ | X ≥ t₀). By the definition of conditional probability,

P(X ≥ t + t₀ | X ≥ t₀) = P[(X ≥ t + t₀) ∩ (X ≥ t₀)] / P(X ≥ t₀)

But the event X ≥ t₀ in the numerator is redundant, since both events occur if and only if X ≥ t + t₀. Therefore,

P(X ≥ t + t₀ | X ≥ t₀) = P(X ≥ t + t₀) / P(X ≥ t₀) = e^(−λ(t+t₀)) / e^(−λt₀) = e^(−λt)

This conditional probability is identical to the original probability P(X ≥ t) that the component lasted t hours.

Thus the distribution of additional lifetime is exactly the same as the original distribution of lifetime, so at each point in time the component shows no effect of wear. In other words, the distribution of remaining lifetime is independent of current age.
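The memoryless property can be checked directly from the survival function P(X ≥ x) = e^(−λx). The parameter and time values in this sketch are illustrative:

```python
import math

# Direct check of the memoryless property for an exponential lifetime:
# P(X ≥ t + t0 | X ≥ t0) should equal P(X ≥ t).  λ, t, t0 are illustrative.
lam, t, t0 = 0.5, 3.0, 7.0

def survival(x):
    """P(X ≥ x) = e^(−λx) for the exponential distribution."""
    return math.exp(-lam * x)

# Definition of conditional probability, with the redundant event removed:
conditional = survival(t + t0) / survival(t0)

print(abs(conditional - survival(t)) < 1e-12)   # → True
```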
Proposition
Suppose that the number of events occurring in any time interval of length t has a Poisson distribution with parameter αt (where α, the rate of the event process, is the expected number of events occurring in 1 unit of time) and that numbers of occurrences in nonoverlapping intervals are independent of one another. Then the distribution of elapsed time between the occurrence of two successive events is exponential with parameter λ = α.

Although a complete proof is beyond the scope of the text, the result is easily verified for the time X₁ until the first event occurs:

P(X₁ ≤ t) = 1 − P(X₁ > t) = 1 − P[no events in (0, t)] = 1 − e^(−αt) · (αt)⁰/0! = 1 − e^(−αt)

which is exactly the cdf of the exponential distribution.

The Lognormal Distribution

Definition
A nonnegative rv X is said to have a lognormal distribution if the rv Y = ln(X) has a normal distribution. The resulting pdf of a lognormal rv when ln(X) is normally distributed with parameters μ and σ is

f(x; μ, σ) = (1/(xσ√(2π))) e^(−[ln(x) − μ]²/(2σ²)) for x > 0, and f(x; μ, σ) = 0 otherwise

Be careful here; the parameters μ and σ are not the mean and standard deviation of X but of ln(X). It is common to refer to μ and σ as the location and the scale parameters, respectively. The mean and variance of X can be shown to be

E(X) = e^(μ + σ²/2)   V(X) = e^(2μ + σ²) · (e^(σ²) − 1)

In Chapter 5, we will present a theoretical justification for this distribution in connection with the Central Limit Theorem, but as with other distributions, the lognormal can be used as a model even in the absence of such justification.

Figure 4.30 illustrates graphs of the lognormal pdf; although a normal curve is symmetric, a lognormal curve has a positive skew.

Figure 4.30: Lognormal density curves

Because ln(X) has a normal distribution, the cdf of X can be expressed in terms of the cdf Φ(z) of a standard normal rv Z.
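The relationship just mentioned, F(x; μ, σ) = Φ((ln x − μ)/σ), can be sketched in a few lines. The parameter choices below (μ = 0, σ = 1) are illustrative; with μ = 0 the median of X is e⁰ = 1, so the cdf at 1 should be .5:

```python
import math

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognorm_cdf(x, mu, sigma):
    """Lognormal cdf: F(x; μ, σ) = Φ((ln x − μ)/σ) for x > 0, else 0."""
    return phi((math.log(x) - mu) / sigma) if x > 0 else 0.0

# With μ = 0, σ = 1 (illustrative), the median of X is e^0 = 1.
print(round(lognorm_cdf(1.0, 0.0, 1.0), 4))   # → 0.5
```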
For x > 0,

F(x; μ, σ) = P(X ≤ x) = P[ln(X) ≤ ln(x)] = P(Z ≤ (ln(x) − μ)/σ) = Φ((ln(x) − μ)/σ)   (4.13)

Probability Plots

An investigator will often have obtained a numerical sample x₁, x₂, …, xₙ and wish to know whether it is plausible that it came from a population distribution of some particular type (e.g., from a normal distribution). For one thing, many formal procedures from statistical inference are based on the assumption that the population distribution is of a specified type. The use of such a procedure is inappropriate if the actual underlying probability distribution differs greatly from the assumed type.

For example, the article "Toothpaste Detergents: A Potential Source of Oral Soft Tissue Damage" (Intl. J. of Dental Hygiene, 2008: 193–198) contains the following statement: "Because the sample number for each experiment (replication) was limited to three wells per treatment type, the data were assumed to be normally distributed." As justification for this leap of faith, the authors wrote that "Descriptive statistics showed standard deviations that suggested a normal distribution to be highly likely." Note: This argument is not very persuasive.

Additionally, understanding the underlying distribution can sometimes give insight into the physical mechanisms involved in generating the data. An effective way to check a distributional assumption is to construct what is called a probability plot. The essence of such a plot is that if the distribution on which the plot is based is correct, the points in the plot should fall close to a straight line. If the actual distribution is quite different from the one used to construct the plot, the points will likely depart substantially from a linear pattern.

Sample Percentiles

The details involved in constructing probability plots differ a bit from source to source.
The basis for our construction is a comparison between percentiles of the sample data and the corresponding percentiles of the distribution under consideration. We know that the (100p)th percentile of a continuous distribution with cdf F(·) is the number η(p) that satisfies F(η(p)) = p. That is, η(p) is the number on the measurement scale such that the area under the density curve to the left of η(p) is p. This leads to the following general definition of sample percentiles.

Definition
Order the n sample observations from smallest to largest. Then the ith smallest observation in the list is taken to be the [100(i − .5)/n]th sample percentile.

Once the percentage values 100(i − .5)/n (i = 1, 2, …, n) have been calculated, sample percentiles corresponding to intermediate percentages can be obtained by linear interpolation. For example, if n = 10, the percentages corresponding to the ordered sample observations are 100(1 − .5)/10 = 5%, 100(2 − .5)/10 = 15%, 25%, …, and 100(10 − .5)/10 = 95%. The 10th percentile is then halfway between the 5th percentile (smallest sample observation) and the 15th percentile (second-smallest observation). For our purposes, such interpolation is not necessary, because a probability plot will be based only on the percentages 100(i − .5)/n corresponding to the n sample observations.

A Probability Plot

Suppose now that for the percentages 100(i − .5)/n (i = 1, …, n) the percentiles are determined for a specified population distribution whose plausibility is being investigated. If the sample was actually selected from the specified distribution, the sample percentiles (ordered sample observations) should be reasonably close to the corresponding population distribution percentiles. That is, for i = 1, 2, …, n there should be reasonable agreement between the ith smallest sample observation and the [100(i − .5)/n]th percentile for the specified distribution.
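The sample-percentile definition is straightforward to compute. A sketch with made-up data for n = 10, reproducing the 5%, 15%, …, 95% percentages described above:

```python
# Sample percentiles per the definition above: the ith smallest observation
# is the [100(i − .5)/n]th sample percentile.  The data values are made up.
data = [27.15, 24.46, 26.66, 27.31, 26.25, 27.54, 25.61, 27.74, 26.42, 27.94]
xs = sorted(data)
n = len(xs)

pcts = [100 * (i - 0.5) / n for i in range(1, n + 1)]
for pct, x in zip(pcts, xs):
    print(f"{x} is the {pct:g}th sample percentile")
```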
Let's consider the (population percentile, sample percentile) pairs, that is, the pairs

([100(i − .5)/n]th percentile of the distribution, ith smallest sample observation)

for i = 1, …, n. Each such pair can be plotted as a point on a two-dimensional coordinate system.
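For a normal probability plot, the population percentiles are standard normal percentiles. A minimal sketch that builds the pairs just described; the data are made up, and Φ is inverted by bisection as before (a plotting library would then scatter these points to check linearity):

```python
import math

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_percentile(p):
    """Invert Φ by bisection (Φ is strictly increasing)."""
    lo, hi = -10.0, 10.0
    for _ in range(80):
        mid = (lo + hi) / 2.0
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Made-up sample; each ordered observation is paired with the standard
# normal percentile for percentage 100(i − .5)/n.
data = sorted([152.7, 172.0, 172.5, 173.3, 193.0, 204.7, 216.5, 234.9, 262.6, 422.6])
n = len(data)
pairs = [(z_percentile((i - 0.5) / n), x) for i, x in enumerate(data, start=1)]

# If the data came from a normal population, these points should fall
# roughly on a straight line when plotted.
for z, x in pairs:
    print(f"({z:6.3f}, {x})")
```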