SDS, CUHKSZ — STA2003 Probability
Chapter 5: Continuous Distributions

5.1 Distribution of the Continuous Type

Definition 5.1. A random variable X is said to be of the (absolutely) continuous type if its distribution function F(x) = Pr(X ≤ x) has the form

    F(x) = ∫_{−∞}^{x} f(t) dt,    −∞ < x < ∞,

for some function f : R → [0, ∞). If a random variable X is of continuous type, then its probabilistic behaviour is no longer described by a pmf defined by p(x) = Pr(X = x). Instead the function f, called the probability density function (pdf), is used. ◀

5.1.1 Probability Density Function of a Continuous Random Variable

A continuous variable of a group of individuals can be represented by a histogram as shown below.

[Figure: histogram of the monthly salary of 5000 workers under 25 — relative frequency density against salary (in thousands of HK dollars).]

In a histogram, the area of each rectangular block should be directly proportional to the frequency of the corresponding class. Usually the height of each block is set to the following relative frequency density, so that the total area of all the blocks equals 1:

    relative frequency density = relative frequency / class width.

For a large data set, one can use a smaller class width to produce a finer histogram.

[Figure: the same salary data plotted twice more with successively smaller class widths; the outline of the histogram approaches a smooth curve.]

Therefore, for an infinite population it is reasonable to model the distribution of a continuous variable by a smooth curve, that is, by the probability density function.
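The normalization above can be sketched numerically. The class counts below are made-up for illustration (the notes' actual salary data set is not reproduced here); the point is only that dividing relative frequencies by the class width always makes the total block area equal to 1.

```python
# Hypothetical class counts for a salary histogram with class width 4
# (illustrative numbers only, not the data behind the figure).
counts = [250, 600, 1100, 1300, 900, 500, 230, 80, 30, 10]  # 5000 workers
width = 4                                                   # class width

n = sum(counts)
heights = [c / n / width for c in counts]  # relative frequency density

# Each block has area height × width, so the total area is Σ relfreq = 1.
total_area = sum(h * width for h in heights)
print(total_area)  # 1.0 (up to floating-point rounding)
```

Whatever binning is chosen, this construction keeps the total area at 1, which is what lets the histogram heights converge to a pdf as the class width shrinks.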
[Figure: the salary histogram with a smooth density curve superimposed.]

Properties of a pdf

1. If F is differentiable, then

       f(x) = lim_{t→0} [Pr(X ≤ x + t) − Pr(X ≤ x)] / t = (d/dx) F(x).

2. f(x) ≥ 0 for all x.
3. ∫_{−∞}^{∞} f(x) dx = F(∞) = 1.
4. Pr(X ∈ A) = ∫_A f(x) dx, where A is any subset of R.
5. If X is of continuous type, then
   (a) Pr(X = a) = ∫_a^a f(x) dx = 0 for all a; and hence
   (b) Pr(a ≤ X ≤ b) = Pr(a < X ≤ b) = Pr(a ≤ X < b) = Pr(a < X < b) = ∫_a^b f(x) dx = F(b) − F(a);
   (c) the distribution function F is continuous everywhere.

Example 5.1. Consider the following probability density function:

    f(x) = c(2x − x²)  for 0 < x < 2;    f(x) = 0  otherwise.

The constant c can be solved from

    ∫_0^2 c(2x − x²) dx = 1  ⟹  c [x² − x³/3]_0^2 = 1  ⟹  (4/3)c = 1  ⟹  c = 3/4.

[Figure: graph of f(x) = (3/4)(2x − x²) on 0 < x < 2.]

For any a, b ∈ (0, 2) with a < b,

    Pr(a ≤ X ≤ b) = (3/4) ∫_a^b (2x − x²) dx = (1/4)[3(b² − a²) − (b³ − a³)].

Distribution function:

    F(x) = ∫_0^x (3/4)(2t − t²) dt = (1/4)(3x² − x³)  for 0 < x < 2;
    F(x) = 0  for x ≤ 0;    F(x) = 1  for x ≥ 2.

    Pr(0.5 ≤ X ≤ 1) = F(1) − F(0.5) = 0.5 − 0.15625 = 0.34375. ⋆

Remarks
Note that f(x) is not a probability. For a very small ε > 0,

    Pr(a − ε/2 ≤ X ≤ a + ε/2) = ∫_{a−ε/2}^{a+ε/2} f(x) dx ≈ ε f(a).

Hence f(a) is a measure of how likely it is that the random variable will be near a.

5.1.2 Mean, Variance and Moment Generating Function

Definition 5.2. If f(x) is the pdf of a continuous random variable X, then

    E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx

is the mathematical expectation (expected value) of g(X), if it exists. ◀

The properties of the mathematical expectation of a discrete random variable also apply to the mathematical expectation of a continuous random variable.
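As a quick numerical check of Example 5.1 (a sketch, using only the Python standard library), the cdf F(x) = (3x² − x³)/4 can be coded directly and used to recover the probability computed above:

```python
from math import isclose

# cdf of Example 5.1, with c = 3/4 so that the total area is 1.
def F(x: float) -> float:
    if x <= 0:
        return 0.0
    if x >= 2:
        return 1.0
    return (3 * x**2 - x**3) / 4

# Pr(0.5 ≤ X ≤ 1) = F(1) − F(0.5)
p = F(1) - F(0.5)
print(p)  # 0.34375
assert isclose(p, 0.34375)
```

The same function also confirms F(0) = 0 and F(2) = 1, so the piecewise definition is internally consistent.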
Properties

1. If c is a constant, then E(c) = c.
2. If c is a constant, then E[c g(X)] = c E[g(X)].
3. If c₁, c₂, …, cₙ are constants, then E[ Σ_{i=1}^{n} cᵢ gᵢ(X) ] = Σ_{i=1}^{n} cᵢ E[gᵢ(X)].
4. If X(ω) ≥ Y(ω) for all ω ∈ Ω, then E(X) ≥ E(Y).
5. E(|X|) ≥ |E(X)|.

Mean:
    μ = E(X) = ∫_{−∞}^{∞} x f(x) dx.

Variance:
    σ² = Var(X) = E[(X − μ)²] = ∫_{−∞}^{∞} (x − μ)² f(x) dx = E(X²) − μ².

Moment generating function:
    M_X(t) = E(e^{tX}) = ∫_{−∞}^{∞} e^{tx} f(x) dx,    E(X^r) = M_X^{(r)}(0).

Cumulant generating function:
    R_X(t) = ln M_X(t),    μ = R_X′(0),    σ² = R_X″(0).

Example 5.2. Consider the following probability density function:

    f(x) = x e^{−x}  for 0 < x < ∞;    f(x) = 0  otherwise.

The mgf is

    M_X(t) = ∫_0^∞ e^{tx} · x e^{−x} dx = ∫_0^∞ x e^{−(1−t)x} dx = 1/(1 − t)²,    t < 1.

The cgf is R_X(t) = ln M_X(t) = −2 ln(1 − t), so the mean and the variance are μ = R′(0) = 2 and σ² = R″(0) = 2. ⋆

5.2 Common Continuous Distributions

5.2.1 Uniform Distribution

Definition 5.3. For an interval (a, b), let X be a point randomly drawn from this interval. If the pdf of X is a constant function on (a, b), i.e.,

    f(x) = 1/(b − a)  for a < x < b;    f(x) = 0  otherwise,

then X is said to have a uniform distribution, denoted X ∼ U(a, b). ◀

[Figure: the pdf f(x), a horizontal line of height 1/(b − a) over (a, b), and the cdf F(x), rising linearly from 0 at x = a to 1 at x = b.]

Roughly speaking, X is a point selected at random in such a way that it has no preference for any particular region of the interval.

Distribution function:

    F(x) = 0  for x ≤ a;    F(x) = (x − a)/(b − a)  for a < x < b;    F(x) = 1  for x ≥ b.

Mean:
    μ = E(X) = (a + b)/2    (the midpoint of the interval).

Variance:
    σ² = Var(X) = E(X²) − μ² = (b − a)²/12.

Example 5.3. A straight rod drops freely onto a horizontal plane. Let X be the angle between the rod and the North direction, 0 ≤ X ≤ 2π. Then X ∼ U(0, 2π), with

    μ = (0 + 2π)/2 = π,    σ² = (2π − 0)²/12 = π²/3,
    F(x) = (x − 0)/(2π − 0) = x/(2π)    for 0 ≤ x < 2π.

    Pr(pointing in a direction between NE and E)
        = Pr(π/4 ≤ X ≤ π/2)
        = F(π/2) − F(π/4)
        = (1/2π)(π/2 − π/4)
        = 1/8. ⋆

Property
Let X ∼ U(0, 1) and Y = cX + d. Then Y ∼ U(d, c + d) if c is positive, and Y ∼ U(c + d, d) if c is negative.

Proof. Consider

    F_X(x) = Pr(X ≤ x) = (x − 0)/(1 − 0) = x    for 0 ≤ x ≤ 1.

If c > 0, then the cdf of Y is given by

    F_Y(y) = Pr(Y ≤ y) = Pr(cX + d ≤ y) = Pr(X ≤ (y − d)/c) = (y − d)/c = (y − d)/[(c + d) − d]    for d ≤ y ≤ c + d.

Comparing with the cdf of a uniform random variable, we have Y ∼ U(d, c + d). Similarly, if c < 0, then Y ∼ U(c + d, d).

Example 5.4. If X ∼ U(0, 1), then 2X ∼ U(0, 2) and 2X − 1 ∼ U(−1, 1). If X ∼ U(−2, 6), then (X + 2)/8 ∼ U(0, 1); also 6 − X ∼ U(0, 8) and (6 − X)/8 ∼ U(0, 1). ⋆

5.2.2 Exponential Distribution

Definition 5.4. Let X be a positive random variable with pdf

    f(x) = λ e^{−λx}  for x > 0;    f(x) = 0  for x ≤ 0,

then X is said to have an exponential distribution, denoted X ∼ Exp(λ). ◀

[Figure: probability density functions of Exp(λ) for λ = 2, 1, 0.5.]

Recall that in a Poisson process, the random variable N(t) = number of occurrences in the time interval [0, t] has pmf

    p(y) = Pr(N(t) = y) = e^{−λt} (λt)^y / y!,    y = 0, 1, 2, ….

Define the random variable X to be the waiting time until the first occurrence in a Poisson process with rate λ. Then the distribution function of X can be derived as follows. For x ≤ 0, F(x) = Pr(X ≤ x) = 0. For x > 0,

    F(x) = Pr(X ≤ x)
         = 1 − Pr(X > x)
         = 1 − Pr(no occurrence in the time interval [0, x])
         = 1 − Pr(N(x) = 0)
         = 1 − e^{−λx}.    (Set y = 0 in p(y).)

Therefore, the cdf of X is

    F(x) = 1 − e^{−λx}  for x > 0;    F(x) = 0  for x ≤ 0.

Hence, the pdf of X is given by

    f(x) = F′(x) = λ e^{−λx},    for x > 0.

Therefore X is distributed as exponential with parameter λ.
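The waiting-time derivation can be checked by simulation (a sketch, not part of the notes). The code below samples a Poisson process on [0, T] using the standard conditional-uniform construction — draw a Poisson count, then scatter that many uniform points — and compares the empirical tail Pr(X > x) of the first arrival with e^{−λx}:

```python
import math
import random

random.seed(0)
lam, T, trials = 2.0, 10.0, 50_000  # rate, observation window, replications

def sample_poisson(mean: float) -> int:
    """Sample a Poisson count by inverting the cdf with one uniform draw."""
    u = random.random()
    k, term = 0, math.exp(-mean)
    cdf = term
    while u > cdf:
        k += 1
        term *= mean / k
        cdf += term
    return k

x = 0.5
hits = 0
for _ in range(trials):
    k = sample_poisson(lam * T)                # events in [0, T]
    arrivals = [random.uniform(0, T) for _ in range(k)]
    first = min(arrivals) if arrivals else T   # first occurrence time
    hits += first > x                          # no event in [0, x]

print(hits / trials, math.exp(-lam * x))  # both close to e^(−1) ≈ 0.3679
```

The event {first arrival > x} is exactly {N(x) = 0}, so the empirical frequency should match e^{−λx} up to Monte Carlo error.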
An exponential random variable can therefore describe the random time elapsing between unpredictable events (e.g., telephone calls, earthquakes, arrivals of buses or customers, etc.).

The moment generating function of Exp(λ) is given by

    M_X(t) = ∫_0^∞ e^{tx} λ e^{−λx} dx = ∫_0^∞ λ e^{−(λ−t)x} dx = [ −λ e^{−(λ−t)x}/(λ − t) ]_0^∞ = λ/(λ − t),    t < λ.

The cumulant generating function is then

    R_X(t) = ln M_X(t) = ln λ − ln(λ − t),
    R′(t) = 1/(λ − t),    R″(t) = 1/(λ − t)²,
    μ = R′(0) = 1/λ,    σ² = R″(0) = 1/λ².

In summary, for X ∼ Exp(λ):

Distribution function:
    F(x) = 1 − e^{−λx}  for x > 0;    F(x) = 0  for x ≤ 0.

Moment generating function:
    M_X(t) = λ/(λ − t),    t < λ.

Mean and variance:
    μ = 1/λ    and    σ² = 1/λ².

Example 5.5. Let X be the failure time of a machine. Assume that X follows an exponential distribution with rate λ = 2 failures per month. Then the average failure time is E(X) = 1/2 month. The probability that the machine can run over 4 months without any failure is given by

    Pr(X > 4) = 1 − F(4) = 1 − (1 − e^{−2(4)}) = e^{−8} = 0.000335. ⋆

Remarks
For any a > 0 and b > 0,

    Pr(X > a + b) = 1 − F(a + b)
                  = e^{−λ(a+b)}
                  = e^{−λa} e^{−λb}
                  = (1 − F(a))(1 − F(b))
                  = Pr(X > a) Pr(X > b).

This implies Pr(X > a + b | X > a) = Pr(X > b). That is, knowing that the event has not occurred in the past a units of time does not alter the distribution of the arrival time in the future; i.e., we may assume the process starts afresh at any point of observation. Among all continuous random variables with support (0, ∞), the exponential distribution is the only one with this memoryless property.

Example 5.6. In the previous example (Example 5.5), what is the probability that the machine can run 4 months more given that it has run for 1 year already?

Solution: Pr(X > 16 | X > 12) = Pr(X > 4) = e^{−8} = 0.000335.
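The memoryless property in Examples 5.5 and 5.6 can be verified directly from the survival function; a minimal sketch using the standard library:

```python
import math

lam = 2.0  # failures per month, as in Example 5.5

def surv(x: float) -> float:
    """Survival function Pr(X > x) = e^(−lam·x) for X ~ Exp(lam)."""
    return math.exp(-lam * x)

# Memorylessness: Pr(X > 16 | X > 12) = Pr(X > 12 + 4) / Pr(X > 12)
cond = surv(16) / surv(12)
print(cond, surv(4))  # both equal e^(−8) ≈ 0.000335
assert math.isclose(cond, surv(4))
```

The conditional probability factorizes exactly because e^{−λ(a+b)} = e^{−λa} e^{−λb}, which is the algebraic content of the remark above.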
⋆

Remarks
Another commonly used parametrization of the exponential distribution defines the pdf as

    f(x) = (1/β) e^{−x/β}  for x > 0;    f(x) = 0  for x ≤ 0,

where β > 0 is simultaneously the mean, the standard deviation, and the scale parameter of the distribution, the reciprocal of the rate parameter λ defined earlier. In this parametrization, β is a survival parameter: if a random variable X is the survival time of some system or life, the expected duration of survival is β units of time. The parametrization with the rate λ arises when events occur at rate λ, so that the time between events has mean β = λ⁻¹. The specification with β is sometimes more convenient than the one with λ, and some authors use it as the standard definition.

5.2.3 Gamma and Chi-Squared Distribution

Definition 5.5. The gamma function is defined by

    Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx,    for α > 0.

It is also called the Euler integral of the second kind. ◀

Properties of the gamma function
1. Γ(1) = 1.
2. For α > 1, Γ(α) = (α − 1) Γ(α − 1).
3. For any integer n ≥ 1, Γ(n) = (n − 1)!.
4. Γ(1/2) = √π.

Definition 5.6. Let X be a positive random variable with pdf

    f(x) = [λ^α / Γ(α)] x^{α−1} e^{−λx}  for x > 0;    f(x) = 0  for x ≤ 0,

where α > 0 and λ > 0, then X is said to have a gamma distribution, denoted X ∼ Γ(α, λ) or X ∼ Gam(α, λ). ◀

[Figure: probability density functions of Γ(α = 4, λ) for λ = 1, 0.5, 0.25.]

Gamma random variable as a waiting time
Let Tₙ be the waiting time until the n-th occurrence of an event in a Poisson process with rate λ. Then the distribution function of Tₙ can be derived as

    F(t) = Pr(Tₙ ≤ t) = 1 − Pr(Tₙ > t) = 1 − Pr(N(t) < n)
         = 1 − Σ_{k=0}^{n−1} (λt)^k e^{−λt} / k!,    t > 0.

Hence, the pdf of Tₙ is

    f(t) = F′(t) = λ(λt)^{n−1} e^{−λt} / (n − 1)! = [λⁿ / Γ(n)] t^{n−1} e^{−λt},    t > 0.

Therefore Tₙ is distributed as gamma with shape parameter α = n and rate λ. A gamma random variable can therefore describe the random time elapsing until the accumulation of a specific number of unpredictable events (e.g., telephone calls, earthquakes, arrivals of buses or customers, etc.).

The moment generating function of X ∼ Γ(α, λ) is given by

    M_X(t) = ∫_0^∞ e^{tx} [λ^α / Γ(α)] x^{α−1} e^{−λx} dx
           = [λ^α / Γ(α)] ∫_0^∞ x^{α−1} e^{−(λ−t)x} dx
           = [λ^α / (λ − t)^α] ∫_0^∞ [(λ − t)^α / Γ(α)] x^{α−1} e^{−(λ−t)x} dx
           = [λ / (λ − t)]^α,    t < λ.

From this mgf, one can easily derive the mean and variance of X.

In summary, for X ∼ Γ(α, λ):

Distribution function:
    F(x) = ∫_0^x [λ^α / Γ(α)] t^{α−1} e^{−λt} dt,    for x > 0.

Moment generating function:
    M_X(t) = [λ / (λ − t)]^α,    t < λ.

Mean and variance:
    μ = α/λ    and    σ² = α/λ².

For a Poisson process, the mean waiting time until the n-th occurrence is E(Tₙ) = n/λ.

Example 5.7. Assume that the number of phone calls received by a customer service representative follows a Poisson process with rate 20 calls per hour. Let T₈ be the waiting time (in hours) until the 8th phone call. Then T₈ ∼ Γ(8, 20), and the mean waiting time for 8 phone calls is E(T₈) = 8/20 = 2/5 hour.

Suppose that the representative only needs to serve 8 calls before taking a break. Then the probability that he/she has to work for more than 48 minutes (i.e., 0.8 hour) before taking a rest is

    Pr(T₈ > 0.8) = 1 − Pr(T₈ ≤ 0.8)
                 = 1 − ∫_0^{0.8} [20⁸ / Γ(8)] x⁷ e^{−20x} dx
                 = 1 − ( 1 − Σ_{k=0}^{7} 16^k e^{−16} / k! )
                 = 0.01. ⋆

Properties of the gamma distribution
1. The exponential distribution is a special case of the gamma distribution with α = 1, i.e., Γ(1, λ) ≡ Exp(λ).
2. Suppose X ∼ Γ(α, λ), and let Y = bX where b is a positive number. Then Y ∼ Γ(α, λ/b).

Proof.
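The waiting-time identity Pr(Tₙ > t) = Pr(N(t) < n) turns the gamma tail in Example 5.7 into a finite Poisson sum, which is easy to evaluate without special functions; a sketch with the standard library:

```python
import math

# Pr(T8 > 0.8) for T8 ~ Γ(8, 20): equals Pr(N(0.8) < 8),
# where N(0.8) ~ Poisson(λt = 20 × 0.8 = 16).
mean = 20 * 0.8
p = sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(8))
print(round(p, 2))  # 0.01, matching Example 5.7
```

The same identity is what makes gamma (and hence chi-squared) probabilities with integer shape tabulatable via Poisson sums.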
The moment generating function of Y is given by

    M_Y(t) = E(e^{tY}) = E(e^{tbX}) = M_X(bt) = [λ / (λ − bt)]^α = [ (λ/b) / (λ/b − t) ]^α,    t < λ/b,

which is the moment generating function of Γ(α, λ/b).

3. The parameter α is called the shape parameter, while λ is called the rate parameter. By property 2, we may tabulate the distribution function of the gamma distribution with a standardized scale parameter.

4. As with the exponential distribution, there is another parametrization of the gamma distribution. With a shape parameter k > 0 and a scale parameter θ > 0, the corresponding pdf is

    f(x) = [1 / (Γ(k) θ^k)] x^{k−1} e^{−x/θ}  for x > 0;    f(x) = 0  for x ≤ 0.

Essentially, k = α and θ = λ⁻¹.

Definition 5.7. Let r be a positive integer. If X ∼ Γ(α, λ) with α = r/2 and λ = 1/2, then we say that X has a chi-squared distribution with r degrees of freedom, denoted X ∼ χ²(r) or X ∼ χ²ᵣ.

Probability density function:
    f(x) = [1 / (Γ(r/2) 2^{r/2})] x^{r/2 − 1} e^{−x/2},    x > 0.

Moment generating function:
    M_X(t) = 1 / (1 − 2t)^{r/2},    t < 1/2.

Mean and variance:
    μ = r    and    σ² = 2r. ◀

[Figure: probability density functions of χ²(r) for r = 1, 2, 3, 4, 6, 9.]

The values of the chi-squared distribution function are tabulated (on Page 107).

Example 5.8. For Example 5.7, T₈ ∼ Γ(8, 20). Consider Y = 40 T₈ ∼ Γ(8, 1/2) ≡ χ²(16). From the chi-squared distribution table,

    Pr(T₈ > 0.8) = 1 − Pr(T₈ ≤ 0.8) = 1 − Pr(Y ≤ 32) = 1 − 0.99 = 0.01. ⋆

5.2.4 Normal Distribution

Many naturally occurring variables have distributions that are well approximated by a “bell-shaped curve”, or normal distribution. These variables have histograms which are approximately symmetric, have a single mode at the center, and tail away to both sides.
Two parameters, the mean μ and the standard deviation σ, describe a normal distribution completely and allow one to approximate the proportion of observations in any interval by finding the corresponding area under the appropriate normal curve.

Definition 5.8. A random variable X is said to have a normal distribution (Gaussian distribution) if its pdf is defined by

    f(x) = [1 / √(2πσ²)] e^{−(x−μ)² / (2σ²)},    −∞ < x < ∞,

where −∞ < μ < ∞ and σ² > 0 are the location parameter and scale parameter respectively. It is denoted X ∼ N(μ, σ²).

Distribution function:
    F(x) = ∫_{−∞}^{x} [1 / √(2πσ²)] e^{−(t−μ)² / (2σ²)} dt,    −∞ < x < ∞.

Moment generating function:
    M_X(t) = exp( μt + σ²t²/2 ),    t ∈ R.

Mean and variance:
    E(X) = μ    and    Var(X) = σ². ◀

Characteristics of all normal curves
1. Each bell-shaped normal curve is symmetric and centered at its mean μ.
2. About 68% of the area is within one standard deviation of the mean, about 95% of the area is within two standard deviations of the mean, and almost all (99.7%) of the area is within three standard deviations of the mean.
3. The places where the normal curve is steepest are one standard deviation below and above the mean (μ − σ and μ + σ).

[Figure: probability density functions of N(μ, σ²) for (μ = 0, σ² = 1), (μ = 0, σ² = 2), (μ = 0, σ² = 3) and (μ = 4, σ² = 2).]

The standard normal distribution
Areas under all normal curves are related. For example, the area to the right of 1.76 standard deviations above the mean is identical for all normal curves. Because of this, we can find an area over an interval for any normal curve by finding the corresponding area under a standard normal curve, which has mean μ = 0 and standard deviation σ = 1. If μ = 0 and σ² = 1, then Z ∼ N(0, 1) is said to have the standard normal distribution.
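The 68–95–99.7 figures in characteristic 2 can be checked without a table: the standard normal cdf can be written as Φ(z) = (1 + erf(z/√2))/2, and `math.erf` is in the Python standard library. A sketch:

```python
import math

def Phi(z: float) -> float:
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

# Area within k standard deviations of the mean, k = 1, 2, 3.
for k in (1, 2, 3):
    print(k, round(Phi(k) - Phi(-k), 3))  # 0.683, 0.954, 0.997
```

These match the "about 68%, about 95%, 99.7%" rule quoted above (table-based computations round the first figure to 0.682).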
[Figure: the standard normal density ϕ(x) of N(0, 1).]

Usually the probability density function and the distribution function of the standard normal distribution are denoted

    ϕ(z) = (1/√(2π)) e^{−z²/2},    −∞ < z < ∞,
    Φ(z) = Pr(Z ≤ z) = ∫_{−∞}^{z} (1/√(2π)) e^{−t²/2} dt,    −∞ < z < ∞.

The values of the standard normal distribution function are tabulated.

The standard normal distribution table (on Page 108)
The standard normal table gives the area under a standard normal curve to the left of a number z, i.e., the value of Φ(z). The number z is rounded to two decimal places in the table: the ones place and the tenths place are listed at the left side of the table, the hundredths place is listed across the top, and the areas are listed in the table entries. The table here does not list negative z values. Because of symmetry this is not necessary; one may use Φ(−z) = 1 − Φ(z).

Remarks
There are two other versions of the standard normal distribution table: one is “Area under the standard normal curve” (areas between zero and z), and another is “The standard normal right-tail probabilities”. Google them!

Properties
1. The normal distribution is symmetric about its mean. That is, if X ∼ N(μ, σ²), then Pr(X ≤ μ − x) = Pr(X ≥ μ + x). In particular, Φ(x) = 1 − Φ(−x).
2. If X ∼ N(μ, σ²), then aX + b ∼ N(aμ + b, a²σ²). In particular, the normal score Z = (X − μ)/σ is distributed as standard normal.
3. If X ∼ N(0, 1), then X² ∼ χ²(1).

Standardization
In working with normal curves, the first step in a calculation is invariably to standardize:

    Z = (X − μ)/σ ∼ N(0, 1).

This normal score (standard score, z-score) tells how many standard deviations an observation X is from its mean. A positive z-score indicates that X is greater than the mean; a negative z-score indicates that X is below the mean.
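Both the symmetry rule Φ(−z) = 1 − Φ(z) and the standardization step can be sketched in a few lines (standard library only; the μ = 30.5, σ = 4.5 figures are just sample values):

```python
import math

def Phi(z: float) -> float:
    """Standard normal cdf via the error function (no table needed)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

# Symmetry: Φ(−z) = 1 − Φ(z), which is how a one-sided table is used.
assert math.isclose(Phi(-1.22), 1 - Phi(1.22))

# Standardization: for X ~ N(mu, sigma²), Pr(X ≤ x) = Φ((x − mu)/sigma).
mu, sigma, x = 30.5, 4.5, 40.0
z = (x - mu) / sigma            # z-score ≈ 2.11
print(round(Phi(z), 4))         # ≈ 0.9826, agreeing with the table entry
```

In practice one standardizes first and then reads Φ at the resulting z-score, exactly as in the worked examples that follow.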
If the z-score is known and the value of X is needed, solving the previous equation for X gives

    X = μ + Z σ.

In words, X is Z standard deviations above the mean.

Example 5.9. Studies show that gasoline use for compact cars sold in the United States is normally distributed, with a mean use of 30.5 miles per gallon (mpg) and a standard deviation of 4.5 mpg. What percentage of compact cars use between 25 and 40 mpg?

Solution: Let X mpg be the gasoline use of a compact car. Then X ∼ N(30.5, 4.5²), and

    Pr(25 ≤ X ≤ 40) = Pr( (25 − 30.5)/4.5 ≤ (X − 30.5)/4.5 ≤ (40 − 30.5)/4.5 )
                    ≈ Pr(−1.22 ≤ Z ≤ 2.11),    Z ∼ N(0, 1)
                    = Φ(2.11) − [1 − Φ(1.22)]
                    = 0.9826 − (1 − 0.8888)
                    = 0.8714 = 87.14%. ⋆

In general, if X ∼ N(μ, σ²), then

    Pr(a ≤ X ≤ b) = Φ((b − μ)/σ) − Φ((a − μ)/σ).

Example 5.10. In general, for a normal random variable X ∼ N(μ, σ²), the probability that its value is within one standard deviation of the mean is

    Pr(μ − σ ≤ X ≤ μ + σ) = Φ(1) − Φ(−1) = 0.682 = 68.2%.

Similarly, the probability that its value is within two standard deviations of the mean is

    Pr(μ − 2σ ≤ X ≤ μ + 2σ) = Φ(2) − Φ(−2) = 0.954 = 95.4%,

and the probability that its value is within three standard deviations of the mean is

    Pr(μ − 3σ ≤ X ≤ μ + 3σ) = Φ(3) − Φ(−3) = 0.997 = 99.7%. ⋆

Example 5.11. A manufacturer needs washers between 0.1180 and 0.1220 inches thick; any thickness outside this range is unusable. One machine shop sells washers at $30 per 100. Their thickness is normally distributed with a mean of 0.1200 inch and a standard deviation of 0.0010 inch. A second machine shop sells washers at $26 per 100. Their thickness is normally distributed with a mean of 0.1200 inch and a standard deviation of 0.0015 inch. Which shop offers the better deal?

Solution: For washers purchased from shop 1, X ∼ N(0.12, 0.0010²):

    Pr(0.118 ≤ X ≤ 0.122) = Φ((0.122 − 0.12)/0.001) − Φ((0.118 − 0.12)/0.001)
                          = Φ(2) − Φ(−2)
                          = 0.9772 − (1 − 0.9772) = 0.9544.

Hence on average 95.44 of 100 washers from shop 1 are usable, and the average cost of each usable washer is $30/95.44 = $0.3143.

For washers purchased from shop 2, Y ∼ N(0.12, 0.0015²):

    Pr(0.118 ≤ Y ≤ 0.122) = Φ((0.122 − 0.12)/0.0015) − Φ((0.118 − 0.12)/0.0015)
                          ≈ Φ(1.33) − Φ(−1.33)
                          = 0.9082 − (1 − 0.9082) = 0.8164.

Hence on average 81.64 of 100 washers from shop 2 are usable, and the average cost of each usable washer is $26/81.64 = $0.3185. We may conclude that the washers from shop 1 are the slightly better deal. ⋆

Example 5.12. A brass polish manufacturer wishes to set his filling equipment so that in the long run only five cans in 1,000 will contain less than a desired minimum net fill of 800 gm. It is known from experience that the filled weights are approximately normally distributed with a standard deviation of 6 gm. At what level will the mean fill have to be set in order to meet this requirement?

Solution: Let X gm be the net fill of a particular can. Then X ∼ N(μ, 36). The requirement can be expressed as the following probability statement:

    Pr(X < 800) = 0.005
    ⟹ Pr( (X − μ)/σ < (800 − μ)/6 ) = 0.005
    ⟹ Φ( (800 − μ)/6 ) = 0.005
    ⟹ Φ( −(800 − μ)/6 ) = 1 − 0.005 = 0.995
    ⟹ −(800 − μ)/6 = 2.575
    ⟹ μ = 815.45. ⋆

Example 5.13. For the normal probability calculation in Example 5.9, to use a table of areas between zero and z to find the area between −1.22 and 2.11, we can add up two areas: A1, the area between 0 and 2.11, and A2, the area between −1.22 and 0.

[Figure: standard normal density with the area A2 between −1.22 and 0 and the area A1 between 0 and 2.11 shaded.]

From the table, the area A1 is 0.4826. By the symmetry of the normal distribution, the area A2 equals the area between zero and 1.22, which can be read directly from the table as 0.3888. Therefore, the total area between −1.22 and 2.11 is 0.4826 + 0.3888 = 0.8714 = 87.14%.
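The inverse problem in Example 5.12 (choose μ so a tail probability is met) can be solved numerically rather than by table lookup: invert Φ by bisection, then un-standardize. A sketch with the standard library:

```python
import math

def Phi(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

# Find z with Φ(z) = 0.995 by bisection on the increasing cdf,
# then set mu = 800 + z·sigma so that Pr(X < 800) = 0.005.
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = (lo + hi) / 2
    if Phi(mid) < 0.995:
        lo = mid
    else:
        hi = mid
z = (lo + hi) / 2
mu = 800 + z * 6
print(z, mu)  # z ≈ 2.576; the notes round to 2.575, giving mu ≈ 815.45
```

The small discrepancy (815.46 vs. 815.45) is purely the rounding of z in the table-based solution.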
⋆

Example 5.14. Refer to Example 5.9. In times of scarce energy resources, a competitive advantage is given to an automobile manufacturer who can produce a car obtaining substantially better fuel economy than the competitors’ cars. If a manufacturer wishes to develop a compact car that outperforms 95% of the current compacts in fuel economy, what must the gasoline use rate for the new car be?

The gasoline use X (mpg) is normally distributed: X ∼ N(30.5, 4.5²). According to the manufacturer’s requirement, we want to find the value c such that Pr(X ≤ c) = 0.95. (In fact, c is the 95th percentile of the distribution of X.) Using the standard normal distribution table (of the version with areas between zero and z), we need the table entry closest to 0.95 − 0.5 = 0.45. The entry is 0.4495 for z = 1.64 and 0.4505 for z = 1.65, so the required z-score is halfway between, i.e., z = 1.645. The required value of c is thus

    c = 30.5 + 1.645 × 4.5 = 37.9 mpg.

The manufacturer’s new compact car must therefore obtain a fuel economy of 37.9 mpg to outperform 95% of the compact cars currently available on the U.S. market. ⋆

5.2.4.1 Normal Approximation to Binomial Probability

One useful application of the normal distribution is to approximate the probabilities of other distributions, in particular the binomial distribution. The theory behind this is an application of the central limit theorem (CLT), which is introduced in Chapter 11 and is a topic in STAT2602 Probability and Statistics II. Here we only focus on the application of the normal approximation to binomial probability, which is common in practice. (Recall from Chapter 4 that the binomial distribution can also be approximated by the Poisson distribution.)
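The percentile calculation in Example 5.14 above can likewise be checked by inverting Φ numerically instead of interpolating in the table (a sketch, standard library only):

```python
import math

def Phi(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

# 95th percentile of X ~ N(30.5, 4.5²): solve Φ(z) = 0.95 by bisection,
# then un-standardize with c = mu + z·sigma (Example 5.14).
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = (lo + hi) / 2
    if Phi(mid) < 0.95:
        lo = mid
    else:
        hi = mid
z = (lo + hi) / 2
c = 30.5 + z * 4.5
print(round(z, 3), round(c, 1))  # 1.645 and 37.9
```

This confirms that the table-interpolation value z = 1.645 is accurate to three decimal places.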
For a random variable X following a binomial distribution B(n, p), it is known that E(X) = np and Var(X) = np(1 − p). If n is sufficiently large (a classical guideline is to require np ≥ 5 and n(1 − p) ≥ 5, while some stricter sources request 9 rather than 5), then X can be approximated by the normal distribution N(np, np(1 − p)). The approximation works especially well if n is large and p is close to 0.5, because in this case the binomial distribution is more symmetric, like a bell-shaped normal distribution.

Continuity Correction
Since the binomial distribution is discrete while the normal distribution is continuous, the approximation is made more accurate by an adjustment of values, called the continuity correction. For example, let X be a discrete binomial random variable and Y a continuous normal random variable used to approximate X. Suppose Pr(X = a) is to be calculated where a is an integer. Using Pr(Y = a) for the approximation clearly does not work, because it must be zero. It is therefore more appropriate to consider

    Pr(X = a) = Pr(a − 0.5 < X < a + 0.5) ≈ Pr(a − 0.5 < Y < a + 0.5).

Example 5.15. Recall from Example 4.22 that X ∼ B(8000, 1/1000) can be approximated by Po(8) in calculating Pr(X < 8). Here we consider a normal approximation, first checking that

    8000 × (1/1000) = 8 ≥ 5    and    8000 × (999/1000) = 7992 ≥ 5.

Note that

    E(X) = 8000 × (1/1000) = 8,
    Var(X) = 8000 × (1/1000) × (999/1000) = 7.992,

so the binomial random variable X can be approximated by Y ∼ N(8, 7.992). Therefore,

    Pr(X < 8) = Pr(X ≤ 7)
              ≈ Pr(Y ≤ 7.5)
              = Pr( Z ≤ (7.5 − 8)/√7.992 ),    Z ∼ N(0, 1)
              = Pr(Z ≤ −0.17686515)
              ≈ 0.429807157.

The previous Poisson approximation gives 0.452960809, while the exact binomial probability is 0.452890977. In this case the normal approximation is not as accurate as the Poisson approximation, because p = 1/1000 is quite far from 0.5.
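The continuity-corrected approximation in Example 5.15 can be reproduced, and compared against the exact binomial sum, in a few lines (standard library only):

```python
import math

def Phi(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

# Normal approximation with continuity correction for X ~ B(8000, 1/1000):
# Pr(X ≤ 7) ≈ Pr(Y ≤ 7.5) with Y ~ N(np, np(1 − p)), as in Example 5.15.
n, p = 8000, 1 / 1000
mu, var = n * p, n * p * (1 - p)          # 8 and 7.992
approx = Phi((7.5 - mu) / math.sqrt(var))
print(round(approx, 4))  # ≈ 0.4298

# Exact binomial tail for comparison: Pr(X ≤ 7) = Σ C(n,k) p^k (1−p)^(n−k)
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(8))
print(round(exact, 4))   # ≈ 0.4529
```

The gap between 0.4298 and 0.4529 quantifies the remark that the normal approximation struggles here because p is so far from 0.5.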
⋆

5.2.5 Beta Distribution

Definition 5.9. Let X be a random variable with pdf

    f(x) = [Γ(α + β) / (Γ(α) Γ(β))] x^{α−1} (1 − x)^{β−1}  for 0 < x < 1;    f(x) = 0  otherwise,

where α and β are positive numbers, then X is said to have a beta distribution, denoted X ∼ Beta(α, β).

Mean and variance:
    μ = α/(α + β)    and    σ² = αβ / [(α + β)²(α + β + 1)]. ◀

It is commonly used for modelling a random quantity distributed within a finite interval.

[Figure: probability density functions of Beta(α, β) for (α, β) = (0.5, 0.5), (5, 1), (1, 3), (2, 2), (2, 5).]

Remarks
1. The function

    B(α, β) = ∫_0^1 x^{α−1} (1 − x)^{β−1} dx = Γ(α)Γ(β) / Γ(α + β)

is called the beta function; it is also the Euler integral of the first kind. Therefore the pdf of a beta distribution is sometimes expressed as

    f(x) = [1 / B(α, β)] x^{α−1} (1 − x)^{β−1}  for 0 < x < 1;    f(x) = 0  otherwise.

2. If α = β = 1, the beta distribution becomes the uniform distribution U(0, 1).

Example 5.16. After assessing the current political, social, economic and financial factors, a financial analyst believes that the proportion of stocks that will increase in value tomorrow is a beta random variable with α = 5 and β = 3. What is the expected value of this proportion? How likely is it that the values of at least 70% of the stocks will move up?

Solution: Let P be the proportion. Then P ∼ Beta(5, 3) and

    E(P) = 5/(5 + 3) = 5/8 = 0.625,

    Pr(P ≥ 0.7) = ∫_{0.7}^{1} [Γ(8)/(Γ(5)Γ(3))] x⁴ (1 − x)² dx
                = (7!/(4!·2!)) ∫_{0.7}^{1} (x⁴ − 2x⁵ + x⁶) dx = 0.3529. ⋆

5.3 Reading of textbook

Chapter 5: sections 5.1, 5.2, 5.3, 5.5
Chapter 8: section 8.4
Chapter 5: section 5.4

Acknowledgment: We thank Dr. Kam Pui WAT for his tremendous contribution to these lecture notes.
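The integral in Example 5.16 reduces to a polynomial antiderivative, so it can be evaluated exactly in code (standard library only; `math.gamma` supplies the Γ values):

```python
import math

# Example 5.16: P ~ Beta(5, 3). The normalizing constant is
# Γ(8)/(Γ(5)Γ(3)) = 7!/(4!·2!) = 105, and the integrand expands to
# x⁴(1 − x)² = x⁴ − 2x⁵ + x⁶.
coef = math.gamma(8) / (math.gamma(5) * math.gamma(3))

def antideriv(x: float) -> float:
    """Antiderivative of x⁴ − 2x⁵ + x⁶."""
    return x**5 / 5 - x**6 / 3 + x**7 / 7

prob = coef * (antideriv(1.0) - antideriv(0.7))
mean = 5 / (5 + 3)
print(mean, round(prob, 4))  # 0.625 and 0.3529
```

This reproduces both answers of the example: an expected proportion of 0.625 and Pr(P ≥ 0.7) ≈ 0.3529.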
∼ End of Chapter 5 ∼

[Table: values of the chi-squared distribution function (Page 107).]

Table Va — The Standard Normal Distribution Function

[Figure: standard normal density with the area Φ(z₀) to the left of z₀ shaded.]

    P(Z ≤ z) = Φ(z) = ∫_{−∞}^{z} (1/√(2π)) e^{−w²/2} dw,    Φ(−z) = 1 − Φ(z).

  z    0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
 0.0  0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
 0.1  0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
 0.2  0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
 0.3  0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
 0.4  0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
 0.5  0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
 0.6  0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
 0.7  0.7580 0.7611 0.7642 0.7673 0.7703 0.7734 0.7764 0.7794 0.7823 0.7852
 0.8  0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
 0.9  0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
 1.0  0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
 1.1  0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
 1.2  0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
 1.3  0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
 1.4  0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
 1.5  0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
 1.6  0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
 1.7  0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
 1.8  0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
 1.9  0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
 2.0  0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
 2.1  0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
 2.2  0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
 2.3  0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
 2.4  0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
 2.5  0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
 2.6  0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
 2.7  0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
 2.8  0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
 2.9  0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
 3.0  0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990

Selected upper-tail critical values:

  α      0.400 0.300 0.200 0.100 0.050 0.025 0.020 0.010 0.005 0.001
  z_α    0.253 0.524 0.842 1.282 1.645 1.960 2.054 2.326 2.576 3.090
  z_α/2  0.842 1.036 1.282 1.645 1.960 2.240 2.326 2.576 2.807 3.291