Applied Statistics I
Liang Zhang
Department of Mathematics, University of Utah
June 30, 2008

Normal Distribution

Definition A continuous rv X is said to have a normal distribution with parameters µ and σ (or µ and σ²), where −∞ < µ < ∞ and σ > 0, if the pdf of X is
$$f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2/(2\sigma^2)}, \qquad -\infty < x < \infty.$$
We use the notation X ∼ N(µ, σ²) to denote that X is normally distributed with parameters µ and σ².

Remark:
1. Obviously, f(x) ≥ 0 for all x;
2. It is guaranteed that $\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(y-\mu)^2/(2\sigma^2)}\,dy = 1$.
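Remark 2 can be spot-checked numerically. The Python sketch below evaluates f(x; µ, σ) and integrates it with Simpson's rule over µ ± 10σ; the quadrature rule, the truncation, and the parameter values µ = 3, σ = 0.5 are our own illustrative choices, not part of the slides.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density f(x; mu, sigma) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)


def simpson(f, a: float, b: float, n: int = 10_000) -> float:
    """Composite Simpson's rule for the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


mu, sigma = 3.0, 0.5  # arbitrary illustrative parameters
# The real line is truncated to [mu - 10 sigma, mu + 10 sigma]; the mass outside is below 1e-20.
area = simpson(lambda x: normal_pdf(x, mu, sigma), mu - 10 * sigma, mu + 10 * sigma)
print(f"integral of f(x; {mu}, {sigma}) over the real line ~ {area:.8f}")
```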
Proposition For X ∼ N(µ, σ²), we have E(X) = µ and V(X) = σ².

[Figure: normal pdfs with σ = 1, σ = 2, and σ = 0.5]

The cdf of a normal random variable X is
$$\begin{aligned}
F(x) = P(X \le x) &= \int_{-\infty}^{x} f(y; \mu, \sigma)\,dy = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(y-\mu)^2/(2\sigma^2)}\,dy \\
&= \int_{-\infty}^{x-\mu} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-z^2/(2\sigma^2)}\,dz && \text{change of variable: } z = y - \mu \\
&= \int_{-\infty}^{(x-\mu)/\sigma} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-w^2/2}\,\sigma\,dw && \text{change of variable: } w = z/\sigma \\
&= \int_{-\infty}^{(x-\mu)/\sigma} \frac{1}{\sqrt{2\pi}} e^{-w^2/2}\,dw.
\end{aligned}$$

Definition The normal distribution with parameter values µ = 0 and σ = 1 is called the standard normal distribution. A random variable having a standard normal distribution is called a standard normal random variable and will be denoted by Z. The pdf of Z is
$$f(z; 0, 1) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \qquad -\infty < z < \infty.$$
The graph of f(z; 0, 1) is called the standard normal (or z) curve. The cdf of Z is $P(Z \le z) = \int_{-\infty}^{z} f(y; 0, 1)\,dy$, which we will denote by Φ(z). Combined with the derivation above, this shows that the cdf of X ∼ N(µ, σ²) is F(x) = Φ((x − µ)/σ).
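The derivation says that the cdf of any normal rv is the standard normal cdf evaluated at (x − µ)/σ. A minimal Python sketch that checks this for one arbitrary case: it integrates the N(µ, σ²) pdf directly with a midpoint rule and compares the result with Φ((x − µ)/σ). Here Φ is computed with the standard identity Φ(z) = (1 + erf(z/√2))/2 via `math.erf`; the helper names are hypothetical.

```python
import math


def Phi(z: float) -> float:
    """Standard normal cdf, using the identity Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(y: float, mu: float, sigma: float) -> float:
    return math.exp(-((y - mu) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)


def normal_cdf_by_integration(x: float, mu: float, sigma: float, n: int = 20_000) -> float:
    """F(x) = integral of f(y; mu, sigma) dy from -inf to x (midpoint rule, -inf cut at mu - 12 sigma)."""
    a = mu - 12 * sigma
    if x <= a:
        return 0.0
    h = (x - a) / n
    return h * sum(normal_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(n))


mu, sigma, x = 10.0, 2.0, 11.0  # arbitrary illustrative values
direct = normal_cdf_by_integration(x, mu, sigma)
standardized = Phi((x - mu) / sigma)  # Phi(0.5)
print(f"{direct:.6f} {standardized:.6f}")  # both about 0.691462
```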
[Figure: the standard normal curve; shaded area = Φ(0.5)]

Table A.3 Standard Normal Curve Areas (excerpt; entries are Φ(z), rows give z to one decimal and columns give the second decimal, so row −1.1 with column .02 is z = −1.12)

z      .00     .01     .02     .03     .04     ···  .09
···
−1.2   0.1151  0.1131  0.1112  0.1094  0.1075  ···  0.0985
−1.1   0.1357  0.1335  0.1314  0.1292  0.1271  ···  0.1170
···
1.6    0.9452  0.9463  0.9474  0.9484  0.9495  ···  0.9545
1.7    0.9554  0.9564  0.9573  0.9582  0.9591  ···  0.9633
···

Z ∼ N(0, 1): calculate (a) P(Z ≤ 1.61); (b) P(Z > −1.12); and (c) P(−1.12 < Z ≤ 1.61).
Using the table:
(a) P(Z ≤ 1.61) = 0.9463;
(b) P(Z > −1.12) = 1 − P(Z ≤ −1.12) = 1 − 0.1314 = 0.8686;
(c) P(−1.12 < Z ≤ 1.61) = P(Z ≤ 1.61) − P(Z ≤ −1.12) = 0.9463 − 0.1314 = 0.8149.
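The three table lookups can be reproduced in code with the same erf identity for Φ. This is a sketch for checking answers against Table A.3, not a substitute for knowing how to read the table:

```python
import math


def Phi(z: float) -> float:
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


p_a = Phi(1.61)               # (a) P(Z <= 1.61)
p_b = 1 - Phi(-1.12)          # (b) P(Z > -1.12), by the complement rule
p_c = Phi(1.61) - Phi(-1.12)  # (c) P(-1.12 < Z <= 1.61)
print(f"{p_a:.4f} {p_b:.4f} {p_c:.4f}")  # 0.9463 0.8686 0.8149
```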
Many tables for the normal distribution contain only the nonnegative part:

z      .00     .01     .02     .03     .04     ···  .09
···
1.6    0.9452  0.9463  0.9474  0.9484  0.9495  ···  0.9545
1.7    0.9554  0.9564  0.9573  0.9582  0.9591  ···  0.9633
···

What is P(Z < −1.63)? By the symmetry of the pdf of Z about 0,
P(Z < −1.63) = P(Z > 1.63) = 1 − P(Z ≤ 1.63) = 1 − 0.9484 = 0.0516.
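When only nonnegative z are tabulated, a lookup routine applies the symmetry rule P(Z ≤ −z) = 1 − P(Z ≤ z). A small sketch, assuming a dictionary that holds just the excerpt above (the names `NONNEG_TABLE` and `phi_lookup` are ours):

```python
# Excerpt of a table listing Phi(z) only for z >= 0 (rows 1.6 and 1.7, columns .00 to .04).
NONNEG_TABLE = {
    1.60: 0.9452, 1.61: 0.9463, 1.62: 0.9474, 1.63: 0.9484, 1.64: 0.9495,
    1.70: 0.9554, 1.71: 0.9564, 1.72: 0.9573, 1.73: 0.9582, 1.74: 0.9591,
}


def phi_lookup(z: float) -> float:
    """P(Z <= z) from a nonnegative-only table, using P(Z <= -z) = 1 - P(Z <= z)."""
    key = round(abs(z), 2)
    if key not in NONNEG_TABLE:
        raise KeyError(f"z = {z} is not in this table excerpt")
    value = NONNEG_TABLE[key]
    return value if z >= 0 else 1.0 - value


print(f"{phi_lookup(-1.63):.4f}")  # 0.0516
```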
Recall: the (100p)th percentile of the distribution of a continuous rv X, η(p), is defined by
$$p = F(\eta(p)) = \int_{-\infty}^{\eta(p)} f(y)\,dy.$$
Similarly, the (100p)th percentile of the standard normal rv Z is defined by
$$p = \Phi(\eta(p)) = \int_{-\infty}^{\eta(p)} \frac{1}{\sqrt{2\pi}} e^{-y^2/2}\,dy.$$
Since this integral has no closed form, we use the normal table to find the (100p)th percentile: locate p in the body of the table and read off the corresponding z. For example:
Find the 95th percentile for the standard normal rv Z.

z      .00     .01     .02     .03     .04     .05     ···  .09
···
1.6    0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  ···  0.9545
1.7    0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  ···  0.9633
···

The value 0.95 lies halfway between Φ(1.64) = 0.9495 and Φ(1.65) = 0.9505, so η(95) = 1.645, a linear interpolation of 1.64 and 1.65.

Remark: If p does not appear in the table, we can either use the entry closest to it or linearly interpolate between the two closest entries.

In statistical inference, the percentiles corresponding to small right-tail areas are heavily used.

Notation $z_\alpha$ will denote the value on the z axis for which α of the area under the z curve lies to the right of $z_\alpha$.

[Figure: the z curve with a right-tail area α beyond $z_\alpha$]
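Instead of interpolating in the table, η(p) can be computed by solving Φ(z) = p numerically, and $z_\alpha$ is then η(1 − α). A sketch using bisection (our choice; any root finder for an increasing function would do), with Φ again written via `math.erf`. The helper names `eta` and `z_crit` are hypothetical.

```python
import math


def Phi(z: float) -> float:
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def eta(p: float, lo: float = -10.0, hi: float = 10.0, tol: float = 1e-10) -> float:
    """(100p)th percentile of Z: the z with Phi(z) = p, found by bisection (Phi is increasing)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def z_crit(alpha: float) -> float:
    """z_alpha: area alpha lies to its right, so it is the 100(1 - alpha)th percentile."""
    return eta(1.0 - alpha)


print(f"eta(95) = {eta(0.95):.3f}")  # 1.645
for alpha in (0.1, 0.05, 0.025, 0.0005):
    print(alpha, round(z_crit(alpha), 3))
```

Bisection gives η(0.95) ≈ 1.6449, in line with the interpolated 1.645; for α = 0.0005 it gives about 3.291.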
Remark:
1. $z_\alpha$ is the 100(1 − α)th percentile of the standard normal distribution.
2. By symmetry, the area under the standard normal curve to the left of $-z_\alpha$ is also α.
3. The $z_\alpha$'s are usually referred to as z critical values.

Percentile        90     95      97.5    ···  99.95
α (tail area)     0.1    0.05    0.025   ···  0.0005
$z_\alpha$        1.28   1.645   1.96    ···  3.29

Proposition If X has a normal distribution with mean µ and standard deviation σ, then
$$Z = \frac{X - \mu}{\sigma}$$
has a standard normal distribution. Thus
$$P(a \le X \le b) = P\left(\frac{a-\mu}{\sigma} \le Z \le \frac{b-\mu}{\sigma}\right) = \Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right),$$
$$P(X \le a) = \Phi\left(\frac{a-\mu}{\sigma}\right), \qquad P(X \ge b) = 1 - \Phi\left(\frac{b-\mu}{\sigma}\right).$$

Example (Problem 38): There are two machines available for cutting corks intended for use in wine bottles. The first produces corks with diameters that are normally distributed with mean 3 cm and standard deviation 0.1 cm. The second produces corks with diameters that have a normal distribution with mean 3.04 cm and standard deviation 0.02 cm.
Acceptable corks have diameters between 2.9 cm and 3.1 cm. Which machine is more likely to produce an acceptable cork?

Let $X_1$ and $X_2$ denote the diameters of corks from the first and second machines. Then
$$\begin{aligned}
P(2.9 \le X_1 \le 3.1) &= P\left(\frac{2.9 - 3}{0.1} \le Z \le \frac{3.1 - 3}{0.1}\right) = P(-1 \le Z \le 1) = 0.8413 - 0.1587 = 0.6826, \\
P(2.9 \le X_2 \le 3.1) &= P\left(\frac{2.9 - 3.04}{0.02} \le Z \le \frac{3.1 - 3.04}{0.02}\right) = P(-7 \le Z \le 3) \approx 0.9987 - 0 = 0.9987,
\end{aligned}$$
so the second machine is more likely to produce an acceptable cork.

Example (Problem 44): If bolt thread length is normally distributed, what is the probability that the thread length of a randomly selected bolt is (a) within 1.5 SDs of its mean value? (b) between 1 and 2 SDs from its mean value?

(a)
$$P(\mu - 1.5\sigma \le X \le \mu + 1.5\sigma) = P\left(\frac{\mu - 1.5\sigma - \mu}{\sigma} \le Z \le \frac{\mu + 1.5\sigma - \mu}{\sigma}\right) = P(-1.5 \le Z \le 1.5) = 0.9332 - 0.0668 = 0.8664.$$
(b) By symmetry, the probability of lying between 1 and 2 SDs from the mean (on either side) is
$$P(\sigma \le |X - \mu| \le 2\sigma) = 2P(\mu + \sigma \le X \le \mu + 2\sigma) = 2P\left(\frac{\mu + \sigma - \mu}{\sigma} \le Z \le \frac{\mu + 2\sigma - \mu}{\sigma}\right) = 2P(1 \le Z \le 2) = 2(0.9772 - 0.8413) = 0.2718.$$

Proposition
{(100p)th percentile for N(µ, σ²)} = µ + {(100p)th percentile for N(0, 1)} · σ

Example (Problem 39): The width of a line etched on an integrated circuit chip is normally distributed with mean 3.000 µm and standard deviation 0.140 µm. What width value separates the widest 10% of all such lines from the other 90%?
$$\eta_{N(3,\,0.140^2)}(90) = 3.000 + 0.140 \cdot \eta_{N(0,1)}(90) = 3.000 + 0.140 \cdot 1.28 = 3.1792\ \mu\text{m}.$$

Proposition Let X be a binomial rv based on n trials with success probability p. Then if the binomial probability histogram is not too skewed, X has approximately a normal distribution with µ = np and σ = √(npq), where q = 1 − p.
In particular, for x = a possible value of X,
$$P(X \le x) = B(x; n, p) \approx \text{area under the normal curve to the left of } x + 0.5 = \Phi\left(\frac{x + 0.5 - np}{\sqrt{npq}}\right).$$
The 0.5 is a continuity correction: the histogram bar for the value x extends from x − 0.5 to x + 0.5, so the whole bar is included. In practice, the approximation is adequate provided that both np ≥ 10 and nq ≥ 10, since there is then enough symmetry in the underlying binomial distribution.

[Figure: a graphical explanation of P(X ≤ x) = B(x; n, p) ≈ area under the normal curve to the left of x + 0.5]
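Once Φ and its inverse are available, the standardization rule, the percentile rule, and the continuity-corrected binomial approximation each become one line. The sketch below reproduces the cork (Problem 38), bolt (Problem 44), and line-width (Problem 39) computations, and compares the approximation with the exact binomial cdf for illustrative values n = 50, p = 0.3 that we picked to satisfy np ≥ 10 and nq ≥ 10. Results may differ from the slides in the fourth decimal because the slides use rounded table values; all helper names are hypothetical.

```python
import math


def Phi(z: float) -> float:
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def eta(p: float, lo: float = -10.0, hi: float = 10.0, tol: float = 1e-10) -> float:
    """(100p)th percentile of Z, i.e. the root of Phi(z) = p, by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if Phi(mid) < p else (lo, mid)
    return (lo + hi) / 2


def normal_interval(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a <= X <= b) for X ~ N(mu, sigma^2), after standardizing Z = (X - mu) / sigma."""
    return Phi((b - mu) / sigma) - Phi((a - mu) / sigma)


def normal_percentile(p: float, mu: float, sigma: float) -> float:
    """(100p)th percentile of N(mu, sigma^2) = mu + eta(p) * sigma."""
    return mu + eta(p) * sigma


def binom_cdf_exact(x: int, n: int, p: float) -> float:
    """B(x; n, p) summed term by term."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def binom_cdf_normal(x: int, n: int, p: float) -> float:
    """Continuity-corrected approximation B(x; n, p) ~ Phi((x + 0.5 - np) / sqrt(npq))."""
    return Phi((x + 0.5 - n * p) / math.sqrt(n * p * (1 - p)))


# Problem 38: probability of an acceptable cork (2.9 to 3.1 cm) for each machine.
print("machine 1:", round(normal_interval(2.9, 3.1, 3.00, 0.10), 4))
print("machine 2:", round(normal_interval(2.9, 3.1, 3.04, 0.02), 4))
# Problem 44: the answers do not depend on mu and sigma, so use N(0, 1).
print("within 1.5 SDs:", round(normal_interval(-1.5, 1.5, 0.0, 1.0), 4))
print("between 1 and 2 SDs:", round(2 * normal_interval(1.0, 2.0, 0.0, 1.0), 4))
# Problem 39: width separating the widest 10% of lines (the 90th percentile).
print("90th percentile width:", round(normal_percentile(0.90, 3.000, 0.140), 4))
# Binomial approximation, illustrative n = 50, p = 0.3 (np = 15 and nq = 35 are both >= 10).
print("exact vs approx B(15; 50, 0.3):", binom_cdf_exact(15, 50, 0.3), binom_cdf_normal(15, 50, 0.3))
```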
Example (Problem 54): Suppose that 10% of all steel shafts produced by a certain process are nonconforming but can be reworked (rather than having to be scrapped). Consider a random sample of 200 shafts, and let X denote the number among these that are nonconforming and can be reworked. What is the (approximate) probability that X is between 15 and 25 (inclusive)?

In this problem n = 200, p = 0.1 and q = 1 − p = 0.9. Thus np = 20 ≥ 10 and nq = 180 ≥ 10, so the normal approximation is adequate, with √(npq) = √18 ≈ 4.243. Since P(15 ≤ X ≤ 25) = P(X ≤ 25) − P(X ≤ 14),
$$\begin{aligned}
P(15 \le X \le 25) &= B(25; 200, 0.1) - B(14; 200, 0.1) \\
&\approx \Phi\left(\frac{25 + 0.5 - 20}{\sqrt{200 \cdot 0.1 \cdot 0.9}}\right) - \Phi\left(\frac{14 + 0.5 - 20}{\sqrt{200 \cdot 0.1 \cdot 0.9}}\right) \\
&= \Phi(1.30) - \Phi(-1.30) = 0.9032 - 0.0968 = 0.8064.
\end{aligned}$$

Exponential Distribution

Definition X is said to have an exponential distribution with parameter λ (λ > 0) if the pdf of X is
$$f(x; \lambda) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

Remark:
1. Usually we use X ∼ EXP(λ) to denote that the random variable X has an exponential distribution with parameter λ.
2. In some sources, the pdf of the exponential distribution is given by
$$f(x; \theta) = \begin{cases} \frac{1}{\theta} e^{-x/\theta} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
The two forms agree when λ = 1/θ.
Proposition If X ∼ EXP(λ), then
$$E(X) = \frac{1}{\lambda} \quad \text{and} \quad V(X) = \frac{1}{\lambda^2},$$
and the cdf of X is
$$F(x; \lambda) = \begin{cases} 1 - e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$$

Proof:
$$\begin{aligned}
E(X) &= \int_0^\infty x \lambda e^{-\lambda x}\,dx = \frac{1}{\lambda}\int_0^\infty (\lambda x) e^{-\lambda x}\,d(\lambda x) \\
&= \frac{1}{\lambda}\int_0^\infty y e^{-y}\,dy && y = \lambda x \\
&= \frac{1}{\lambda}\left[-y e^{-y}\Big|_0^\infty + \int_0^\infty e^{-y}\,dy\right] && \text{integration by parts: } u = y,\ v = -e^{-y} \\
&= \frac{1}{\lambda}\left[0 + \left(-e^{-y}\Big|_0^\infty\right)\right] = \frac{1}{\lambda}.
\end{aligned}$$
$$\begin{aligned}
E(X^2) &= \int_0^\infty x^2 \lambda e^{-\lambda x}\,dx = \frac{1}{\lambda^2}\int_0^\infty (\lambda x)^2 e^{-\lambda x}\,d(\lambda x) \\
&= \frac{1}{\lambda^2}\int_0^\infty y^2 e^{-y}\,dy && y = \lambda x \\
&= \frac{1}{\lambda^2}\left[-y^2 e^{-y}\Big|_0^\infty + \int_0^\infty 2y e^{-y}\,dy\right] && \text{integration by parts} \\
&= \frac{1}{\lambda^2}\left[0 + 2\left(-y e^{-y}\Big|_0^\infty + \int_0^\infty e^{-y}\,dy\right)\right] && \text{integration by parts} \\
&= \frac{1}{\lambda^2} \cdot 2\left[0 + \left(-e^{-y}\Big|_0^\infty\right)\right] = \frac{2}{\lambda^2}.
\end{aligned}$$
Hence
$$V(X) = E(X^2) - [E(X)]^2 = \frac{2}{\lambda^2} - \left(\frac{1}{\lambda}\right)^2 = \frac{1}{\lambda^2}.$$
For x ≥ 0,
$$\begin{aligned}
F(x) &= \int_0^x \lambda e^{-\lambda y}\,dy = \int_0^x e^{-\lambda y}\,d(\lambda y) \\
&= \int_0^{\lambda x} e^{-z}\,dz && z = \lambda y \\
&= -e^{-z}\Big|_0^{\lambda x} = 1 - e^{-\lambda x},
\end{aligned}$$
and F(x) = 0 for x < 0 because the pdf vanishes there.

Example (Problem 108) The article “Determination of the MTF of Positive Photoresists Using the Monte Carlo method” (Photographic Sci.
and Engr., 1983: 254-260) proposes the exponential distribution with parameter λ = 0.93 as a model for the distribution of a photon’s free path length (µm) under certain circumstances. Suppose this is the correct model.
a. What is the expected path length, and what is the standard deviation of path length?
b. What is the probability that path length exceeds 3.0?
c. What value is exceeded by only 10% of all path lengths?
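Parts a to c follow directly from the proposition: E(X) = SD(X) = 1/λ, P(X > 3.0) = $e^{-3\lambda}$, and the value exceeded by 10% of path lengths solves $e^{-\lambda x} = 0.10$. A short Python sketch of that arithmetic; the slides do not give the answers, so treat the printed values as our own computation.

```python
import math

lam = 0.93  # rate parameter lambda from the problem statement (per µm)

mean = 1 / lam                     # a. E(X) = 1/lambda
sd = math.sqrt(1 / lam**2)         # a. V(X) = 1/lambda^2, so SD(X) = 1/lambda
p_exceed_3 = math.exp(-lam * 3.0)  # b. P(X > 3) = 1 - F(3; lambda) = e^(-3 lambda)
# c. Solve 1 - F(x; lambda) = 0.10, i.e. e^(-lambda x) = 0.10.
x90 = -math.log(0.10) / lam

print(f"E(X) = {mean:.4f}, SD(X) = {sd:.4f}")
print(f"P(X > 3.0) = {p_exceed_3:.4f}")
print(f"value exceeded by 10%: {x90:.4f}")
```

With λ = 0.93 this gives E(X) = SD(X) ≈ 1.075 µm, P(X > 3.0) ≈ 0.061, and about 2.476 µm for part c.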
Proposition Suppose that the number of events occurring in any time interval of length t has a Poisson distribution with parameter αt (where α, the rate of the event process, is the expected number of events occurring in 1 unit of time) and that the numbers of occurrences in nonoverlapping intervals are independent of one another. Then the distribution of elapsed time between the occurrence of two successive events is exponential with parameter λ = α.

e.g. the number of customers visiting Costco in each hour ⟹ Poisson distribution; the time between every two successive customers visiting Costco ⟹ exponential distribution.
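The proposition can also be illustrated from the other direction by simulation: if successive gaps are independent EXP(α) variables, the counts per unit of time should behave like Poisson(α) counts, whose mean and variance both equal α. A Monte Carlo sketch with a hypothetical rate α = 4 and a fixed seed; agreement of two moments is supporting evidence, not a proof.

```python
import random


def simulate_counts(alpha: float, n_intervals: int, seed: int = 1) -> list:
    """Generate events with independent EXP(alpha) gaps and count them in each [k, k + 1)."""
    rng = random.Random(seed)
    counts = [0] * n_intervals
    t = rng.expovariate(alpha)       # time of the first event
    while t < n_intervals:
        counts[int(t)] += 1          # the event falls in unit interval floor(t)
        t += rng.expovariate(alpha)  # add the next exponential gap
    return counts


alpha = 4.0  # hypothetical rate: expected events per unit time
counts = simulate_counts(alpha, 50_000)
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
# For Poisson(alpha) counts, both the mean and the variance equal alpha.
print(f"mean = {mean:.3f}, variance = {var:.3f}, alpha = {alpha}")
```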
Example (Example 4.22) Suppose that calls are received at a 24-hour hotline according to a Poisson process with rate α = 0.5 call per day. Then the number of days X between successive calls has an exponential distribution with parameter value 0.5. The probability that more than 3 days elapse between calls is
$$P(X > 3) = 1 - P(X \le 3) = 1 - F(3; 0.5) = e^{-0.5 \cdot 3} \approx 0.223.$$
The expected time between successive calls is 1/0.5 = 2 days.
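The arithmetic of Example 4.22 as a minimal Python sketch:

```python
import math

alpha = 0.5                          # calls per day (Example 4.22)
lam = alpha                          # gap between calls ~ EXP(lambda = alpha)
p_more_than_3 = math.exp(-lam * 3)   # P(X > 3) = 1 - F(3; 0.5)
expected_gap = 1 / lam               # E(X) = 1/lambda

print(f"P(X > 3) = {p_more_than_3:.3f}")  # 0.223
print(f"E(X) = {expected_gap} days")      # 2.0 days
```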
“Memoryless” Property
Let X = the time a certain component lasts (in hours), and assume the component lifetime is exponentially distributed with parameter λ. What is the probability that the component lasts at least an additional t hours after working for $t_0$ hours, i.e. what is $P(X \ge t + t_0 \mid X \ge t_0)$?
$$\begin{aligned}
P(X \ge t + t_0 \mid X \ge t_0) &= \frac{P(\{X \ge t + t_0\} \cap \{X \ge t_0\})}{P(X \ge t_0)} = \frac{P(X \ge t + t_0)}{P(X \ge t_0)} \\
&= \frac{1 - F(t + t_0; \lambda)}{1 - F(t_0; \lambda)} = \frac{e^{-\lambda(t + t_0)}}{e^{-\lambda t_0}} = e^{-\lambda t}.
\end{aligned}$$
On the other hand,
$$P(X \ge t) = 1 - F(t; \lambda) = e^{-\lambda t}.$$
Therefore $P(X \ge t) = P(X \ge t + t_0 \mid X \ge t_0)$ for any positive t and $t_0$. In words, the distribution of additional lifetime is exactly the same as the original distribution of lifetime, so at each point in time the component shows no effect of wear. In other words, the distribution of remaining lifetime is independent of current age.
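Both sides of the memoryless identity can be evaluated directly, and a simulation shows the same thing empirically: among lifetimes that survive to $t_0$, the fraction lasting t more hours matches $e^{-\lambda t}$. The rate λ = 0.2 per hour, the times $t_0$ = 5 and t = 3, and the seed are hypothetical values for illustration.

```python
import math
import random

lam, t0, t = 0.2, 5.0, 3.0  # hypothetical: rate per hour, hours already worked, extra hours


def survival(x: float) -> float:
    """P(X >= x) = 1 - F(x; lam) = e^(-lam x) for x >= 0."""
    return math.exp(-lam * x)


conditional = survival(t + t0) / survival(t0)  # P(X >= t + t0 | X >= t0)
print(conditional, survival(t))                # equal up to floating-point rounding

# Empirical version: keep only simulated lifetimes that reach t0, then see how many last t more.
rng = random.Random(7)
lifetimes = [rng.expovariate(lam) for _ in range(200_000)]
survivors = [x for x in lifetimes if x >= t0]
frac = sum(x >= t + t0 for x in survivors) / len(survivors)
print(f"simulated {frac:.3f} vs e^(-lam t) = {survival(t):.3f}")
```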
Gamma Distribution

Definition For α > 0, the gamma function Γ(α) is defined by
$$\Gamma(\alpha) = \int_0^\infty x^{\alpha - 1} e^{-x}\,dx.$$
Properties of the gamma function:
1. For any α > 1, Γ(α) = (α − 1) · Γ(α − 1) [via integration by parts];
2. For any positive integer n, Γ(n) = (n − 1)!;
3. Γ(1/2) = √π.
e.g. Γ(4) = (4 − 1)! = 6 and $\Gamma(5/2) = \frac{3}{2}\Gamma(3/2) = \frac{3}{2}\left[\frac{1}{2}\Gamma(1/2)\right] = \frac{3}{4}\sqrt{\pi}$.

Definition A continuous random variable X is said to have a gamma distribution if the pdf of X is
$$f(x; \alpha, \beta) = \begin{cases} \frac{1}{\beta^\alpha \Gamma(\alpha)} x^{\alpha - 1} e^{-x/\beta} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
where the parameters α and β satisfy α > 0, β > 0.
The standard gamma distribution has β = 1, so the pdf of a standard gamma rv is
$$f(x; \alpha) = \begin{cases} \frac{1}{\Gamma(\alpha)} x^{\alpha - 1} e^{-x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

Remark:
1. We use X ∼ GAM(α, β) to denote that the rv X has a gamma distribution with parameters α and β.
2. If we let α = 1 and β = 1/λ, then we get the exponential distribution:
$$f(x; 1, 1/\lambda) = \begin{cases} \frac{1}{(1/\lambda)^1\, \Gamma(1)} x^{1-1} e^{-x/(1/\lambda)} = \lambda e^{-\lambda x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
3. When X is a standard gamma rv (β = 1), the cdf of X,
$$F(x; \alpha) = \int_0^x \frac{y^{\alpha - 1} e^{-y}}{\Gamma(\alpha)}\,dy,$$
is called the incomplete gamma function. There are extensive tables of F(x; α) available (Appendix Table A.4).
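Python's `math.gamma` evaluates Γ(α), so the three properties, the Γ(5/2) example, and Remark 2 can be spot-checked numerically. A sketch; the test points and the helper `gamma_pdf` (a direct transcription of the pdf above) are our own.

```python
import math

# Property 1: Gamma(alpha) = (alpha - 1) * Gamma(alpha - 1) for alpha > 1.
for alpha in (1.5, 2.7, 6.0):
    assert math.isclose(math.gamma(alpha), (alpha - 1) * math.gamma(alpha - 1))

# Property 2: Gamma(n) = (n - 1)! for positive integers n.
assert all(math.isclose(math.gamma(n), math.factorial(n - 1)) for n in range(1, 11))

# Property 3: Gamma(1/2) = sqrt(pi).
assert math.isclose(math.gamma(0.5), math.sqrt(math.pi))

# Worked example: Gamma(4) = 3! = 6 and Gamma(5/2) = (3/4) sqrt(pi).
print(math.gamma(4), math.gamma(2.5), 0.75 * math.sqrt(math.pi))


def gamma_pdf(x: float, alpha: float, beta: float) -> float:
    """f(x; alpha, beta) = x^(alpha-1) e^(-x/beta) / (beta^alpha Gamma(alpha)) for x >= 0, else 0."""
    if x < 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta**alpha * math.gamma(alpha))


# Remark 2: alpha = 1, beta = 1/lam gives the exponential pdf lam * e^(-lam x).
lam, x = 2.0, 0.7  # arbitrary test values
assert math.isclose(gamma_pdf(x, 1, 1 / lam), lam * math.exp(-lam * x))
```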
Proposition
If X ∼ GAM(α, β), then
$$E(X) = \alpha\beta \quad\text{and}\quad V(X) = \alpha\beta^2$$
Furthermore, for any x > 0, the cdf of X is given by
$$P(X \le x) = F(x;\alpha,\beta) = F\left(\frac{x}{\beta};\alpha\right)$$
where F(•; α) is the incomplete gamma function.

Example: The survival time (in days) of a white rat that was subjected to a certain level of X-ray radiation is a random variable X ∼ GAM(5, 4). What is
a. the probability that the survival time is at most 16 days;
b. the probability that the survival time is between 16 days and 20 days (not inclusive);
c. the expected survival time?
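Here α = 5 is an integer, so a short Python sketch can evaluate F(x/β; α) with the closed form F(x; n) = 1 − Σ_{k=0}^{n−1} e^{−x} x^k / k! instead of a table lookup (the identity is a standard fact, not from the slides; the function names are hypothetical). Because X is continuous, P(X = 16) = P(X = 20) = 0, so excluding the endpoints in part b does not change the answer.

```python
import math

def std_gamma_cdf_int(x: float, n: int) -> float:
    """Incomplete gamma function F(x; n) for a positive integer n (closed form)."""
    if x <= 0:
        return 0.0
    return 1.0 - sum(math.exp(-x) * x ** k / math.factorial(k) for k in range(n))

def gamma_cdf(x: float, alpha: int, beta: float) -> float:
    """P(X <= x) for X ~ GAM(alpha, beta), using F(x; alpha, beta) = F(x / beta; alpha)."""
    return std_gamma_cdf_int(x / beta, alpha)

alpha, beta = 5, 4                      # X ~ GAM(5, 4), survival time in days

p_a = gamma_cdf(16, alpha, beta)        # a. P(X <= 16) = F(4; 5)
p_b = gamma_cdf(20, alpha, beta) - p_a  # b. P(16 < X < 20) = F(5; 5) - F(4; 5)
mean = alpha * beta                     # c. E(X) = alpha * beta
var = alpha * beta ** 2                 #    V(X) = alpha * beta^2 (not asked, for reference)

print(f"a. P(X <= 16)     = {p_a:.4f}")  # 0.3712
print(f"b. P(16 < X < 20) = {p_b:.4f}")  # 0.1883
print(f"c. E(X) = {mean} days, V(X) = {var}")
```

The printed values should agree with Appendix Table A.4 to the table's precision (F(4; 5) ≈ .371 and F(5; 5) ≈ .560), and the expected survival time is αβ = 20 days.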
Chi-Squared Distribution

Definition
Let ν be a positive integer. Then a random variable X is said to have a chi-squared distribution with parameter ν if the pdf of X is the gamma density with α = ν/2 and β = 2. The pdf of a chi-squared rv is thus
$$f(x;\nu) = \begin{cases} \dfrac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{(\nu/2)-1} e^{-x/2} & x \ge 0 \\ 0 & x < 0 \end{cases}$$
The parameter ν is called the number of degrees of freedom (df) of X. The symbol χ² is often used in place of "chi-squared".

Remark:
1. Usually, we use X ∼ χ²(ν) to denote that X is a chi-squared rv with parameter ν;
2. If X₁, X₂, …, Xₙ are n independent standard normal rv's, then X₁² + X₂² + ⋯ + Xₙ² has the same distribution as χ²(n).
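Two consequences of the definition can be illustrated with a small Python simulation: with α = ν/2 and β = 2, the gamma proposition gives E(X) = ν and V(X) = 2ν, and Remark 2 says a sum of n squared independent standard normals should behave like χ²(n). The sketch below is a Monte Carlo illustration, not a proof; the seed, sample size and the names `chi2_pdf` and `sum_of_squared_normals` are arbitrary choices for illustration. It also confirms that χ²(2) is the exponential distribution with λ = 1/2.

```python
import math
import random
import statistics

def chi2_pdf(x: float, nu: int) -> float:
    """Chi-squared density: the gamma density with alpha = nu/2 and beta = 2."""
    if x < 0:
        return 0.0
    if x == 0:
        # the value of x^(nu/2 - 1) at 0 depends on nu
        return math.inf if nu == 1 else (0.5 if nu == 2 else 0.0)
    a = nu / 2
    return x ** (a - 1) * math.exp(-x / 2) / (2 ** a * math.gamma(a))

# chi^2(2) is the exponential distribution with lambda = 1/2
for x in (0.5, 1.0, 4.0):
    assert math.isclose(chi2_pdf(x, 2), 0.5 * math.exp(-0.5 * x))

def sum_of_squared_normals(n: int, rng: random.Random) -> float:
    """One draw of X1^2 + ... + Xn^2 with the Xi independent N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

rng = random.Random(2008)  # arbitrary seed, for reproducibility
n, reps = 3, 100_000
draws = [sum_of_squared_normals(n, rng) for _ in range(reps)]

# Theory for chi^2(n): E = (n/2) * 2 = n and V = (n/2) * 2**2 = 2n
print(f"sample mean     {statistics.fmean(draws):.3f} (theory {n})")
print(f"sample variance {statistics.variance(draws):.3f} (theory {2 * n})")
```

Matching the first two moments is only a partial check of Remark 2; a fuller check would compare the empirical cdf of the draws with the incomplete gamma function F(x/2; n/2).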