Normal Distribution

Liang Zhang (UofU) Applied Statistics I June 30, 2008

Definition
A continuous rv $X$ is said to have a normal distribution with parameters $\mu$ and $\sigma$ (or $\mu$ and $\sigma^2$), where $-\infty < \mu < \infty$ and $\sigma > 0$, if the pdf of $X$ is

$$f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/(2\sigma^2)}$$

We use the notation $X \sim N(\mu, \sigma^2)$ to denote that $X$ is normally distributed with parameters $\mu$ and $\sigma^2$.

Remark:
1. Obviously, $f(x) \ge 0$ for all $x$;
2. It is guaranteed that $\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(y-\mu)^2/(2\sigma^2)}\, dy = 1$.

Proposition
For $X \sim N(\mu, \sigma^2)$, we have $E(X) = \mu$ and $V(X) = \sigma^2$.

[Figure: normal density curves for $\sigma = 0.5$, $\sigma = 1$, and $\sigma = 2$]

The cdf of a normal random variable $X$ is

$$\begin{aligned}
F(x) = P(X \le x) &= \int_{-\infty}^{x} f(y; \mu, \sigma)\, dy \\
&= \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{x} e^{-(y-\mu)^2/(2\sigma^2)}\, dy \\
&= \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{x-\mu} e^{-z^2/(2\sigma^2)}\, dz && \text{change of variable: } z = y - \mu \\
&= \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{(x-\mu)/\sigma} e^{-w^2/2} \cdot \sigma\, dw && \text{change of variable: } w = \frac{z}{\sigma} \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{(x-\mu)/\sigma} e^{-w^2/2}\, dw
\end{aligned}$$

Definition
The normal distribution with parameter values $\mu = 0$ and $\sigma = 1$ is called the standard normal distribution. A random variable having a standard normal distribution is called a standard normal random variable and will be denoted by $Z$. The pdf of $Z$ is

$$f(z; 0, 1) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad -\infty < z < \infty$$

The graph of $f(z; 0, 1)$ is called the standard normal (or $z$) curve. The cdf of $Z$ is $P(Z \le z) = \int_{-\infty}^{z} f(y; 0, 1)\, dy$, which we will denote by $\Phi(z)$.
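The last line of the cdf derivation says that $F(x)$ equals the standard normal cdf evaluated at $(x-\mu)/\sigma$. A minimal Python sketch of that statement follows. It relies on the standard identity $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$, which is not stated in the slides, and the function names (`normal_pdf`, `std_normal_cdf`, `normal_cdf`, `integrate_pdf`) as well as the values $\mu = 3$, $\sigma = 0.1$ are illustrative assumptions only.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """pdf of N(mu, sigma^2) as given in the definition."""
    return exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sqrt(2.0 * pi) * sigma)


def std_normal_cdf(z: float) -> float:
    """Phi(z), written with the error function (assumed identity, see lead-in)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """F(x) for N(mu, sigma^2), using F(x) = Phi((x - mu) / sigma)."""
    return std_normal_cdf((x - mu) / sigma)


def integrate_pdf(a: float, b: float, mu: float, sigma: float, steps: int = 20000) -> float:
    """Midpoint-rule approximation of the integral of the pdf over [a, b]."""
    h = (b - a) / steps
    return h * sum(normal_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(steps))


if __name__ == "__main__":
    mu, sigma, x = 3.0, 0.1, 3.1  # arbitrary illustrative values
    # Integrating the pdf from far in the left tail up to x should agree
    # with the closed form Phi((x - mu) / sigma).
    numeric = integrate_pdf(mu - 10.0 * sigma, x, mu, sigma)
    print(numeric, normal_cdf(x, mu, sigma))
```

The numerical integral starts at $\mu - 10\sigma$ rather than $-\infty$; the probability mass below that point is negligible for this comparison.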
[Figure: standard normal curve; shaded area = $\Phi(0.5)$]

Table A.3 Standard Normal Curve Areas

   z      .00      .01      .02      .03      .04     ···     .09
  ···     ···      ···      ···      ···      ···     ···     ···
 -1.2   0.1151   0.1131   0.1112   0.1094   0.1075    ···   0.0985
 -1.1   0.1357   0.1335   0.1314   0.1292   0.1271    ···   0.1170
  ···     ···      ···      ···      ···      ···     ···     ···
  1.6   0.9452   0.9463   0.9474   0.9484   0.9495    ···   0.9545
  1.7   0.9554   0.9564   0.9573   0.9582   0.9591    ···   0.9633
  ···     ···      ···      ···      ···      ···     ···     ···

For $Z \sim N(0, 1)$, calculate (a) $P(Z \le 1.61)$; (b) $P(Z > -1.12)$; and (c) $P(-1.12 < Z \le 1.61)$.

Reading the values from the table above:

(a) $P(Z \le 1.61) = 0.9463$;
(b) $P(Z > -1.12) = 1 - P(Z \le -1.12) = 1 - 0.1314 = 0.8686$;
(c) $P(-1.12 < Z \le 1.61) = P(Z \le 1.61) - P(Z \le -1.12) = 0.9463 - 0.1314 = 0.8149$.

Many tables for the normal distribution contain only the nonnegative part.

   z      .00      .01      .02      .03      .04     ···     .09
  ···     ···      ···      ···      ···      ···     ···     ···
  1.6   0.9452   0.9463   0.9474   0.9484   0.9495    ···   0.9545
  1.7   0.9554   0.9564   0.9573   0.9582   0.9591    ···   0.9633
  ···     ···      ···      ···      ···      ···     ···     ···

What is $P(Z < -1.63)$?
By symmetry of the pdf of $Z$, we know that

$$P(Z < -1.63) = P(Z > 1.63) = 1 - P(Z \le 1.63) = 1 - 0.9484 = 0.0516$$

Recall: The $(100p)$th percentile of the distribution of a continuous rv $X$, $\eta(p)$, is defined by

$$p = F(\eta(p)) = \int_{-\infty}^{\eta(p)} f(y)\, dy$$

Similarly, the $(100p)$th percentile of the standard normal rv $Z$ is defined by

$$p = F(\eta(p)) = \int_{-\infty}^{\eta(p)} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy$$

We need to use the table for the normal distribution to find the $(100p)$th percentile.

e.g. Find the 95th percentile for the standard normal rv $Z$.

   z      .00      .01      .02      .03      .04      .05     ···     .09
  ···     ···      ···      ···      ···      ···      ···     ···     ···
  1.6   0.9452   0.9463   0.9474   0.9484   0.9495   0.9505    ···   0.9545
  1.7   0.9554   0.9564   0.9573   0.9582   0.9591   0.9599    ···   0.9633
  ···     ···      ···      ···      ···      ···      ···     ···     ···

$\eta(0.95) = 1.645$, a linear interpolation of 1.64 and 1.65: the table gives $\Phi(1.64) = 0.9495$ and $\Phi(1.65) = 0.9505$, and 0.95 lies halfway between them.

Remark: If $p$ does not appear in the table, we can either use the number closest to it, or use the linear interpolation of the closest two.

In statistical inference, the percentiles corresponding to small right tails are heavily used.

Notation
$z_\alpha$ will denote the value on the $z$ axis for which $\alpha$ of the area under the $z$ curve lies to the right of $z_\alpha$.

[Figure: $z$ curve with the point $z_\alpha$ marked on the $z$ axis]

Remark:
1. $z_\alpha$ is the $100(1 - \alpha)$th percentile of the standard normal distribution.
2. By symmetry, the area under the standard normal curve to the left of $-z_\alpha$ is also $\alpha$.
3. The $z_\alpha$'s are usually referred to as $z$ critical values.

 Percentile             90      95      97.5    ···    99.95
 α (tail area)          0.1     0.05    0.025   ···    0.0005
 z_α                    1.28    1.645   1.96    ···    3.27

Proposition
If $X$ has a normal distribution with mean $\mu$ and standard deviation $\sigma$, then

$$Z = \frac{X - \mu}{\sigma}$$

has a standard normal distribution.
Thus

$$P(a \le X \le b) = P\left(\frac{a-\mu}{\sigma} \le Z \le \frac{b-\mu}{\sigma}\right) = \Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)$$

$$P(X \le a) = \Phi\left(\frac{a-\mu}{\sigma}\right), \qquad P(X \ge b) = 1 - \Phi\left(\frac{b-\mu}{\sigma}\right)$$

Example (Problem 38): There are two machines available for cutting corks intended for use in wine bottles. The first produces corks with diameters that are normally distributed with mean 3 cm and standard deviation 0.1 cm. The second produces corks with diameters that have a normal distribution with mean 3.04 cm and standard deviation 0.02 cm. Acceptable corks have diameters between 2.9 cm and 3.1 cm. Which machine is more likely to produce an acceptable cork?

Let $X_1$ and $X_2$ denote the diameters of corks from the first and second machine, respectively.

$$\begin{aligned}
P(2.9 \le X_1 \le 3.1) &= P\left(\frac{2.9 - 3}{0.1} \le Z \le \frac{3.1 - 3}{0.1}\right) \\
&= P(-1 \le Z \le 1) \\
&= 0.8413 - 0.1587 = 0.6826
\end{aligned}$$

$$\begin{aligned}
P(2.9 \le X_2 \le 3.1) &= P\left(\frac{2.9 - 3.04}{0.02} \le Z \le \frac{3.1 - 3.04}{0.02}\right) \\
&= P(-7 \le Z \le 3) \\
&= 0.9987 - 0 = 0.9987
\end{aligned}$$

Since $0.9987 > 0.6826$, the second machine is more likely to produce an acceptable cork.

Example (Problem 44): If bolt thread length is normally distributed, what is the probability that the thread length of a randomly selected bolt is (a) within 1.5 SDs of its mean value? (b) between 1 and 2 SDs from its mean value?

(a)
$$\begin{aligned}
P(\mu - 1.5\sigma \le X \le \mu + 1.5\sigma) &= P\left(\frac{\mu - 1.5\sigma - \mu}{\sigma} \le Z \le \frac{\mu + 1.5\sigma - \mu}{\sigma}\right) \\
&= P(-1.5 \le Z \le 1.5) \\
&= 0.9332 - 0.0668 = 0.8664
\end{aligned}$$

(b) By symmetry, the two intervals $[\mu - 2\sigma, \mu - \sigma]$ and $[\mu + \sigma, \mu + 2\sigma]$ have the same probability, so
$$\begin{aligned}
2 \cdot P(\mu + \sigma \le X \le \mu + 2\sigma) &= 2P\left(\frac{\mu + \sigma - \mu}{\sigma} \le Z \le \frac{\mu + 2\sigma - \mu}{\sigma}\right) \\
&= 2P(1 \le Z \le 2) \\
&= 2(0.9772 - 0.8413) = 0.2718
\end{aligned}$$

Proposition

$$\{(100p)\text{th percentile for } N(\mu, \sigma^2)\} = \mu + \{(100p)\text{th percentile for } N(0, 1)\} \cdot \sigma$$

Example (Problem 39): The width of a line etched on an integrated circuit chip is normally distributed with mean 3.000 µm and standard deviation 0.140 µm. What width value separates the widest 10% of all such lines from the other 90%?
$$\eta_{N(3,\,0.140^2)}(0.90) = 3.0 + 0.140 \cdot \eta_{N(0,1)}(0.90) = 3.0 + 0.140 \cdot 1.28 = 3.1792$$

Proposition
Let $X$ be a binomial rv based on $n$ trials with success probability $p$. Then if the binomial probability histogram is not too skewed, $X$ has approximately a normal distribution with $\mu = np$ and $\sigma = \sqrt{npq}$, where $q = 1 - p$. In particular, for $x$ = a possible value of $X$,

$$P(X \le x) = B(x; n, p) \approx \left(\text{area under the normal curve to the left of } x + 0.5\right) = \Phi\left(\frac{x + 0.5 - np}{\sqrt{npq}}\right)$$

In practice, the approximation is adequate provided that both $np \ge 10$ and $nq \ge 10$, since there is then enough symmetry in the underlying binomial distribution.

[Figure: a graphical explanation for $P(X \le x) = B(x; n, p) \approx \Phi\left(\frac{x + 0.5 - np}{\sqrt{npq}}\right)$, the area under the normal curve to the left of $x + 0.5$]

Example (Problem 54): Suppose that 10% of all steel shafts produced by a certain process are nonconforming but can be reworked (rather than having to be scrapped). Consider a random sample of 200 shafts, and let $X$ denote the number among these that are nonconforming and can be reworked. What is the (approximate) probability that $X$ is between 15 and 25 (inclusive)?

In this problem $n = 200$, $p = 0.1$ and $q = 1 - p = 0.9$. Thus $np = 20 > 10$ and $nq = 180 > 10$, so the normal approximation applies with $\sqrt{npq} = \sqrt{18} \approx 4.243$. The continuity correction is applied to each binomial cdf value, so the lower term uses $14 + 0.5$:

$$\begin{aligned}
P(15 \le X \le 25) &= B(25; 200, 0.1) - B(14; 200, 0.1) \\
&\approx \Phi\left(\frac{25 + 0.5 - 20}{\sqrt{200 \cdot 0.1 \cdot 0.9}}\right) - \Phi\left(\frac{14 + 0.5 - 20}{\sqrt{200 \cdot 0.1 \cdot 0.9}}\right) \\
&= \Phi(1.30) - \Phi(-1.30) \\
&= 0.9032 - 0.0968 = 0.8064
\end{aligned}$$
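As a rough check on the normal approximation in Problem 54, the sketch below compares the exact binomial probability $B(25; 200, 0.1) - B(14; 200, 0.1)$ with the continuity-corrected value $\Phi\left(\frac{25.5 - np}{\sqrt{npq}}\right) - \Phi\left(\frac{14.5 - np}{\sqrt{npq}}\right)$. It uses only the Python standard library; the helper names (`std_normal_cdf`, `binom_cdf`, `binom_cdf_normal_approx`) are hypothetical, and $\Phi$ is computed through the error function rather than read from Table A.3, so the result can differ from the table-based answer in the third decimal place.

```python
from math import comb, erf, sqrt


def std_normal_cdf(z: float) -> float:
    """Phi(z) via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def binom_cdf(x: int, n: int, p: float) -> float:
    """Exact B(x; n, p) = P(X <= x), summing the binomial pmf term by term."""
    return sum(comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(x + 1))


def binom_cdf_normal_approx(x: int, n: int, p: float) -> float:
    """Approximate B(x; n, p) by Phi((x + 0.5 - np) / sqrt(npq))."""
    mu = n * p
    sigma = sqrt(n * p * (1.0 - p))
    return std_normal_cdf((x + 0.5 - mu) / sigma)


# Values from Problem 54.
n, p = 200, 0.1

# P(15 <= X <= 25) = B(25; n, p) - B(14; n, p), exactly and approximately.
exact = binom_cdf(25, n, p) - binom_cdf(14, n, p)
approx = binom_cdf_normal_approx(25, n, p) - binom_cdf_normal_approx(14, n, p)

if __name__ == "__main__":
    print(f"exact binomial probability:        {exact:.4f}")
    print(f"normal approximation (corrected):  {approx:.4f}")
```

Summing the pmf directly is fine for $n = 200$; for much larger $n$ a library routine would be a more sensible choice.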