Business Statistics: A First Course, 5th Edition
Chapter 6: The Normal Distribution
Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.

Learning Objectives
In this chapter, you learn:
- To compute probabilities from the normal distribution
- To use the normal probability plot to determine whether a set of data is approximately normally distributed

Continuous Probability Distributions
A continuous random variable is a variable that can assume any value on a continuum (it can assume an uncountable number of values). Examples:
- thickness of an item
- time required to complete a task
- temperature of a solution
- height, in inches
These can potentially take on any value, limited only by the ability to measure precisely and accurately.

The Normal Distribution
- Bell shaped
- Symmetrical
- Mean, median, and mode are equal
- Location is determined by the mean, μ
- Spread (scale) is determined by the standard deviation, σ
- The random variable has an infinite theoretical range: -∞ to +∞
[Figure: bell-shaped curve of f(X) against X, centered at μ with spread σ; Mean = Median = Mode]

The Normal Distribution Density Function
The formula for the normal probability density function is
$f(X) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^{2}}$
where
  e = the mathematical constant approximated by 2.71828
  π = the mathematical constant approximated by 3.14159
  μ = the population mean
  σ = the population standard deviation
  X = any value of the continuous variable
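The density formula above can be evaluated directly. The following is a minimal Python sketch, not part of the original slides; the function name normal_pdf is a hypothetical choice for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density f(X) for mean mu and standard deviation sigma.

    Hypothetical helper for illustration; it implements the formula
    f(X) = 1 / (sqrt(2*pi) * sigma) * exp(-0.5 * ((X - mu) / sigma) ** 2).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # distance from the mean in standard deviations
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
```

Because the exponent depends only on the squared distance from μ, the function returns the same value at μ - d and μ + d, which is the symmetry described in the slides.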
Many Normal Distributions
By varying the parameters μ and σ, we obtain different normal distributions.

The Normal Distribution Shape
- Changing μ shifts the distribution left or right.
- Changing σ increases or decreases the spread.
[Figure: curve of f(X) against X showing the effect of μ and σ]

The Standardized Normal
- Any normal distribution (with any combination of mean and standard deviation) can be transformed into the standardized normal distribution (Z).
- This requires transforming X units into Z units.
- The standardized normal distribution (Z) has a mean of 0 and a standard deviation of 1.

Translation to the Standardized Normal Distribution
Translate from X to the standardized normal (the "Z" distribution) by subtracting the mean of X and dividing by its standard deviation:
$Z = \frac{X - \mu}{\sigma}$
The Z distribution always has mean = 0 and standard deviation = 1.

The Standardized Normal Probability Density Function
The formula for the standardized normal probability density function is
$f(Z) = \frac{1}{\sqrt{2\pi}}\, e^{-(1/2)Z^{2}}$
where
  e = the mathematical constant approximated by 2.71828
  π = the mathematical constant approximated by 3.14159
  Z = any value of the standardized normal distribution

The Standardized Normal Distribution
- Also known as the "Z" distribution
- Mean is 0
- Standard deviation is 1
- Values above the mean have positive Z-values; values below the mean have negative Z-values.
[Figure: standardized normal curve f(Z) against Z, centered at 0 with standard deviation 1]
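The Z transformation is a one-line calculation. Here is a minimal Python sketch; the function names z_score and from_z are hypothetical, and the values μ = 100 and σ = 50 are used only for illustration.

```python
def z_score(x, mu, sigma):
    """Convert a value x in X units to Z units: Z = (X - mu) / sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


def from_z(z, mu, sigma):
    """Convert a Z value back to X units: X = mu + Z * sigma."""
    return mu + z * sigma


if __name__ == "__main__":
    # Illustrative values: mean 100, standard deviation 50.
    print(z_score(200, 100, 50))  # 200 lies above the mean, so Z is positive
    print(z_score(50, 100, 50))   # 50 lies below the mean, so Z is negative
```

The sign of the result shows directly whether a value lies above or below the mean.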
Example
If X is distributed normally with a mean of 100 and a standard deviation of 50, the Z value for X = 200 is
$Z = \frac{X - \mu}{\sigma} = \frac{200 - 100}{50} = 2.0$
This says that X = 200 is two standard deviations (2 increments of 50 units) above the mean of 100.

Comparing X and Z units
[Figure: the same bell curve on two scales; X = 100 and X = 200 (μ = 100, σ = 50) correspond to Z = 0 and Z = 2.0 (μ = 0, σ = 1)]
Note that the shape of the distribution is the same; only the scale has changed. We can express the problem in original units (X) or in standardized units (Z).

Finding Normal Probabilities
Probability is measured by the area under the curve:
P(a ≤ X ≤ b) = P(a < X < b)
(Note that the probability of any individual value is zero.)
[Figure: curve of f(X) with the area between a and b shaded]

Probability as Area Under the Curve
The total area under the curve is 1.0, and the curve is symmetric, so half is above the mean and half is below:
P(-∞ < X < μ) = 0.5
P(μ < X < ∞) = 0.5
P(-∞ < X < ∞) = 1.0

The Standardized Normal Table
The Cumulative Standardized Normal table in the textbook (Appendix Table E.2) gives the probability less than a desired value of Z (i.e., from negative infinity to Z).
Example: P(Z < 2.00) = 0.9772
[Figure: standardized normal curve with the area 0.9772 to the left of Z = 2.00 shaded]

The Standardized Normal Table (continued)
- The row shows the value of Z to the first decimal place.
- The column gives the value of Z to the second decimal place.
- The value within the table gives the probability from Z = -∞ up to the desired Z value.

  Z      0.00    0.01    0.02   …
  0.0
  0.1
  …
  2.0    .9772

P(Z < 2.00) = 0.9772

General Procedure for Finding Normal Probabilities
To find P(a < X < b) when X is distributed normally:
- Draw the normal curve for the problem in terms of X
- Translate X-values to Z-values
- Use the Standardized Normal Table

Finding Normal Probabilities
Let X represent the time it takes to download an image file from the internet. Suppose X is normal with mean 8.0 and standard deviation 5.0. Find P(X < 8.6).
[Figure: normal curve in X units with the area to the left of 8.6 shaded; mean at 8.0]

Finding Normal Probabilities (continued)
$Z = \frac{X - \mu}{\sigma} = \frac{8.6 - 8.0}{5.0} = 0.12$
[Figure: two curves. In X units (μ = 8, σ = 5) the area left of 8.6 is shaded, P(X < 8.6); in Z units (μ = 0, σ = 1) the area left of 0.12 is shaded, P(Z < 0.12)]

Solution: Finding P(Z < 0.12)
Standardized Normal Probability Table (Portion)

  Z      .00     .01     .02
  0.0    .5000   .5040   .5080
  0.1    .5398   .5438   .5478
  0.2    .5793   .5832   .5871
  0.3    .6179   .6217   .6255

P(X < 8.6) = P(Z < 0.12) = 0.5478

Finding Normal Upper Tail Probabilities
Suppose X is normal with mean 8.0 and standard deviation 5.0. Now find P(X > 8.6).
[Figure: normal curve in X units with the area to the right of 8.6 shaded; mean at 8.0]

Finding Normal Upper Tail Probabilities (continued)
P(X > 8.6) = P(Z > 0.12)
           = 1.0 - P(Z ≤ 0.12)
           = 1.0 - 0.5478
           = 0.4522
[Figure: standardized normal curve; the area left of Z = 0.12 is 0.5478 and the area to the right is 0.4522]
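The table lookups above can be cross-checked in software. The sketch below is an illustration, not the textbook's method: it uses NormalDist from Python's standard statistics module (Python 3.8+) in place of Table E.2, and the helper names are hypothetical.

```python
from statistics import NormalDist


def prob_less(x, mu, sigma):
    """P(X < x) for X normal with mean mu and standard deviation sigma."""
    return NormalDist(mu, sigma).cdf(x)


def prob_greater(x, mu, sigma):
    """P(X > x), computed as 1 - P(X <= x)."""
    return 1.0 - prob_less(x, mu, sigma)


def prob_between(a, b, mu, sigma):
    """P(a < X < b), computed as P(X < b) - P(X < a)."""
    return prob_less(b, mu, sigma) - prob_less(a, mu, sigma)


if __name__ == "__main__":
    mu, sigma = 8.0, 5.0  # download-time example from the slides
    print(round(prob_less(8.6, mu, sigma), 4))     # lower area, cf. P(Z < 0.12)
    print(round(prob_greater(8.6, mu, sigma), 4))  # upper-tail area
```

For the download-time example, the values rounded to four decimals agree with the table-based results 0.5478 and 0.4522.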
Finding a Normal Probability Between Two Values
Suppose X is normal with mean 8.0 and standard deviation 5.0. Find P(8 < X < 8.6).
Calculate Z-values:
$Z = \frac{X - \mu}{\sigma} = \frac{8 - 8}{5} = 0$
$Z = \frac{X - \mu}{\sigma} = \frac{8.6 - 8}{5} = 0.12$
P(8 < X < 8.6) = P(0 < Z < 0.12)
[Figure: the area between 8 and 8.6 in X units corresponds to the area between 0 and 0.12 in Z units]

Solution: Finding P(0 < Z < 0.12)
Standardized Normal Probability Table (Portion)

  Z      .00     .01     .02
  0.0    .5000   .5040   .5080
  0.1    .5398   .5438   .5478
  0.2    .5793   .5832   .5871
  0.3    .6179   .6217   .6255

P(8 < X < 8.6) = P(0 < Z < 0.12)
               = P(Z < 0.12) - P(Z ≤ 0)
               = 0.5478 - 0.5000
               = 0.0478

Probabilities in the Lower Tail
Suppose X is normal with mean 8.0 and standard deviation 5.0. Now find P(7.4 < X < 8).
[Figure: normal curve in X units with the area between 7.4 and 8.0 shaded]

Probabilities in the Lower Tail (continued)
P(7.4 < X < 8) = P(-0.12 < Z < 0)
               = P(Z < 0) - P(Z ≤ -0.12)
               = 0.5000 - 0.4522
               = 0.0478
The normal distribution is symmetric, so this probability is the same as P(0 < Z < 0.12).
[Figure: the area left of Z = -0.12 (X = 7.4) is 0.4522; the area between -0.12 and 0 (7.4 and 8.0) is 0.0478]

The Empirical Rule
What can we say about the distribution of values around the mean? For any normal distribution:
- μ ± 1σ encloses about 68.26% of X's
[Figure: normal curve with the area between μ - 1σ and μ + 1σ shaded, 68.26%]

The Empirical Rule (continued)
- μ ± 2σ covers about 95.44% of X's
- μ ± 3σ covers about 99.73% of X's
[Figure: normal curves with the areas within μ ± 2σ (95.44%) and μ ± 3σ (99.73%) shaded]

Given a Normal Probability, Find the X Value
Steps to find the X value for a known probability:
1. Find the Z value for the known probability (from the cumulative standardized normal table)
2. Convert to X units using the formula:
$X = \mu + Z\sigma$

Finding the X Value for a Known Probability (continued)
Example: Let X represent the time it takes (in seconds) to download an image file from the internet. Suppose X is normal with mean 8.0 and standard deviation 5.0. Find X such that 20% of download times are less than X.
[Figure: normal curve with a lower-tail area of 0.2000; the corresponding X and Z values are unknown; mean at X = 8.0, Z = 0]

Find the Z Value for 20% in the Lower Tail
1. Find the Z value for the known probability.
Standardized Normal Probability Table (Portion)

  Z      …   .03     .04     .05
  -0.9   …   .1762   .1736   .1711
  -0.8   …   .2033   .2005   .1977
  -0.7   …   .2327   .2296   .2266

A 20% area in the lower tail is consistent with a Z value of -0.84 (the closest table entry is .2005).

Finding the X Value
2. Convert to X units using the formula:
$X = \mu + Z\sigma = 8.0 + (-0.84)(5.0) = 3.80$
So 20% of the values from a distribution with mean 8.0 and standard deviation 5.0 are less than 3.80.

Evaluating Normality
- Not all continuous distributions are normal.
- It is important to evaluate how well the data set is approximated by a normal distribution.
- Normally distributed data should approximate the theoretical normal distribution:
  - The normal distribution is bell shaped (symmetrical), and the mean is equal to the median.
  - The empirical rule applies to the normal distribution.
  - The interquartile range of a normal distribution is approximately 1.33 standard deviations.
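The two-step procedure for finding X from a known probability (look up Z, then convert with X = μ + Zσ) can be sketched in Python. This is an illustration rather than the textbook's method: it replaces the table lookup with NormalDist.inv_cdf from the standard statistics module, and the function name x_for_probability is hypothetical.

```python
from statistics import NormalDist


def x_for_probability(p, mu, sigma):
    """Return x such that P(X < x) = p for X normal with mean mu, sd sigma.

    Step 1: find the Z value whose cumulative probability is p.
    Step 2: convert to X units with X = mu + Z * sigma.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    z = NormalDist().inv_cdf(p)  # standardized normal quantile
    return mu + z * sigma


if __name__ == "__main__":
    # Download-time example: mean 8.0 seconds, standard deviation 5.0 seconds.
    print(x_for_probability(0.20, 8.0, 5.0))
```

The software quantile for 20% is about -0.8416 rather than the two-decimal table value -0.84, so the computed X comes out slightly below the table-based 3.80.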
Evaluating Normality (continued)
Comparing data characteristics to theoretical properties:
- Construct charts or graphs
  - For small or moderate-sized data sets, construct a stem-and-leaf display or a boxplot to check for symmetry
  - For large data sets, does the histogram or polygon appear bell-shaped?
- Compute descriptive summary measures
  - Do the mean, median, and mode have similar values?
  - Is the interquartile range approximately 1.33σ?
  - Is the range approximately 6σ?

Evaluating Normality (continued)
Comparing data characteristics to theoretical properties:
- Observe the distribution of the data set
  - Do approximately 2/3 of the observations lie within mean ± 1 standard deviation?
  - Do approximately 80% of the observations lie within mean ± 1.28 standard deviations?
  - Do approximately 95% of the observations lie within mean ± 2 standard deviations?
- Evaluate the normal probability plot
  - Is the normal probability plot approximately linear (i.e., a straight line) with positive slope?

Constructing a Normal Probability Plot
- Arrange the data into an ordered array
- Find the corresponding standardized normal quantile values (Z), i.e., the Z values for the cumulative proportions of the ordered data
- Plot the pairs of points with the observed data values (X) on the vertical axis and the standardized normal quantile values (Z) on the horizontal axis
- Evaluate the plot for evidence of linearity
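The construction steps above can be sketched in Python. Two points are assumptions, not taken from the slides: the cumulative proportion assigned to the i-th ordered value is taken to be i/(n + 1), which is one common convention, and the function name normal_plot_points is hypothetical. The sketch returns the (Z, X) pairs only; drawing them is left to any plotting tool.

```python
from statistics import NormalDist


def normal_plot_points(data):
    """Return (Z, X) pairs for a normal probability plot.

    Assumption: the i-th ordered value (i = 1..n) is assigned the
    cumulative proportion i / (n + 1); other conventions exist.
    """
    ordered = sorted(data)  # step 1: ordered array
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, x in enumerate(ordered, start=1):
        z = std_normal.inv_cdf(i / (n + 1))  # step 2: normal quantile
        points.append((z, x))  # step 3: Z on horizontal, X on vertical
    return points
```

If the data are approximately normal, the returned points should fall close to a straight line with positive slope when X is plotted against Z.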
The Normal Probability Plot Interpretation
A normal probability plot for data from a normal distribution will be approximately linear.
[Figure: normal probability plot with X (30 to 90) on the vertical axis and Z (-2 to 2) on the horizontal axis; the points fall along a straight line]

Normal Probability Plot Interpretation (continued)
Nonlinear plots indicate a deviation from normality.
[Figure: three normal probability plots of X against Z, with panels titled Left-Skewed, Right-Skewed, and Rectangular; each is nonlinear]

Evaluating Normality
An Example: Mutual Funds Returns
[Figure: Boxplot of 2006 Returns; horizontal axis Return 2006, from -10 to 40]
The boxplot appears reasonably symmetric, with four lower outliers at -9.0, -8.0, -8.0, and -6.5 and one upper outlier at 35.0. (The normal distribution is symmetric.)

Evaluating Normality
An Example: Mutual Funds Returns (continued)
Descriptive Statistics
- The mean (12.5142) is slightly less than the median (13.1). (In a normal distribution the mean and median are equal.)
- The interquartile range of 9.2 is approximately 1.46 standard deviations. (In a normal distribution the interquartile range is approximately 1.33 standard deviations.)
- The range of 44 is equal to 6.99 standard deviations. (In a normal distribution the range is approximately 6 standard deviations.)
- 72.2% of the observations are within 1 standard deviation of the mean. (In a normal distribution this percentage is 68.26%.)
- 87% of the observations are within 1.28 standard deviations of the mean. (In a normal distribution this percentage is 80%.)

Evaluating Normality
An Example: Mutual Funds Returns (continued)
[Figure: Probability Plot of Return 2006 (Normal); vertical axis Percent, from 0.01 to 99.99; horizontal axis Return 2006, from -10 to 40]
The plot is approximately a straight line except for a few outliers at the low end and the high end.

Evaluating Normality
An Example: Mutual Funds Returns (continued)
Conclusions
- The returns are slightly left-skewed
- The returns have more values concentrated around the mean than expected
- The range is larger than expected (caused by one outlier at 35.0)
- The normal probability plot is a reasonably straight line
- Overall, this data set does not greatly differ from the theoretical properties of the normal distribution

Chapter Summary
- Presented the normal distribution
- Found probabilities for the normal distribution
- Applied the normal distribution to problems
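The descriptive checks used in the mutual funds example can be computed for any data set. The following Python sketch is an illustration under stated assumptions: the function name normality_summary is hypothetical, the sample standard deviation is used, and quartiles follow the default method of statistics.quantiles, which may differ slightly from the textbook's quartile rules. The mutual funds data are not reproduced in the slides, so no attempt is made to recompute those figures.

```python
from statistics import mean, median, quantiles, stdev


def normality_summary(data):
    """Summary measures to compare against normal-distribution benchmarks.

    Assumptions: sample standard deviation; quartiles from
    statistics.quantiles with its default ('exclusive') method.
    """
    m = mean(data)
    s = stdev(data)
    q1, _, q3 = quantiles(data, n=4)

    def share_within(k):
        # Fraction of observations within k standard deviations of the mean.
        return sum(abs(x - m) <= k * s for x in data) / len(data)

    return {
        "mean": m,
        "median": median(data),
        "iqr_over_sd": (q3 - q1) / s,                  # text benchmark: about 1.33
        "range_over_sd": (max(data) - min(data)) / s,  # text benchmark: about 6
        "within_1_sd": share_within(1.0),              # text benchmark: 68.26%
        "within_1.28_sd": share_within(1.28),          # text benchmark: 80%
        "within_2_sd": share_within(2.0),              # text benchmark: about 95%
    }
```

Each entry corresponds to one of the questions on the Evaluating Normality slides; large departures from the benchmarks suggest the data are not well approximated by a normal distribution.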