CA 51018: Statistical Analysis with Software Applications

THE NORMAL DISTRIBUTION

CONTINUOUS RANDOM VARIABLE
• A variable that can assume any value on a continuum (it can take an uncountable number of values).
• Examples:
  o thickness of an item
  o time required to complete a task
  o temperature of a solution
  o height

NORMAL DISTRIBUTION
• It is the most common continuous distribution.
• Also known as the Gaussian distribution or the bell curve.
• In this distribution, the probability that a value falls within a given range or interval can be calculated.

THE NORMAL DISTRIBUTION PROPERTIES
1. Bell-shaped
2. Symmetrical
3. Mean, median, and mode are equal
4. Location is characterized by the mean, μ
5. Spread is characterized by the standard deviation, σ
6. The random variable has an infinite theoretical range: −∞ to +∞

THE NORMAL DISTRIBUTION SHAPE
[Figure: shape of the normal distribution]

THE STANDARDIZED NORMAL DISTRIBUTION
• Also known as the "Z" distribution
• Mean is 0
• Standard deviation is 1
NOTE: Values above the mean have positive Z-values; values below the mean have negative Z-values.

EXAMPLE
• If X is normally distributed with a mean of 100 and a standard deviation of 50, the Z-value for X = 200 is

$$Z = \frac{X - \mu}{\sigma} = \frac{200 - 100}{50} = 2.0$$

• This says that X = 200 is two standard deviations (2 increments of 50 units) above the mean of 100.
• Note that the distribution is the same; only the scale has changed. We can express the problem in original units (X) or in standardized units (Z).

NORMAL PROBABILITIES
• Probability is measured by the area under the curve.
• The total area under the curve is 1.0, and the curve is symmetric, so half of the area is above the mean and half is below.

EXAMPLE 1
• Let X represent the time it takes (in seconds) to download an image file from the internet. Suppose X is normal with mean 8.0 and standard deviation 5.0. Find P(X < 8.6).
• Calculate Z-values as follows:

$$Z = \frac{X - \mu}{\sigma} = \frac{8 - 8}{5} = 0$$

$$Z = \frac{X - \mu}{\sigma} = \frac{8.6 - 8}{5} = 0.12$$

ANSWER: P(X < 8.6) = P(Z < 0.12) = 0.5478 or 54.78%

EXAMPLE 2
• Suppose X is normal with mean 8.0 and standard deviation 5.0. Find P(X > 8.6).

$$P(X > 8.6) = P(Z > 0.12) = 1.0 - P(Z < 0.12) = 1.0 - 0.5478 = 0.4522$$

ANSWER: P(X > 8.6) = P(Z > 0.12) = 0.4522 or 45.22%

EXAMPLE 3
• Suppose X is normal with mean 8.0 and standard deviation 5.0. Find P(8 < X < 8.6).
• Calculate Z-values as follows:

$$Z = \frac{X - \mu}{\sigma} = \frac{8 - 8}{5} = 0$$

$$Z = \frac{X - \mu}{\sigma} = \frac{8.6 - 8}{5} = 0.12$$

• The area between the two Z-values is the difference of the cumulative areas: 0.5478 − 0.5000 = 0.0478.
ANSWER: P(8 < X < 8.6) = P(0 < Z < 0.12) = 0.0478 or 4.78%

ASSESSING NORMALITY
• It is important to evaluate how well the data set is approximated by a normal distribution.
• Normally distributed data should approximate the theoretical normal distribution:
  o The normal distribution is bell-shaped (symmetrical), and its mean is equal to its median.
  o The empirical rule applies to the normal distribution.
  o The interquartile range of a normal distribution is approximately 1.33 standard deviations (about 1.35 when computed exactly).

THE EMPIRICAL RULE AS APPLIED TO THE NORMAL DISTRIBUTION
• This rule states that for symmetrical bell-shaped data sets, roughly two out of every three observations are contained within a distance of 1 standard deviation around the mean, and roughly 95% of the observations are contained within a distance of 2 standard deviations around the mean.

ASSESSING NORMALITY (cont.)
• Construct charts or graphs
  o For small- or moderate-sized data sets, do the stem-and-leaf display and box-and-whisker plot look symmetric?
  o For large data sets, does the histogram or polygon appear bell-shaped?
• Compute descriptive summary measures
  o Do the mean, median, and mode have similar values?
  o Is the interquartile range approximately 1.33σ?
  o Is the range approximately 6σ?
• Observe the distribution of the data set
  o Do approximately 2/3 of the observations lie within mean ± 1 standard deviation?
  o Do approximately 80% of the observations lie within mean ± 1.28 standard deviations?
  o Do approximately 95% of the observations lie within mean ± 2 standard deviations?
• Evaluate the normal probability plot
  o Is the normal probability plot approximately linear with positive slope?
  o A normal probability plot for data from a normal distribution will be approximately linear.
  o Non-linear plots indicate a deviation from normality.

EXPLORATORY DATA ANALYSIS

THE FIVE NUMBER SUMMARY
• The five numbers that describe the spread of data are:
  o Minimum
  o First Quartile (Q1)
  o Median (Q2)
  o Third Quartile (Q3)
  o Maximum
• The box-and-whisker plot is a graphical display of the five number summary.
• The box and central line are centered between the endpoints if the data are symmetric around the median.
• A box-and-whisker plot can be shown in either vertical or horizontal format.

OTHER WAYS OF ASSESSING NORMALITY OF DATA
• Checking for skewness with the Pearson coefficient (PC) of skewness:

$$PC = \frac{3(\bar{X} - \text{median})}{s}$$

  o NOTE: The data are considered significantly skewed when PC is greater than or equal to +1 or less than or equal to −1.
• Checking for outliers
  o NOTE: An outlier is a data value that lies more than 1.5(IQR) units below Q1 or more than 1.5(IQR) units above Q3.

REFERENCES:
Berenson, M. L., Krehbiel, T. C., Levine, D. M., & Stephan, D. (2008). Statistics for Managers Using Microsoft Excel. Pearson.
Bluman, G. (2018). Elementary Statistics: A Step by Step Approach. New York: McGraw-Hill Education.
Statistical Analysis with Software Applications. Philippines: McGraw-Hill Education.
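SOFTWARE CHECK OF THE WORKED EXAMPLES
The Z-value example and the download-time probabilities in Examples 1 to 3 can also be computed in software instead of being read from a Z table. The sketch below is one possible way to do this, assuming Python 3.8 or later and its standard-library class statistics.NormalDist. The handout does not prescribe Python, and the helper names z_value and within_k are illustrative only.

```python
from statistics import NormalDist


def z_value(x: float, mu: float, sigma: float) -> float:
    """Return how many standard deviations x lies from the mean."""
    return (x - mu) / sigma


def within_k(k: float) -> float:
    """Return P(-k < Z < k) for the standardized normal distribution."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


# Z-value example: mean 100, standard deviation 50, X = 200.
print(z_value(200, 100, 50))  # 2.0

# Download-time examples: X is normal with mean 8.0 and standard deviation 5.0.
download_time = NormalDist(mu=8.0, sigma=5.0)

z = z_value(8.6, 8.0, 5.0)
p_below = download_time.cdf(8.6)                # Example 1: P(X < 8.6)
p_above = 1.0 - p_below                         # Example 2: P(X > 8.6)
p_between = p_below - download_time.cdf(8.0)    # Example 3: P(8 < X < 8.6)

print(f"Z = {z:.2f}")                           # Z = 0.12
print(f"P(X < 8.6)     = {p_below:.4f}")        # 0.5478
print(f"P(X > 8.6)     = {p_above:.4f}")        # 0.4522
print(f"P(8 < X < 8.6) = {p_between:.4f}")      # 0.0478

# Shares used when assessing normality: about 2/3, 80%, and 95%.
for k in (1, 1.28, 2):
    print(f"P(|Z| < {k}) = {within_k(k):.4f}")
```

The printed probabilities agree with the table-based answers to four decimal places. The same cdf call works in original units (X) or in standardized units (Z), which mirrors the point that only the scale changes.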