MATH 2560 C F03 Elementary Statistics I
LECTURE 4: Normal Distribution Calculations.

1 Outline.

⇒ standardizing observations;
⇒ the standard normal distribution;
⇒ normal distribution calculations;
⇒ normal quantile plots.

2 Standardizing Observations

All normal distributions are the same if we measure in units of size σ about the mean µ as center.
⇒ changing to these units is called standardizing.

How to standardize a value?
1. Subtract the mean of the distribution;
2. Then divide by the standard deviation.

Standardizing and z-Scores

If x is an observation from a distribution that has mean µ and standard deviation σ, the standardized value of x is

    $z = \dfrac{x - \mu}{\sigma}$.

A standardized value is often called a z-score.

Remark. A z-score tells us how many standard deviations the original observation falls away from the mean, and in which direction. Observations larger than the mean are positive when standardized, and observations smaller than the mean are negative.

Example 1.24 Heights of Young Women.

Here µ = 64.5 and σ = 2.5 (heights in inches). The standardized height is

    $z = \dfrac{\text{height} - 64.5}{2.5}$.

For example, for x = 68,

    $z = \dfrac{68 - 64.5}{2.5} = 1.4$,

or 1.4 standard deviations above the mean. And a woman 5 feet (60 inches) tall has standardized height

    $z = \dfrac{60 - 64.5}{2.5} = -1.8$,

or 1.8 standard deviations less than the mean height.

Using the Notations X and x

1) We shorten "the height of a young woman is less than 68 inches" to X < 68 (capital letter X);
2) Lowercase x stands for any specific value of the variable X (for example, 64.5).
Note that X may take many values, but x stands for only one.

Standardizing and Linear Transformation

Standardizing is a linear transformation that transforms the data into the standard scale of z-scores. So standardizing does not change the shape of a distribution, but the mean and standard deviation change in a simple manner. Any variable obtained from a normal variable by a linear transformation remains normal.
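The two-step recipe for standardizing (subtract the mean, then divide by the standard deviation) can be sketched in a few lines of Python. This is only an illustration: the function name `standardize` is an assumption introduced here, not something from the lecture or the textbook. The numbers are those of Example 1.24.

```python
def standardize(x, mu, sigma):
    """Return the z-score of x for a distribution with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Heights of young women from Example 1.24: mu = 64.5, sigma = 2.5 (inches).
z_tall = standardize(68, 64.5, 2.5)   # a woman 68 inches tall
z_short = standardize(60, 64.5, 2.5)  # a woman 60 inches tall

print(round(z_tall, 2), round(z_short, 2))
```

The sign of the result carries the direction: a positive value lies above the mean and a negative value lies below it.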
Let X have the normal distribution with mean µ and standard deviation σ. Then Xnew = a + bX, for a positive b, has the normal distribution with mean a + bµ and standard deviation bσ. In particular, the standardized values for any distribution always have mean 0 and standard deviation 1.

3 Standard Normal Distribution

Standardizing a variable that has any normal distribution produces a new variable that has the standard normal distribution.

The Standard Normal Distribution

The standard normal distribution is the normal distribution N(0, 1) with mean 0 and standard deviation 1. If a variable X has any normal distribution N(µ, σ) with mean µ and standard deviation σ, then the standardized variable

    $Z = \dfrac{X - \mu}{\sigma}$

has the standard normal distribution.

Table A appears on the inside front cover of the textbook. You can use Table A to do normal calculations.

Example 1.25: How to Use Table A

Problem 1. What proportion of observations on a standard normal variable Z take values less than 1.4?
Solution: to find the area to the left of 1.40, locate 1.4 in the left-hand column of Table A, then locate the remaining digit 0 as .00 in the top row. The entry opposite 1.4 and under .00 is 0.9192. This is the area we seek. Figure 1.26(a) illustrates this area.

Problem 2. Find the proportion of observations from the standard normal distribution that are greater than −2.15.
Solution: enter Table A under z = −2.15. That is, find −2.1 in the left-hand column and .05 in the top row. The table entry is 0.0158. This is the area to the left of −2.15. Because the total area under the curve is 1, the area lying to the right of −2.15 is 1 − 0.0158 = 0.9842. Figure 1.26(b) shows this area.

4 Normal Distribution Calculations

We can find relative frequencies for any normal distribution by standardizing and using Table A. Let us consider several examples.

4.1 Example 1.26: Area to the Right

1. State the Problem. Let the variable X have the N(1019, 209) distribution.
We want the proportion of students with X > 820.
2. Standardize.

    $X > 820 \;\Rightarrow\; \dfrac{X - 1019}{209} > \dfrac{820 - 1019}{209} \;\Rightarrow\; Z > -0.95$.

3. Use the table. From Table A, we see that the proportion of observations less than −0.95 is 0.1711. The area to the right of −0.95 is therefore 1 − 0.1711 = 0.8289. This means that about 17 percent of students score less than 820, and about 83 percent score greater than 820. Figure 1.28 shows the standard normal curve with the area of interest.

4.2 Example 1.27: Area Between Different z

1. State the Problem. Let the variable X have the N(1019, 209) distribution. We want the proportion of scores in the interval 720 ≤ X < 820.
2. Standardize.

    $720 \le X < 820 \;\Rightarrow\; \dfrac{720 - 1019}{209} \le \dfrac{X - 1019}{209} < \dfrac{820 - 1019}{209} \;\Rightarrow\; -1.43 \le Z < -0.95$.

3. Use the table. The area between −1.43 and −0.95 is the area to the left of −0.95 minus the area to the left of −1.43. From Table A,

    area between −1.43 and −0.95 = (area left of −0.95) − (area left of −1.43) = 0.1711 − 0.0764 = 0.0947.

About 9.5 percent of students have scores in this interval. Figure 1.29 shows the area under the standard normal curve.

4.3 Example 1.28: Using Table A Backward: Inverse Problem

1. State the Problem. Find the score x with area 0.1 to its right under the normal curve with mean µ = 505 and standard deviation σ = 110.
2. Use the table. An area of 0.1 to the right of x means an area of 0.9 to its left. Look in the body of Table A for the entry closest to 0.9. It is 0.8997. This is the entry corresponding to z = 1.28. So z = 1.28 is the standardized value with area 0.9 to its left.
3. Unstandardize: transform the solution from the z scale back to the original x scale. Here z = 1.28, so x itself satisfies

    $\dfrac{x - 505}{110} = 1.28$.

Solving this equation for x gives x = 505 + (1.28)(110) = 645.8. We can see that a student must score at least 646 to place in the highest 10 percent. Figure 1.30 poses the question in graphical form.

⇒ The general rule for unstandardizing a z-score is x = µ + zσ.

5 Normal Quantile Plots

How can we judge whether data are approximately normal?
The most useful tool for assessing normality is another graph, the Normal Quantile Plot.

The Idea of a Normal Quantile Plot

1. Arrange the observed data values from smallest to largest, and record what percentile of the data each value occupies.
2. Do normal distribution calculations to find the z-scores at these same percentiles.
3. Plot each data point x against the corresponding z.
If the data distribution is close to standard normal, the plotted points will lie close to the 45-degree line x = z. If the data distribution is close to any normal distribution, the plotted points will lie close to some straight line.

Use of Normal Quantile Plots

If the points on a normal quantile plot lie close to a straight line, the plot indicates that the data are normal. Systematic deviations from a straight line indicate a nonnormal distribution. Outliers appear as points that are far away from the overall pattern of the plot. Figures 1.31 to 1.33 are normal quantile plots for data met earlier (see Figure 1.33).

6 Summary

To standardize any observation x, subtract the mean of the distribution and then divide by the standard deviation. The resulting z-score

    $z = \dfrac{x - \mu}{\sigma}$

says how many standard deviations x lies from the distribution mean.

All normal distributions are the same when measurements are transformed to the standardized scale. In particular, all normal distributions satisfy the 68-95-99.7 rule.

If X has the N(µ, σ) distribution, then the standardized variable

    $Z = \dfrac{X - \mu}{\sigma}$

has the standard normal distribution N(0, 1). Relative frequencies for any normal distribution can be calculated from the standard normal table (Table A from the book), which gives relative frequencies for the events Z < z for many values of z.

The adequacy of a normal model for describing a distribution of data is best assessed by a normal quantile plot, which is available in most statistical software packages. A pattern on such a plot that deviates substantially from a straight line indicates that the data are not normal.
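The table-based calculations of Examples 1.25 to 1.28 can also be checked with software. Below is a minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8 or later) in place of Table A. The variable names and the helper `normal_quantile_points` are assumptions introduced here for illustration. The percentile formula (i − 0.5)/n inside the helper is one common convention, not something the lecture specifies, and the three-value sample in the last line is made up. Because the software does not round z to two decimals, its answers can differ from the table answers in the third decimal place.

```python
from statistics import NormalDist

Z = NormalDist(0, 1)  # the standard normal distribution N(0, 1)

# Example 1.25: area to the left of 1.40, and area to the right of -2.15.
left_of_1_40 = Z.cdf(1.40)
right_of_neg_2_15 = 1 - Z.cdf(-2.15)

# Examples 1.26 and 1.27: X has the N(1019, 209) distribution.
X = NormalDist(mu=1019, sigma=209)
above_820 = 1 - X.cdf(820)                  # proportion with X > 820
between_720_820 = X.cdf(820) - X.cdf(720)   # proportion with 720 <= X < 820

# Example 1.28: score with area 0.1 to its right, i.e. 0.9 to its left.
scores = NormalDist(mu=505, sigma=110)
z_90 = Z.inv_cdf(0.90)                      # "Table A backward"
x_90 = scores.mean + z_90 * scores.stdev    # unstandardize: x = mu + z * sigma


def normal_quantile_points(data):
    """Return (z, x) pairs for a normal quantile plot.

    Assumption: the i-th smallest of n values is assigned the
    percentile (i - 0.5) / n; other conventions exist.
    """
    xs = sorted(data)
    n = len(xs)
    return [(Z.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]


print(round(left_of_1_40, 4), round(right_of_neg_2_15, 4))
print(round(above_820, 4), round(between_720_820, 4))
print(round(z_90, 2), round(x_90, 1))
print(normal_quantile_points([3, 1, 2]))  # made-up sample, for illustration only
```

Plotting the returned pairs with any graphing tool gives the normal quantile plot. Points that fall near a straight line suggest that the data are approximately normal.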