Chapter 5 - The Normal Curve
PART II: DESCRIPTIVE STATISTICS
Dr. Joseph Brennan
Math 148, BU

Histogram and the Density Curve

Density Curves

A density curve may be used to display the distribution of the data in addition to, or instead of, a histogram. We can think of a density curve as a smooth approximation to the histogram computed from the data.

A density curve describes the distribution of a quantitative continuous variable.

For continuous response variables, the histogram computed from the data (the sample) approximates the (unknown) population density of the response variable.

Properties of Density Curves

Like histograms, density curves may be described as symmetric or skewed. Density curves also have measures of center and spread:

- µ is the mean of a density curve.
- µ̃ is the median of a density curve.
- σ is the standard deviation of a density curve.
- q1 and q3 are the first and third quartiles of a density curve.

NOTE 1: The mean and median are the same for a symmetric density curve. They both lie at the center of the curve.
NOTE 2: The mean of a skewed curve is pulled away from the median in the direction of the long tail.
NOTE 3: The standard deviation of a density curve is computed mathematically and is difficult to estimate visually.

Population Parameters and Statistics

If the density curve describes the population distribution, then the mean µ and standard deviation σ of the density curve are the (unknown) population parameters.

The sample average x̄ and sample standard deviation s computed from a data set estimate µ and σ, respectively, but the estimates are usually not exact.

µ = 0.25, σ = 0.144338
x̄ = 0.2556, s = 0.144446

The Normal Curve

Perhaps the most important density curve in statistics!

Figure 6. The (standard) normal density curve.

The curve is defined by the equation

\[ p(z) = \frac{1}{\sqrt{2\pi}} \, e^{-z^2/2}, \qquad \text{where } e = 2.71828\ldots \tag{1} \]

Properties of the Normal Curve

Properties of the (standard) normal curve:

- Symmetric about zero,
- Unimodal,
- The mean, median, and mode are equal,
- Bell-shaped,
- The mean µ = 0 and the standard deviation σ = 1,
- The area under the whole normal curve is 100% (or 1, if you use decimals).

The Normal Approximation of Data

Many histograms for data are similar in shape to the normal curve, provided they are drawn to an appropriate scale.

Normal Approximation: Transforming the horizontal scale of a histogram so that it aligns with the standard normal density curve.

z-units (standard units) are the values that data points take after this transformation. (More information to come!)

If the histogram follows the normal curve, the area under the histogram will be about the same as the area under the curve. The area under the histogram over an interval corresponds to the percentage of observations in that interval.

The goal of normal approximation is to use the normal density curve to approximate the percentage of observations in a given interval.

The Empirical Rule

The (standard) normal curve is plotted against z, the standard units. The following property of the normal curve explains the origins of the Empirical Rule.

THE 68-95-99.7 RULE for the NORMAL CURVE
Approximately 68% of observations fall within 1 standard unit of 0 (−1 < z < 1).
Approximately 95% of observations fall within 2 standard units of 0 (−2 < z < 2).
Approximately 99.7% of observations fall within 3 standard units of 0 (−3 < z < 3).

The Empirical Rule, which applies to bell-shaped, normal-like histograms, is a direct consequence of the above property of the normal curve. The range −1 < z < 1 in standard units corresponds to x̄ − s < x < x̄ + s in the original, nonstandard units.

The 68-95-99.7 Rule

Figure: Normal curve and percentage of observations under it. The horizontal scale uses the standard units z.

z-Scores

z-Score: The transformation of data into standard units for the normal approximation:

\[ z = \frac{\text{observation} - \text{mean}}{\text{standard deviation}} \]

Thus, any data point x may be re-expressed in standard units as

\[ z_x = \frac{x - \bar{x}}{s}. \tag{2} \]

We call the z which corresponds to x the z-score $z_x$. Note that $z_x < 0$ if x < x̄; $z_x = 0$ if x = x̄; $z_x > 0$ if x > x̄.

We may reverse the transformation; if $z_x$ is known, x can be found by

\[ x = \bar{x} + s \cdot z_x. \tag{3} \]

The z-score indicates how many standard deviations a data point lies above or below the average x̄.

If the histogram plotted against the z-scores follows the normal curve well, we say that the normal distribution provides a good approximation for the distribution of the data.

The normal curve is well studied, and many of its values have been tabulated in normal tables. Data that have a good normal approximation can be analyzed with the normal curve and its tables.

Normal Table

A normal table, found in the text, provides the area between −z and z:

Figure 9. Fragment of a normal table.
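The transformation in equations (2) and (3), and the kind of area a normal table stores, can be sketched in a few lines of Python. This is a minimal illustration rather than part of the course material: the function names and the sample values (average 50, SD 10) are hypothetical, and the standard library's `statistics.NormalDist` stands in for the printed table.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_score(x, average, sd):
    """Equation (2): convert x to standard units."""
    return (x - average) / sd


def from_z_score(z, average, sd):
    """Equation (3): convert a z-score back to the original units."""
    return average + sd * z


def area_between(z):
    """Area under the standard normal curve between -z and z."""
    return STANDARD_NORMAL.cdf(z) - STANDARD_NORMAL.cdf(-z)


# Hypothetical data set summary: average 50, SD 10.
z = z_score(60, average=50, sd=10)       # one SD above the average
x = from_z_score(z, average=50, sd=10)   # back to the original value

# Areas within 1, 2, and 3 standard units, as percentages.
for k in (1, 2, 3):
    print(f"within {k} standard unit(s): {100 * area_between(k):.1f}%")
```

The three printed percentages are the ones summarized by the 68-95-99.7 rule.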
Exercise 1, Set B, p.84

Using a normal table, let us find the area under the normal curve. Each table value below is the area to the left of the given z.

(a) to the right of 1.25
    Table Value: 0.8944
    1 − 0.8944 = 0.1056
(b) to the left of −0.4
    Table Value: 0.3446
(c) to the left of 0.8
    Table Value: 0.7881
(d) between 0.4 and 1.3
    Table Value of 0.4: 0.6554
    Table Value of 1.3: 0.9032
    0.9032 − 0.6554 = 0.2478
(e) between −0.3 and 0.9
    Table Value of −0.3: 0.3821
    Table Value of 0.9: 0.8159
    0.8159 − 0.3821 = 0.4338
(f) outside −1.5 to 1.5
    Table Value of −1.5: 0.0668
    Table Value of 1.5: 0.9332
    (1 − 0.9332) + 0.0668 = 0.1336

Example 8, p.85

The heights of the men age 18 and over in HANES5 averaged 69 inches; the SD was 3 inches. Use the normal curve to estimate the percentage of these men with heights between 63 inches and 72 inches.

Solution: The exact percentage is equal to the area under the height histogram between 63 inches and 72 inches. We assume that the histogram can be well approximated by the normal curve. We will estimate the percentage of men between 63 and 72 inches by finding the area of the corresponding region under the standard normal curve.

Step 1: Draw a number line and shade the interval of interest.

Step 2: Mark the mean on the line and convert to standard units.

The z-score for the left endpoint is

\[ z_{63} = \frac{x - \bar{x}}{s} = \frac{63 - 69}{3} = -2. \]

The z-score for the right endpoint is

\[ z_{72} = \frac{x - \bar{x}}{s} = \frac{72 - 69}{3} = 1. \]

Step 3: Sketch the normal curve and find the area under the curve above the shaded interval by using normal tables.

Conclusion: From our normal table, $z_{63} = -2$ corresponds to the 2.28th percentile and $z_{72} = 1$ to the 84.13th percentile. The difference is 84.13% − 2.28% = 81.85%, so about 82% of the heights were between 63 inches and 72 inches.
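The three steps of Example 8 can be checked numerically. The sketch below is illustrative only: `percent_between` is a hypothetical helper name, and Python's `statistics.NormalDist` replaces the printed table, so the last digit may differ slightly from arithmetic done with rounded table values.

```python
from statistics import NormalDist

# Area under the standard normal curve to the left of z.
phi = NormalDist().cdf


def percent_between(low, high, average, sd):
    """Normal-curve estimate of the percentage of observations in [low, high]."""
    z_low = (low - average) / sd     # Step 2: left endpoint in standard units
    z_high = (high - average) / sd   # Step 2: right endpoint in standard units
    return 100 * (phi(z_high) - phi(z_low))  # Step 3: area over the interval


# Example 8: average 69 inches, SD 3 inches, interval 63 to 72 inches.
estimate = percent_between(63, 72, average=69, sd=3)
print(f"Estimated percentage between 63 and 72 inches: {estimate:.1f}%")

# The same cumulative areas answer two parts of Exercise 1.
right_of_1_25 = 1 - phi(1.25)            # part (a)
outside_1_5 = (1 - phi(1.5)) + phi(-1.5)  # part (f)
```

Subtracting two left-hand areas, as in `percent_between`, is the same operation used in parts (d) and (e) of Exercise 1.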
The normal-curve estimate in Example 8 is only an approximation; in truth, 81% of the men were in that range.

Example (S.A.T.)

The SAT is a test of students' readiness for college. The average SAT score (on a 1600-point scale) is 1025 points and the standard deviation is 200 points. How well must Jessica do on the SAT to place in the top 10% of all students?

Solution: The problem does not say that the histogram of the SAT scores is bell-shaped, but it is reasonable to assume so. We will use the normal approximation to the distribution of the SAT scores to solve the problem.

First, find a z-score representing the 90th percentile. Using the normal table provided in the textbook, Jessica is hoping for a score that translates to z ≈ 1.3. We know x̄ = 1025 and s = 200.

\[ z = \frac{x - \bar{x}}{s} \quad\Rightarrow\quad x = \bar{x} + s \cdot z = 1025 + 200 \cdot 1.3 = 1285 \]

So Jessica should score about 1285 points to expect to be among the top 10% of students.

The freshman average SAT score at Binghamton was 1305 in 2011. In what percentile is the average freshman?

\[ z_{1305} = \frac{1305 - 1025}{200} = 1.4 \]

Using our z-table, we find a value of 0.9192. Therefore, the average freshman at BU is at the 92nd percentile.

IQ Score

An intelligence quotient, or IQ, is a score derived from one of several standardized tests designed to assess intelligence. The mean score is normalized to 100 and the standard deviation is roughly 15.

An IQ score of 70 is at what percentile?

\[ z_{70} = \frac{70 - 100}{15} = -2 \]

Table Value of −2: 0.0228, or 2.28%.

An IQ of 150 is required for entrance into a gifted program. What percentage of students are considered eligible?

\[ z_{150} = \frac{150 - 100}{15} = 3.33 \]

Table Value of 3.33: 0.9996, so the area to the right is 1 − 0.9996 = 0.0004.

With a requirement of a score of 150, only 0.04% of students will be considered "gifted".
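The SAT and IQ calculations run the table in both directions: from a percentile to a score, and from a score to a percentile. The sketch below mirrors them with Python's `statistics.NormalDist`, assuming, as the slides do, that both score distributions are well approximated by a normal curve. The variable names are illustrative only.

```python
from statistics import NormalDist

# Normal approximations using the averages and SDs given in the examples.
sat = NormalDist(mu=1025, sigma=200)
iq = NormalDist(mu=100, sigma=15)

# Percentile to score: the SAT score at the 90th percentile.
# The unrounded z is about 1.28, so this lands slightly below the 1285
# obtained from the rounded table value z = 1.3.
cutoff_top_10 = sat.inv_cdf(0.90)

# Score to percentile: percentage of scores below a given value.
percentile_1305 = 100 * sat.cdf(1305)
percentile_iq_70 = 100 * iq.cdf(70)

# Percentage of IQ scores at or above 150 (area to the right).
percent_iq_150_plus = 100 * (1 - iq.cdf(150))

print(f"SAT cutoff for the top 10%: {cutoff_top_10:.0f}")
print(f"Percentile of an SAT score of 1305: {percentile_1305:.1f}")
print(f"Percentile of an IQ of 70: {percentile_iq_70:.2f}")
print(f"Percent with IQ of 150 or more: {percent_iq_150_plus:.3f}")
```

`inv_cdf` plays the role of reading the table backwards (finding z from an area), while `cdf` plays the role of the ordinary table lookup.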