Chapter 3: Descriptive Statistics 1 Chapter 3 Descriptive Statistics LEARNING OBJECTIVES The focus of Chapter 3 is on the use of statistical techniques to describe data, thereby enabling you to: 1. Distinguish between measures of central tendency, measures of variability, and measures of shape. 2. Understand conceptually the meanings of mean, median, mode, quartile, percentile, and range. 3. Compute mean, median, mode, percentile, quartile, range, variance, standard deviation, and mean absolute deviation on ungrouped data. 4. Differentiate between sample and population variance and standard deviation. 5. Understand the meaning of standard deviation as it is applied using the empirical rule and Chebyshev’s theorem. 6. Compute the mean, median, standard deviation, and variance on grouped data. 7. 8. Understand box and whisker plots, skewness, and kurtosis. Compute a coefficient of correlation and interpret it. CHAPTER TEACHING STRATEGY In chapter 2, the student learned how to summarize data by constructing frequency distributions (grouping data) and by using graphical depictions. Much of the time, statisticians need to describe data by using single numerical measures. Chapter 3 presents a cadre of statistical measures for describing numerically sets of data. It can be emphasized in this chapter that there are at least two major dimensions along which data can be described. One is the measure of central tendency with which statisticians attempt to describe the more central portions of the data. Included here are the mean, median, mode, percentiles, and quartiles. It is important to establish that the Chapter 3: Descriptive Statistics 2 median is a useful device for reporting some business data such as income and housing costs because it tends to ignore the extremes. On the other hand, the mean utilizes every number of a data set in its computation. This makes the mean an attractive tool in statistical analysis. A second major category of descriptive statistical techniques are the measures of variability. Students can understand that a measure of central tendency is often not enough to fully describe data. The measure of variability helps the researcher get a handle on the spread of the data. An attempt is made in this text to communicate to the student that through the use of the empirical rule and through Chebyshev’s Theorem, students can better understand what a standard deviation means. The empirical rule will be referred to quite often throughout the course; and therefore, it is important to emphasize it as a rule of thumb. For example, in discussing control charts in chapter 17, the upper and lower control limits are established by using the range of + 3 standard deviations of the statistic as limits within which virtually all points should fall if the process is in control. z scores are presented mainly to help bridge the gap between the discussion of means and standard deviations in chapter 3 and the normal curve of chapter 6. It should be emphasized that the calculation of measures of central tendency and variability for grouped data is different than for ungrouped or raw data. While the principles are the same for the two types of data, the implementation of the formulas are different. Grouped data are represented by the class midpoints rather than computation based on raw values. For this reason, the student should be cautioned that group statistics are often just approximations. Measures of shape are useful in helping the researcher describe a distribution of data. The Pearsonian coefficient of skewness is a handy tool for ascertaining the degree of skewness in the distribution. Box and Whisker plots can be used to determine the presence of skewness in a distribution and to locate outliers. The coefficient of correlation is introduced here instead of chapter 13 (regression chapter) so that the student can begin to think about two variable relationships and analyses. Since the coefficient of correlation is a stand-alone statistic, it is appropriate to present it here. In addition, when the student studies simple regression in chapter 13, there will be a foundation upon which to build. All in all, chapter 3 is quite important because it presents some of the building blocks for many of the later chapters. CHAPTER OUTLINE 3.1 Measures of Central Tendency: Ungrouped Data Mode Median Mean Percentiles Chapter 3: Descriptive Statistics 3 Quartiles 3.2 Measures of Variability - Ungrouped Data Range Interquartile Range Mean Absolute Deviation, Variance, and Standard Deviation Mean Absolute Deviation Variance Standard Deviation Meaning of Standard Deviation Empirical Rule Chebyshev’s Theorem Population Versus Sample Variance and Standard Deviation Computational Formulas for Variance and Standard Deviation Z Scores Coefficient of Variation 3.3 Measures of Central Tendency and Variability - Grouped Data Measures of Central Tendency Mean Mode Measures of Variability 3.4 Measures of Shape Skewness Skewness and the Relationship of the Mean, Median, and Mode Coefficient of Skewness Kurtosis Box and Whisker Plot 3.5 Measures of Association Correlation 3.6 Descriptive Statistics on the Computer KEY TERMS Arithmetic Mean Bimodal Box and Whisker Plot Chebyshev’s Theorem Coefficient of Correlation (r) Coefficient of Skewness Measures of Variability Median Mesokurtic Mode Multimodal Percentiles Chapter 3: Descriptive Statistics Coefficient of Variation (CV) Deviation from the Mean Empirical Rule Interquartile Range Kurtosis Leptokurtic Mean Absolute Deviation (MAD) Measures of Central Tendency Measures of Shape 4 Platykurtic Quartiles Range Skewness Standard Deviation Sum of Squares of x Variance z Score SOLUTIONS TO PROBLEMS IN CHAPTER 3 3.1 Mode 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 7, 8, 8, 8, 9 The mode = 4 4 is the most frequently occurring value 3.2 Median for values in 3.1 Arrange in ascending order: 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 7, 8, 8, 8, 9 There are 15 terms. Since there are an odd number of terms, the median is the middle number. The median = 4 Using the formula, the median is located at the n + 1 th term = 15+1 = 16 = 8th term 2 2 2 The 8th term = 4 3.3 Median Arrange terms in ascending order: 073, 167, 199, 213, 243, 345, 444, 524, 609, 682 There are 10 terms. Since there are an even number of terms, the median is the average of the two middle terms: Median = 243 + 345 = 588 = 294 2 2 Chapter 3: Descriptive Statistics 5 Using the formula, the median is located at the n + 1 th term. 2 n = 10 therefore 10 + 1 = 11 = 5.5th term. 2 2 The median is located halfway between the 5th and 6th terms. 5th term = 243 6th term = 345 Halfway between 243 and 345 is the median = 294 3.4 Mean 17.3 44.5 µ = x/N = 333.6/8 = 41.7 31.6 40.0 52.8 x = x/n = 333.6/8 = 41.7 38.8 30.1 78.5 (It is not stated in the problem whether the x = 333.6 data represent as population or a sample). 3.5 Mean 7 -2 5 9 0 -3 -6 -7 -4 -5 2 -8 x = -12 3.6 µ = x/N = -12/12 = -1 x = x/n = -12/12 = -1 (It is not stated in the problem whether the data represent a population or a sample). Rearranging the data into ascending order: 11, 13, 16, 17, 18, 19, 20, 25, 27, 28, 29, 30, 32, 33, 34 i 35 (15) 5.25 100 P35 is located at the 5 + 1 = 6th term Chapter 3: Descriptive Statistics 6 P35 = 19 i 55 (15) 8.25 100 P55 is located at the 8 + 1 = 9th term P55 = 27 Q1 = P25 i 25 (15) 3.75 100 Q1 = P25 is located at the 3 + 1 = 4th term Q1 = 17 Q2 = Median 15 1 th The median is located at the 8 term 2 th Q2 = 25 Q3 = P75 i 75 (15) 11.25 100 Q3 = P75 is located at the 11 + 1 = 12th term Q3 = 30 3.7 Rearranging the data in ascending order: 80, 94, 97, 105, 107, 112, 116, 116, 118, 119, 120, 127, 128, 138, 138, 139, 142, 143, 144, 145, 150, 162, 171, 172 n = 24 Chapter 3: Descriptive Statistics i 20 (24) 4.8 100 P20 is located at the 4 + 1 = 5th term P20 = 107 i 47 (24) 11.28 100 P47 is located at the 11 + 1 = 12th term P47 = 127 i 83 (24) 19.92 100 P83 is located at the 19 + 1 = 20th term P83 = 145 Q1 = P25 i 25 (24) 6 100 Q1 is located at the 6.5th term Q1 = (112 + 116)/ 2 = 114 Q2 = Median The median is located at the: 24 1 th 12.5 term 2 th Q2 = (127 + 128)/ 2 = 127.5 Q3 = P75 i 75 (24) 18 100 7 Chapter 3: Descriptive Statistics 8 Q3 is located at the 18.5th term Q3 = (143 + 144)/ 2 = 143.5 3.8 Mean = x 43,063 2,870.87 N 15 15 1 The median is located at the = 8th term 2 Median = 2,264 th Q2 = Median = 2,264 i 63 (15) 9.45 100 P63 is located at the 9 + 1 = 10th term P63 = 2,646 10 1 th The median is located at the 5.5 position 2 th 3.9 The median = (595 + 653)/2 = 624 Q3 = P75: i 75 (10) 7.5 100 P75 is located at the 7+1 = 8th term Q3 = 751 For P20: i 20 (10) 2 100 P20 is located at the 2.5th term P20 = (483 + 489)/2 = 486 For P60: i 60 (10) 6 100 Chapter 3: Descriptive Statistics 9 P60 is located at the 6.5th term P60 = (653 + 701)/2 = 677 For P80: i 80 (10) 8 100 P80 is located at the 8.5th term P80 = (751 + 800)/2 = 775.5 For P93: i 93 (10) 9.3 100 P93 is located at the 9+1 = 10th term P93 = 1096 3.10 n = 17 Mean = x 61 3.59 N 17 17 1 The median is located at the = 9th term 2 th Median = 4 Mode = 4 Q3 = P75: i= 75 (17) 12.75 100 Q3 is located at the 13th term Q3 = 4 P11: i= 11 (17) 1.87 100 Chapter 3: Descriptive Statistics 10 P11 is located at the 2nd term P11 = 1 P35: i= 35 (17) 5.95 100 P35 is located at the 6th term P35 = 3 P58: i= 58 (17) 9.86 100 P58 is located at the 10th term P58 = 4 P67: i= 67 (17) 11.39 100 P67 is located at the 12th term P67 = 4 3.11 x 6 2 4 9 1 3 5 x = 30 6-4.2857 = x-µ = x -µ 1.7143 2.2857 0.2857 4.7143 3.2857 1.2857 0.7143 14.2857 x 30 4.2857 N 7 a.) Range = 9 - 1 = 8 b.) M.A.D. = x N 14.2857 2.041 7 (x-µ)2 2.9388 5.2244 .0816 22.2246 10.7958 1.6530 .5102 2 (x -µ) = 43.4284 Chapter 3: Descriptive Statistics c.) 2 = ( x ) 2 43.4284 = 6.204 N 7 ( x ) 2 6.204 = 2.491 N = d.) e.) 1, 2, 3, 4, 5, 6, 9 Q1 = P25 i = 25 (7) = 1.75 100 Q1 is located at the 1 + 1 = 2th term, Q1 = 2 Q3 = P75: i = 75 (7) = 5.25 100 Q3 is located at the 5 + 1 = 6th term, Q3 = 6 IQR = Q3 - Q1 = 6 - 2 = 4 f.) 11 z = 6 4.2857 = 0.69 2.491 z = 2 4.2857 = -0.92 2.491 z = 4 4.2857 = -0.11 2.491 z = 9 4.2857 = 1.89 2.491 z = 1 4.2857 = -1.32 2.491 z = 3 4.2857 = -0.52 2.491 Chapter 3: Descriptive Statistics 5 4.2857 = 0.29 2.491 z = 3.12 x xx 4 3 0 5 2 9 4 5 0 1 4 1 2 5 0 1 x = 32 x 12 x x = 14 ( x x) 2 0 1 16 1 4 25 0 1 ( x x ) 2 = 48 x 32 =4 n 8 a) Range = 9 - 0 = 9 xx b) M.A.D. = n 14 = 1.75 8 ( x x) 2 48 c) s = = 6.857 n 1 7 2 d) s = ( x x ) 2 6.857 = 2.619 n 1 e) Numbers in order: 0, 2, 3, 4, 4, 5, 5, 9 Q1 = P25 i = 25 (8) = 2 100 Q1 is located at the average of the 2nd and 3rd terms, 2.5th term Q1 = 2.5 Q1 = (2 + 3)/2 = Chapter 3: Descriptive Statistics 13 Q3 = P75 i = 75 (8) = 6 100 Q3 is located at the average of the 6th and 7th terms Q3 = (5 + 5)/2 = 5 IQR = Q3 - Q1 = 5 - 2.5 = 2.5 3.13 a.) x 12 23 19 26 24 23 x = 127 = = (x-µ) 12-21.167= -9.167 1.833 -2.167 4.833 2.833 1.833 (x -µ) = -0.002 x 127 = 21.167 N 6 ( x ) 2 126.834 21.139 = 4.598 N 6 b.) x 12 23 19 26 24 23 x = 127 (x -µ)2 84.034 3.360 4.696 23.358 8.026 3.360 2 (x -µ) = 126.834 x2 144 529 361 676 576 529 2 x = 2815 ORIGINAL FORMULA Chapter 3: Descriptive Statistics 14 = (x) 2 N N x 2 = 4.598 (127) 2 6 6 2815 2815 2688.17 126.83 21.138 6 6 SHORT-CUT FORMULA The short-cut formula is faster. 3.14 s2 = 433.9267 s = 20.8309 x = 1387 x2 = 87,365 n = 25 x = 55.48 3.15 2 = 58,631.359 = 242.139 x = 6886 x2 = 3,901,664 n = 16 µ = 430.375 3.16 14, 15, 18, 19, 23, 24, 25, 27, 35, 37, 38, 39, 39, 40, 44, 46, 58, 59, 59, 70, 71, 73, 82, 84, 90 Q1 = P25 i = 25 ( 25) = 6.25 100 P25 is located at the 6 + 1 = 7th term Chapter 3: Descriptive Statistics 15 Q1 = 25 Q3 = P75 i = 75 ( 25) = 18.75 100 P75 is located at the 18 + 1 = 19th term Q3 = 59 IQR = Q3 - Q1 = 59 - 25 = 34 3.17 a) 1 1 1 3 1 .75 2 2 4 4 b) 1 1 1 1 .84 2 2.5 6.25 c) 1 1 1 1 .609 2 1.6 2.56 d) 1 1 1 1 .902 2 3.2 10.24 3.18 Set 1: 1 1 x 262 65.5 N 4 (x) 2 (262) 2 17,970 N 4 N 4 x 2 Set 2: 2 x 570 142.5 N 4 = 14.2215 Chapter 3: Descriptive Statistics 2 (x) 2 (570) 2 82,070 N 4 N 4 x 2 CV1 = 14.2215 (100) = 21.71% 65.5 CV2 = 14.5344 (100) = 10.20% 142.5 3.19 x xx 7 5 10 12 9 8 14 3 11 13 8 6 106 1.833 3.833 1.167 3.167 0.167 0.833 5.167 5.833 2.167 4.167 0.833 2.833 32.000 x x 106 = 8.833 n 12 xx a) MAD = 16 n 32 = 2.667 12 b) s2 = ( x x) 2 121.665 = 11.06 n 1 11 c) s = s 2 11.06 = 3.326 d) Rearranging terms in order: 3 5 6 7 8 8 9 10 11 12 13 14 = 14.5344 ( x x) 2 3.361 14.694 1.361 10.028 0.028 0.694 26.694 34.028 4.694 17.361 0.694 8.028 121.665 Chapter 3: Descriptive Statistics Q1 = P25: i = (.25)(12) = 3 Q1 = the average of the 3rd and 4th terms: Q1 = (6 + 7)/2 = 6.5 Q3 = P75: i = (.75)(12) = 9 Q3 = the average of the 9th and 10th terms: Q3 = (11 + 12)/2 = 11.5 IQR = Q3 - Q1 = 11.5 - 9 = 2.5 e.) z = 6 8.833 = - 0.85 3.326 f.) CV = (3.326)(100) = 37.65% 8.833 3.20 n = 11 x 768 429 323 306 286 262 215 172 162 148 145 x = 3216 x- 475.64 136.64 30.64 13.64 6.36 30.36 77.36 120.36 130.36 144.36 147.36 x-µ = 1313.08 µ = 292.36 x = 3216 x2 = 1,267,252 a.) range = 768 - 145 = 623 b.) MAD = x N 1313.08 = 119.37 11 17 Chapter 3: Descriptive Statistics (x) 2 (3216) 2 x 1,267,252 N 11 c.) 2 = N 11 18 2 d.) = = 29,728.23 29,728.23 = 172.42 e.) Q1 = P25: i = .25(11) = 2.75 Q1 is located at the 2 + 1 = 3rd term Q1 = 162 Q3 = P75: i = .75(11) = 8.25 Q3 is located at the 8 + 1 = 9th term Q3 = 323 IQR = Q3 - Q1 = 323 - 162 = 161 f.) xnestle = 172 Z = g.) CV = 3.21 x 172 292.36 = -0.70 172.42 172.42 (100) (100) = 58.98% 292.36 µ = 125 = 12 68% of the values fall within: µ ± 1 = 125 ± 1(12) = 125 ± 12 between 113 and 137 95% of the values fall within: µ ± 2 = 125 ± 2(12) = 125 ± 24 between 101 and 149 99.7% of the values fall within: Chapter 3: Descriptive Statistics 19 µ ± 3 = 125 ± 3(12) = 125 ± 36 between 89 and 161 3.22 =6 µ = 38 between 26 and 50: x1 - µ = 50 - 38 = 12 x2 - µ = 26 - 38 = -12 x1 12 = 2 6 x2 12 = -2 6 K = 2, and since the distribution is not normal, use Chebyshev’s theorem: 1 1 1 1 3 1 2 1 = .75 2 K 2 4 4 at least 75% of the values will fall between 26 and 50 between 14 and 62? µ = 38 =6 x1 - µ = 62 - 38 = 24 x2 - µ = 14 - 38 = -24 x1 24 = 4 6 x2 24 = -4 6 K=4 1 1 1 1 15 1 2 1 = .9375 2 K 4 16 16 at least 93.75% of the values fall between 14 and 62 between what 2 values do at least 89% of the values fall? Chapter 3: Descriptive Statistics 1- 20 1 =.89 K2 .11 = 1 K2 .11 K2 = 1 K2 = 1 .11 K2 = 9.09 K = 3.015 With µ = 38, = 6 and K = 3.015 at least 89% of the values fall within: µ ± 3.015 38 ± 3.015 (6) 38 ± 18.09 19.91 and 56.09 3.23 1- 1 = .80 K2 1 - .80 = 1 K2 .20 = 1 K2 .20K2 K2 = 5 =1 and K = 2.236 2.236 standard deviations 3.24 µ = 43. within 68% of the values lie µ + 1 1 = 46 - 43 = 3 Chapter 3: Descriptive Statistics 21 within 99.7% of the values lie µ + 3 3 = 51 - 43 = 8 = 2.67 µ = 28 and 77% of the values lie between 24 and 32 or + 4 from the mean: 1- 1 = .77 K2 k2 = 4.3478 k = 2.085 2.085 = 4 = 3.25 4 = 1.918 2.085 =4 µ = 29 x1 - µ = 21 - 29 = -8 x2 - µ = 37 - 29 = +8 x1 8 = -2 Standard Deviations 4 x2 8 = 2 Standard Deviations 4 Since the distribution is normal, the empirical rule states that 95% of the values fall within µ ± 2 . Exceed 37 days: Since 95% fall between 21 and 37 days, 5% fall outside this range. Since the normal distribution is symmetrical 2½% fall below 21 and above 37. Thus, 2½% lie above the value of 37. Exceed 41 days: Chapter 3: Descriptive Statistics x 22 41 29 12 = 3 Standard deviations 4 4 The empirical rule states that 99.7% of the values fall within µ ± 3 = 29 ± 3(4) = 29 ± 12 between 17 and 41, 99.7% of the values will fall. 0.3% will fall outside this range. Half of this or .15% will lie above 41. Less than 25: µ = 29 x = 4 25 29 4 = -1 Standard Deviation 4 4 According to the empirical rule: µ ± 1 contains 68% of the values 29 ± 1(4) = 29 ± 4 from 25 to 33. 32% lie outside this range with ½(32%) = 16% less than 25. 3.26 x 97 109 111 118 120 130 132 133 137 137 x = 1224 x2 = 151,486 Bordeaux: x = 137 z = 137 122.4 = 1.07 13.615 n = 10 x = 122.4 s = 13.615 Chapter 3: Descriptive Statistics 23 Montreal: x = 130 z = 130 122.4 = 0.56 13.615 Edmonton: x = 111 z = 111 122.4 = -0.84 13.615 Hamilton: x = 97 z = 3.27 97 122.4 = -1.87 13.615 Mean Class 0- 2 2- 4 4- 6 6- 8 8 - 10 10 - 12 12 - 14 = f 39 27 16 15 10 8 6 f=121 M 1 3 5 7 9 11 13 fM 39 81 80 105 90 88 78 fM=561 fM 561 = 4.64 f 121 Mode: The modal class is 0 – 2. The midpoint of the modal class = the mode = 1 3.28 Class 1.2 - 1.6 1.6 - 2.0 2.0 - 2.4 2.4 - 2.8 2.8 - 3.2 Mean: f 220 150 90 110 280 f=850 = fM 1902 = 2.24 f 850 M 1.4 1.8 2.2 2.6 3.0 fM 308 270 198 286 840 fM=1902 Chapter 3: Descriptive Statistics Mode: 3.29 The modal class is 2.8 – 3.2. The midpoint of the modal class is the mode = 3.0 Class f M fM 20-30 30-40 40-50 50-60 60-70 70-80 7 11 18 13 6 4 25 35 45 55 65 75 175 385 810 715 390 300 Total 59 = M-µ -22.0339 -12.0339 - 2.0339 7.9661 17.9661 27.9661 24 2775 fM 2775 = 47.034 f 59 (M - µ)2 f(M - µ)2 485.4927 3398.449 144.8147 1592.962 4.1367 74.462 63.4588 824.964 322.7808 1936.685 782.1028 3128.411 Total 10,955.933 f ( M ) 2 10,955.93 = = 185.694 f 59 2 3.30 = 185.694 Class 5- 9 9 - 13 13 - 17 17 - 21 21 - 25 f 20 18 8 6 2 f=54 = 13.627 M 7 11 15 19 23 fM 140 198 120 114 46 fM= 618 fM2 980 2,178 1,800 2,166 1,058 fm2= 8,182 (fM ) 2 (618) 2 fM 8182 n 54 8182 7071.7 = 20.9 s2 = n 1 53 53 2 Chapter 3: Descriptive Statistics S 2 20.9 = 4.57 s = 3.31 25 Class 18 - 24 24 - 30 30 - 36 36 - 42 42 - 48 48 - 54 54 - 60 60 - 66 66 - 72 f 17 22 26 35 33 30 32 21 15 f= 231 a.) Mean: x M 21 27 33 39 45 51 57 63 69 fM fM2 357 7,497 594 16,038 858 28,314 1,365 53,235 1,485 66,825 1,530 78,030 1,824 103,968 1,323 83,349 1,035 71,415 fM= 10,371 fM2= 508,671 fM fM 10,371 = 44.9 n f 231 b.) Mode. The Modal Class = 36-42. The mode is the class midpoint = 39 (fM ) 2 (10,371) 2 508,671 n 231 43,053.5 = 187.2 n 1 230 230 fM 2 c.) s2 = d.) s = 187.2 = 13.7 3.32 a.) Mean Class 0-1 1-2 2-3 3-4 4-5 5-6 = f 31 57 26 14 6 3 f=137 fM 258.5 = 1.89 f 137 M fM fM2 0.5 1.5 2.5 3.5 4.5 5.5 15.5 85.5 65.0 49.0 27.0 16.5 fM=258.5 7.75 128.25 162.50 171.50 121.50 90.75 2 fM =682.25 Chapter 3: Descriptive Statistics b.) Mode: 26 Modal Class = 1-2. Mode = 1.5 c.) Variance: 2 = (fM ) 2 (258.5) 2 682.25 N 137 N 137 fM 2 = 1.4197 d.) standard Deviation: = 3.33 f 20-30 8 30-40 7 40-50 1 50-60 0 60-70 3 70-80 1 f = 20 M 2 1.4197 = 1.1915 fM2 fM 25 200 5000 35 245 8575 45 45 2025 55 0 0 65 195 12675 75 75 5625 2 fM = 760 fM = 33900 a.) Mean: = fM 760 = 38 f 20 b.) Mode. The Modal Class = 20-30. The mode is the midpoint of this class = 25. c.) Variance: 2 = (fM ) 2 (760) 2 33,900 N 20 = 251 N 20 fM 2 d.) Standard Deviation: = 2 251 = 15.843 Chapter 3: Descriptive Statistics 3.34 No. of Farms f 0 - 20,000 20,000 - 40,000 40,000 - 60,000 60,000 - 80,000 80,000 - 100,000 100,000 - 120,000 16 11 10 6 5 1 f = 49 = M 10,000 30,000 50,000 70,000 90,000 110,000 27 fM 160,000 330,000 500,000 420,000 450,000 110,000 fM2 = 1,970,000 fM 1,970,000 = 40,204 f 49 The actual mean for the ungrouped data is 37,816. This computed group mean, 40,204, is really just an approximation based on using the class midpoints in the calculation. Apparently, the actual numbers of farms per state in some categories do not average to the class midpoint and in fact might be less than the class midpoint since the actual mean is less than the grouped data mean. fM2 = 1.185x1011 2 = (fm) 2 (1.97 x106 ) 2 1.18 x1011 N 49 = 801,999,167 N 49 fM 2 = 28,319.59 The actual standard deviation was 29,341. The difference again is due to the grouping of the data and the use of class midpoints to represent the data. The class midpoints due not accurately reflect the raw data. 3.35 mean = $35 median = $33 mode = $21 The stock prices are skewed to the right. While many of the stock prices are at the cheaper end, a few extreme prices at the higher end pull the mean. 3.36 mean = 51 median = 54 Chapter 3: Descriptive Statistics 28 mode = 59 The distribution is skewed to the left. More people are older but the most extreme ages are younger ages. 3.37 Sk = 3.38 n = 25 3( M d ) 3(5.51 3.19) 9.59 = 0.726 x = 600 x2 = 15,462 x = 24 Sk = s = 6.6521 3( x M d ) 3(24 23) = 0.451 s 6.6521 There is a slight skewness to the right 3.39 Q1 = 500. Median = 558.5. Q3 = 589. IQR = 589 - 500 = 89 Inner Fences: Q1 - 1.5 IQR = 500 - 1.5 (89) = 366.5 and Q3 + 1.5 IQR = 589 + 1.5 (89) = 722.5 Outer Fences: Q1 - 3.0 IQR = 500 - 3 (89) = 233 and Q3 + 3.0 IQR = 589 + 3 (89) = 856 The distribution is negatively skewed. There are no mild or extreme outliers. 3.40 n = 18 i = Q1 = P25: 25 (18) = 4.5 100 Q1 = 5th term = 66 Q3 = P75: i = 75 (18) = 13.5 100 Chapter 3: Descriptive Statistics Q3 = 14th term = 29 90 (n 1)th (18 1)th 19 = 9.5th term 2 2 2 th Median: Median = 74 IQR = Q3 - Q1 = 90 - 66 = 24 Inner Fences: Q1 - 1.5 IQR = 66 - 1.5 (24) = 30 Q3 + 1.5 IQR = 90 + 1.5 (24) = 126 Outer Fences: Q1 - 3.0 IQR = 66 - 3.0 (24) = -6 Q3 + 3.0 IQR = 90 + 3.0 (24) = 162 There are no extreme outliers. The only mild outlier is 21. The distribution is positively skewed since the median is nearer to Q1 than Q3. 3.41 x = 80 y2 = 815 r = x2 = 1,148 xy = 624 xy ( x) 2 x n 2 x y n ( y ) 2 y n (80)(69) 7 = 2 (80) (69) 2 1,148 815 7 7 624 r = r = y = 69 n=7 164.571 = -0.927 177.533 2 = 164.571 = (233.714)(134.857) Chapter 3: Descriptive Statistics 3.42 x = 1,087 y2 = 878,686 r = 30 x2 = 322,345 xy= 507,509 xy y = 2,032 n=5 x y n ( x) ( y ) 2 2 2 x y n n = 2 (1,087)( 2,032) 5 = 2 2 (1,087) (2,032) 322,345 878,686 5 5 507,509 r = r = 3.43 65,752.2 65,752.2 = = .975 67,449.5 (86,031.2)(52,881.2) Delta (x) SW (y) 47.6 46.3 50.6 52.6 52.4 52.7 15.1 15.4 15.9 15.6 16.4 18.1 x = 302.2 x2 = 15,259.62 r = xy y = 96.5 xy = 4,870.11 2 y = 1,557.91 x y n ( x) ( y ) 2 2 2 x y n n 2 = Chapter 3: Descriptive Statistics 31 (302.2)(96.5) 6 = .6445 2 (302.2) (96.5) 2 15,259.62 1,557.91 6 6 4,870.11 r = 3.44 x = 6,087 y = 1,050 xy = 1,130,483 r = x2 = 6,796,149 y2 = 194,526 n=9 xy ( x) 2 x n 2 x y n ( y ) 2 y n 2 = (6,087)(1,050) 9 = 2 (6,087) (1,050)2 6,796,149 194,526 9 9 1,130,483 r = r = 3.45 420,333 420,333 = = .957 439,294.705 (2,679,308)(72,026) Correlation between Year 1 and Year 2: x = 17.09 y = 15.12 xy = 48.97 r = x2 = 58.7911 y2 = 41.7054 n=8 xy x y n ( x) 2 ( y ) 2 2 2 x y n n = Chapter 3: Descriptive Statistics 32 (17.09)(15.12) 8 = 2 (17.09) (15.12) 2 58.7911 41.7054 8 8 48.97 r = r = 16.6699 16.6699 = = .975 17.1038 (22.28259)(13.1286) Correlation between Year 2 and Year 3: x = 15.12 x2 = 41.7054 y = 15.86 y2 = 42.0396 xy = 41.5934 n=8 r = xy x y n ( x) ( y ) 2 2 2 x y n n = 2 (15.12)(15.86) 8 = 2 2 (15.12) (15.86) 41.7054 42.0396 8 8 41.5934 r = r = 11.618 11.618 = = .985 11.795 (13.1286)(10.59715) Correlation between Year 1 and Year 3: x = 17.09 y = 15.86 xy = 48.5827 r = x2 = 58.7911 y2 = 42.0396 n=8 xy x y n ( x) ( y ) 2 2 2 x y n n 2 = Chapter 3: Descriptive Statistics 33 (17.09)(15.86) 8 2 (17.09) (15.86) 2 58 . 7911 42 . 0396 8 8 48.5827 r = r = 14.702 14.702 = = .957 15.367 (2.2826)(10.5972) The years 2 and 3 are the most correlated with r = .985. 3.46 Arranging the values in an ordered array: 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 5, 6, 8 x 75 = 2.5 n 30 Mode = 2 (There are eleven 2’s) Mean: x Median: There are n = 30 terms. n 1 30 1 21 The median is located at = 15.5th position. 2 2 2 th Median is the average of the 15th and 16th value. However, since these are both 2, the median is 2. Range = 8 - 1 = 7 Q1 = P25: 25 (30) = 7.5 100 i = Q1 is the 8th term = 1 Q3 = P75: i = 75 (30) = 22.5 100 Chapter 3: Descriptive Statistics Q3 is the 23rd term = 3 IQR = Q3 - Q1 = 3 - 1 = 2 3.47 P10: i = P10 = 4.5th term = 10 ( 40) = 4 100 23 P80: i = P80 = 32.5th term = 80 ( 40) = 32 100 49.5 Q1 = P25: i = P25 = 10.5th term = 25 ( 40) = 10 100 27.5 Q3 = P75: i = P75 = 30.5th term = 75 ( 40) = 30 100 47.5 IQR = Q3 - Q1 = 47.5 - 27.5 = 20 Range = 81 - 19 = 62 34 Chapter 3: Descriptive Statistics 3.48 µ = 35 x 126,904 = 6345.2 N 20 The median is located at the (n+1)/2 th value = 21/2 = 10.5th value The median is the average of 5414 and 5563 = 5488.5 P30: i = (.30)(20) = 6 P30 is located at the average of the 6th and 7th terms P30 = (4507+4541)/2 = 4524 P60: i = (.60)(20) = 12 P60 is located at the average of the 12th and 13th terms P60 = (6101+6498)/2 = 6299.5 P90: i = (.90)(20) = 18 P90 is located at the average of the 18th and 19th terms P90 = (9863+11,019)/2 = 10,441 Q1 = P25: i = (.25)(20) = 5 Q1 is located at the average of the 5th and 6th terms Q1 = (4464+4507)/2 = 4485.5 Q3 = P75: i = (.75)(20) = 15 Q3 is located at the average of the 15th and 16th terms Q3 = (6796+8687)/2 = 7741.5 Range = 11,388 - 3619 = 7769 IQR = Q3 - Q1 = 7741.5 - 4485.5 = 3256 Chapter 3: Descriptive Statistics 3.49 x = 9,332,908 n = 10 36 x2 = 1.06357x1013 µ = (x)/N = 9,332,908/10 = 933,290.8 (x) 2 (9332908) 2 13 x 1.06357 x10 N 10 N 10 2 = = 438,789.2 3.50 a.) µ= x = 26,675/11 = 2425 N Median = 1965 b.) Range = 6300 - 1092 = 5208 Q3 = 2867 Q1 = 1532 c.) IQR = Q3 - Q1 = 1335 Variance: 2 = (x) 2 (26,675) 2 86,942,873 N 11 = 2,023,272.55 N 11 x 2 standard Deviation: = d.) 2 2,023,272.55 = 1422.42 Texaco: z = x 1532 2425 = -0.63 1422.42 ExxonMobil: z = x 6300 2425 = 2.72 1422.42 Chapter 3: Descriptive Statistics e.) 37 Skewness: Sk = 3( M d ) 3(2425 1965) = 0.97 1422.42 3.51 a.) x 20,310 = 2,031 n 10 Mean: Median: 1670 1920 = 1795 2 Mode: No Mode b.) Range: 3350 – 1320 = 2030 Q1: 1 (10) 2.5 4 Located at the 3rd term. Q1 = 1570 Q3: 3 (10) 7.5 4 Located at the 8th term. IQR = Q3 – Q1 = 2550 – 1570 = 980 x xx x x 3350 2800 2550 2050 1920 1670 1660 1570 1420 1320 1319 769 519 19 111 361 371 461 611 711 5252 1,739,761 591,361 269,361 361 12,321 130,321 137,641 212,521 373,321 505,521 3,972,490 xx MAD = n 2 5252 = 525.2 10 Q3 = 2550 Chapter 3: Descriptive Statistics s = x x s= s 2 441,387.78 = 664.37 38 2 2 n 1 3,972,490 = 441,387.78 9 c.) Pearson’s Coefficient of skewness: Sk = 3( x M d ) 3(2031 1795) = 1.066 s 664.37 d.) Use Q1 = 1570, Q2 = 1795, Q3 = 2550, IQR = 980 Extreme Points: Inner Fences: 1320 and 3350 1570 – 1.5(980) = 100 2550 + 1.5(980) = 4020 Outer Fences: 1570 + 3.0(980) = -1370 2550+ 3.0(980) = 5490 No apparent outliers 1500 2500 $ Millions 3500 Chapter 3: Descriptive Statistics 3.52 15-20 20-25 25-30 30-35 35-40 40-45 45-50 50-55 39 f M fM fM2 9 16 27 44 42 23 7 2 17.5 22.5 27.5 32.5 37.5 42.5 47.5 52.5 157.5 360.0 742.5 1430.0 1575.0 977.5 332.5 105.0 2756.25 8100.00 20418.75 46475.00 59062.50 41543.75 15793.75 5512.50 fM = 5680.0 fM2 = 199662.50 f = 170 a.) Mean: = fM 5680 = 33.412 f 170 Mode: The Modal Class is 30-35. The class midpoint is the mode = 32.5. b.) Variance: 2 = (fM ) 2 (5680) 2 199,662.5 N 170 = 58.139 N 170 fM 2 standard Deviation: = 3.53 Class 0 - 20 20 - 40 40 - 60 60 - 80 80 - 100 f 32 16 13 10 19 f = 90 2 58.139 = 7.625 M 10 30 50 70 90 fM 320 480 650 700 1,710 fM= 3,860 fM2 3,200 14,400 32,500 49,000 153,900 fM2= 253,000 Chapter 3: Descriptive Statistics 40 a) Mean: x fM fm 3,860 = 42.89 n f 90 Mode: The Modal Class is 0-20. The midpoint of this class is the mode = 10. b) standard Deviation: (fM ) 2 n n 1 fM 2 s = 253,000 89 (3860) 2 90 87,448.9 982.571 89 = 31.346 3.54 x = 36 y = 44 xy = 188 r = x2 = 256 y2 = 300 n =7 xy x y n ( x) ( y ) 2 2 2 x y n n = 2 (36)( 44) 7 = 2 2 (36) (44) 256 300 7 7 188 r = r = 38.2857 38.2857 = = -.940 40.7441 (70.8571)( 23.42857) 253,000 165,551.1 89 Chapter 3: Descriptive Statistics 3.55 41 CVx = x 3.45 (100%) (100%) = 10.78% x 32 CVY = y 5.40 (100%) (100%) = 6.43% y 84 stock X has a greater relative variability. 3.56 µ = 7.5 From the Empirical Rule: 99.7% of the values lie in µ + 3 = 7.5 + 3 3 = 14 - 7.5 = 6.5 = 2.167 suppose that µ = 7.5, = 1.7: 95% lie within µ + 2 = 7.5 + 2(1.7) = 7.5 + 3.4 Between 4.1 and 10.9 3.57 µ = 419, = 27 a.) 68%: µ + 1 419 + 27 392 to 446 95%: µ + 2 419 + 2(27) 365 to 473 99.7%: µ + 3 419 + 3(27) 338 to 500 b.) Use Chebyshev’s: The distance from 359 to 479 is 120 µ = 419 The distance from the mean to the limit is 60. k = (distance from the mean)/ = 60/27 = 2.22 Proportion = 1 - 1/k2 = 1 - 1/(2.22)2 = .797 = 79.7% Chapter 3: Descriptive Statistics 42 400 419 = -0.704. This worker is in the lower half of 27 workers but within one standard deviation of the mean. c.) x = 400. z = 3.58 x x2 Albania 1,650 2,722,500 Bulgaria 4,300 18,490,000 Croatia 5,100 26,010,000 Germany 22,700 515,290,000 x=33,750 x2= 562,512,500 = = x 33,750 = 8,437.50 N 4 (x) 2 (33,750) 2 562,512,500 N 4 N 4 x 2 = 8332.87 b.) x x2 Hungary 7,800 60,840,000 Poland 7,200 51,840,000 Romania 3,900 15,210,000 Bosnia/Herz 1,770 3,132,900 2 x=20,670 x =131,022,900 = = x 20,670 = 5,167.50 N 4 (x) 2 (20,670) 2 131,022,900 N 4 N 4 x 2 = 2,460.22 c.) CV1 = 1 8,332.87 (100) (100) = 98.76% 1 8,437.50 CV2 = 2 2,460.22 (100) (100) = 47.61% 2 5,167.50 The first group has a much larger coefficient of variation Chapter 3: Descriptive Statistics 3.59 Mean Median Mode 43 $35,748 $31,369 $29,500 since these three measures are not equal, the distribution is skewed. The distribution is skewed to the right. Often, the median is preferred in reporting income data because it yields information about the middle of the data while ignoring extremes. 3.60 x = 36.62 y = 57.23 xy = 314.9091 r = x2 = 217.137 y2 = 479.3231 n =8 xy ( x) 2 x n 2 x y n ( y ) 2 2 y n = (36.62)(57.23) 8 = 2 (36.62) (57.23) 2 217.137 479.3231 8 8 314.9091 r = r = 52.938775 = .90 (49.50895)(69.91399) There is a strong positive relationship between the inflation rate and the thirty-year treasury yield. 3.61 a.) Q1 = P25: i = 25 ( 20) = 5 100 Q1 = 5.5th term = (42.3 + 45.4)/2 = 43.85 Q3 = P75: Chapter 3: Descriptive Statistics i = 44 75 ( 20) = 15 100 Q3 = 15.5th term = (69.4 +78.0)/2 = 73.7 Median: n 1 20 1 = 10.5th term 2 2 th th Median = (52.9 + 53.4)/2 = 53.15 IQR = Q3 - Q1 = 73.7 – 43.85 = 29.85 1.5 IQR = 44.775; 3.0 IQR = 89.55 Inner Fences: Q1 - 1.5 IQR = 43.85 – 44.775 = - 0.925 Q3 + 1.5 IQR = 73.7 + 44.775 = 118.475 Outer Fences: Q1 - 3.0 IQR = 43.85 – 89.55 = - 45.70 Q3 + 3.0 IQR = 73.7 + 89.55 = 163.25 b.) and c.) There are no outliers in the lower end. There is one extreme outlier in the upper end (214.2). There are two mile outliers at the upper end (133.7 and 158.8). since the median is nearer to Q1, the distribution is positively skewed. d.) There are three dominating, large ports Displayed below is the MINITAB boxplot for this problem. Chapter 3: Descriptive Statistics 20 120 220 Tonnage 3.62 Paris: 1 - 1/k2 = .53 k = 1.459 The distance from µ = 349 to x = 381 is 32 1.459 = 32 = 21.93 Moscow: 1 - 1/k2 = .83 k = 2.425 The distance from µ = 415 to x = 459 is 44 2.425 = 44 = 18.14 45