Copyright © 2004 by The McGraw-Hill Companies, Inc. All rights reserved.

When you have completed this chapter, you will be able to:

1. Compute and interpret the range, the mean deviation, the variance, the standard deviation, and the coefficient of variation of ungrouped data
2. Compute and interpret the range, the variance, and the standard deviation from grouped data
3. Explain the characteristics, uses, advantages, and disadvantages of each measure
4. Understand Chebyshev's theorem and the normal or empirical rule, as it relates to a set of observations
5. Compute and interpret percentiles, quartiles and the interquartile range
6. Construct and interpret box plots
7. Compute and describe the coefficient of skewness and kurtosis of a data distribution

Terminology

Range
…is the difference between the largest and the smallest value.
Only two values are used in its calculation.
It is influenced by an extreme value.
It is easy to compute and understand.

Mean Deviation
…is the arithmetic mean of the absolute values of the deviations from the arithmetic mean.

\[ MD = \frac{\sum |x - \bar{x}|}{N} \]

All values are used in the calculation.
It is not unduly influenced by large or small values.
The absolute values are difficult to manipulate.

Example
The weights of a sample of crates containing books for the bookstore (in kg) are:
103, 97, 101, 106, 103
Find the range and the mean deviation.

Find the mean weight:
\[ \bar{x} = \frac{\sum x}{N} = \frac{510}{5} = 102 \]

Find the mean deviation:
\[ MD = \frac{\sum |x - \bar{x}|}{N} = \frac{|103 - 102| + \dots + |103 - 102|}{5} \]

Find the range: the largest value minus the smallest value.
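As a quick check of the crate-weight example, the range and the mean deviation can be computed directly from their definitions. This is a minimal Python sketch; the function names `value_range` and `mean_deviation` are illustrative choices and do not come from the chapter.

```python
# Crate weights in kg, taken from the example above.
weights_kg = [103, 97, 101, 106, 103]


def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)


def mean_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


crate_range = value_range(weights_kg)
crate_md = mean_deviation(weights_kg)
print("range:", crate_range, "kg")
print("mean deviation:", crate_md, "kg")
```

Only two observations enter `value_range`, while every observation enters `mean_deviation`, which mirrors the characteristics listed for each measure.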
Solution (crate weights: 103, 97, 101, 106, 103; mean 102 kg)

Mean deviation:
\[ MD = \frac{\sum |x - \bar{x}|}{N} = \frac{1 + 5 + 1 + 4 + 1}{5} = \frac{12}{5} = 2.4 \]

Range: 106 - 97 = 9

Terminology

Variance
…is the arithmetic mean of the squared deviations from the arithmetic mean.
All values are used in the calculation.
Unlike the range, it does not depend only on the extreme values, although squaring gives large deviations extra weight.
The units are awkward: they are the square of the original units.

Computing the Variance

Formula for a population:
\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} \]

Formula for a sample:
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]

Example
The ages of the Dunn family are: 2, 18, 34, 42
What are the population mean and variance?

\[ \mu = \frac{\sum x}{N} = \frac{96}{4} = 24 \]

\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{(2 - 24)^2 + \dots + (42 - 24)^2}{4} = \frac{944}{4} = 236 \]

Population Standard Deviation
…is the square root of the population variance.
From the previous example:
\[ \sigma = \sqrt{236} = 15.36 \]

EXAMPLE
The hourly wages earned by a sample of five students are: $7, $5, $11, $8, $6.
Find the mean, variance, and standard deviation.

\[ \bar{x} = \frac{\sum x}{n} = \frac{37}{5} = 7.40 \]

\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{(7 - 7.4)^2 + \dots + (6 - 7.4)^2}{5 - 1} = \frac{21.2}{4} = 5.30 \]

\[ s = \sqrt{s^2} = \sqrt{5.30} = 2.30 \]

The Mean of Grouped Data
From Chapter 3: A sample of ten movie theatres in a metropolitan area tallied the total number of movies showing last week. Compute the mean number of movies showing per theatre.

Formula:
\[ \bar{x} = \frac{\sum f x}{N} \]

Movies Showing    Frequency (f)   Class Midpoint (x)   (f)(x)
1 to under 3      1               2                    2
3 to under 5      2               4                    8
5 to under 7      3               6                    18
7 to under 9      1               8                    8
9 to under 11     3               10                   30
Total             10                                   66
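The two worked variance examples (the Dunn family ages treated as a population, the student wages treated as a sample) can be verified with Python's standard `statistics` module. In that module `pvariance` and `pstdev` divide by N, while `variance` and `stdev` divide by n - 1. The variable names below are illustrative.

```python
import statistics

# Population example: ages of the Dunn family.
dunn_ages = [2, 18, 34, 42]
pop_mean = statistics.mean(dunn_ages)
pop_var = statistics.pvariance(dunn_ages)   # divides by N
pop_sd = statistics.pstdev(dunn_ages)       # square root of pop_var

# Sample example: hourly wages of five students.
wages = [7, 5, 11, 8, 6]
sample_mean = statistics.mean(wages)
sample_var = statistics.variance(wages)     # divides by n - 1
sample_sd = statistics.stdev(wages)         # square root of sample_var

print("population:", pop_mean, pop_var, round(pop_sd, 2))
print("sample:", sample_mean, round(sample_var, 2), round(sample_sd, 2))
```

Choosing between the population and sample functions is the code-level counterpart of choosing between the two formulas: the only difference is the divisor.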
The Mean of Grouped Data (continued)
For the ten movie theatres, the totals are \( \sum f = N = 10 \) and \( \sum f x = 66 \), so:

\[ \bar{x} = \frac{\sum f x}{N} = \frac{66}{10} = 6.6 \]

Now: Compute the variance and standard deviation.

Sample Variance for Grouped Data
The formula for the sample variance for grouped data is:

\[ s^2 = \frac{\sum f x^2 - \dfrac{(\sum f x)^2}{n}}{n - 1} \]

where f is the class frequency, x is the class midpoint, and n is the total number of observations (n = 10 here).

Movies Showing    Frequency (f)   Class Midpoint (x)   (f)(x)   (x²)f
1 to under 3      1               2                    2        4
3 to under 5      2               4                    8        32
5 to under 7      3               6                    18       108
7 to under 9      1               8                    8        64
9 to under 11     3               10                   30       300
Total             10                                   66       508

The variance is:
\[ s^2 = \frac{508 - \dfrac{66^2}{10}}{9} = \frac{508 - 435.6}{9} = 8.04 \]

The standard deviation is:
\[ s = \sqrt{8.04} = 2.8 \]

Interpretation and Uses of the Standard Deviation
Chebyshev's Theorem: For any set of observations, the proportion of the values that lie within k standard deviations of the mean is at least:

\[ 1 - \frac{1}{k^2} \]

where k is any constant greater than 1.

Example
Suppose that a wholesale plumbing supply company has a group of 50 sales vouchers from a particular day.
[Table of the 50 voucher amounts not reproduced]
How well does this data set fit Chebyshev's Theorem?

Solution
Step 1: Determine the mean and standard deviation of the sample.
Mean = $319, SD = $101.78
Step 2: Input k = 2 into Chebyshev's theorem:
\[ 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4} \]
i.e. at least 0.75 of the observations will fall within 2 SD of the mean.
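The grouped-data calculations and Step 2 of the Chebyshev solution can be sketched in Python as follows. The function names `grouped_mean_and_variance` and `chebyshev_lower_bound` are hypothetical helpers introduced only for illustration; the grouped-variance function uses the computational formula with n = Σf and the divisor n - 1.

```python
# Class midpoints and frequencies from the movie-theatre table.
midpoints = [2, 4, 6, 8, 10]
frequencies = [1, 2, 3, 1, 3]


def grouped_mean_and_variance(midpoints, frequencies):
    """Mean and sample variance of grouped data from class midpoints."""
    n = sum(frequencies)                                   # total frequency
    sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))
    mean = sum_fx / n
    variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
    return mean, variance


def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


grouped_mean, grouped_var = grouped_mean_and_variance(midpoints, frequencies)
grouped_sd = grouped_var ** 0.5
bound_k2 = chebyshev_lower_bound(2)

print("grouped mean:", grouped_mean)
print("grouped variance:", round(grouped_var, 2))
print("grouped standard deviation:", round(grouped_sd, 1))
print("Chebyshev bound for k = 2:", bound_k2)
```

The bound depends only on k, not on the data, which is why it applies to any set of observations.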
Solution (continued)
Step 3: Using the mean and SD, find the range of data values within 2 SD of the mean.
Mean = $319, SD = $101.78

\[ (\bar{x} - 2s,\ \bar{x} + 2s) = (319 - 2(101.78),\ 319 + 2(101.78)) = (115.44,\ 522.56) \]

Now go back to the sample data and see what proportion of the values fall between 115.44 and 522.56.

We find that 48 of 50, or 96%, of the data values are in this range: certainly at least 75%, as the theorem suggests!

Interpretation and Uses of the Standard Deviation
Empirical Rule: For any symmetrical, bell-shaped distribution:
…About 68% of the observations will lie within 1s of the mean
…About 95% of the observations will lie within 2s of the mean
…Virtually all the observations will be within 3s of the mean

Bell-Shaped Curve
[Figure: bell-shaped curve showing the relationship between σ and μ, with the axis marked at μ - 3σ, μ - 2σ, μ - 1σ, μ, μ + 1σ, μ + 2σ, μ + 3σ]

Example
Return to the 50 sales vouchers of the wholesale plumbing supply company. How well does this data set fit the Empirical Rule?

Solution
First check whether the histogram has an approximate mound shape.
[Histogram of the voucher amounts not reproduced]
Not bad… so we'll proceed!

We need to calculate the mean and standard deviation.
Mean: $319
Standard Deviation: $101.78

Calculate the intervals:

\[ (\bar{x} - s,\ \bar{x} + s) = (319 - 101.78,\ 319 + 101.78) = (217.22,\ 420.78) \]
\[ (\bar{x} - 2s,\ \bar{x} + 2s) = (319 - 2(101.78),\ 319 + 2(101.78)) = (115.44,\ 522.56) \]
\[ (\bar{x} - 3s,\ \bar{x} + 3s) = (319 - 3(101.78),\ 319 + 3(101.78)) = (13.66,\ 624.34) \]

Interval             Empirical Rule   Actual # values   Actual percentage
(217.22, 420.78)     68%              31/50             62%
(115.44, 522.56)     95%              48/50             96%
(13.66, 624.34)      ≈100%            49/50             98%

Skewness
…is the measurement of the lack of symmetry of the distribution
…The coefficient of skewness can range from -3.00 up to +3.00
…A value of 0 indicates a symmetric distribution.
It is computed as follows:

\[ SK_1 = \frac{3(\text{Mean} - \text{Median})}{\sigma} \]

Example
Following are the earnings per share for a sample of 15 software companies for the year 2000. The earnings per share are arranged from smallest to largest.

$0.09   0.13   0.41   0.51   1.12   1.20   1.49   3.18
 3.50   6.36   7.83   8.92   10.13  12.99  16.40

Find the coefficient of skewness.
Mean = 4.95, Median = 3.18, SD = 5.22

\[ SK_1 = \frac{3(4.95 - 3.18)}{5.22} = 1.017 \]

Positively Skewed Distribution
Mean and Median are to the right of the Mode.
Mode < Median < Mean

Negatively Skewed Distribution
Mean and Median are to the left of the Mode.
Mean < Median < Mode

Interquartile Range
…is the distance between the third quartile Q3 and the first quartile Q1.
This distance will include the middle 50 percent of the observations.
Interquartile Range = Q3 - Q1

Example
For a set of observations the third quartile is 24 and the first quartile is 10.
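Looking back at the Empirical Rule intervals and the earnings-per-share skewness example, both calculations can be sketched in Python. The helper names `k_sd_interval` and `pearson_skewness` are illustrative, and the sketch assumes the standard deviation in the skewness formula is the sample standard deviation (divisor n - 1), which reproduces the SD of 5.22 quoted in the example.

```python
import statistics


def k_sd_interval(mean, sd, k):
    """Interval from k standard deviations below to k above the mean."""
    return (mean - k * sd, mean + k * sd)


def pearson_skewness(values):
    """Coefficient of skewness: 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return 3 * (mean - median) / sd


# Voucher example: mean $319 and SD $101.78, for k = 1, 2, 3.
intervals = {k: k_sd_interval(319, 101.78, k) for k in (1, 2, 3)}

# Earnings per share for the 15 software companies.
eps = [0.09, 0.13, 0.41, 0.51, 1.12, 1.20, 1.49, 3.18,
       3.50, 6.36, 7.83, 8.92, 10.13, 12.99, 16.40]
sk = pearson_skewness(eps)

for k, (low, high) in intervals.items():
    print(k, round(low, 2), round(high, 2))
print("skewness:", round(sk, 3))
```

A positive result is consistent with the mean lying to the right of the median, as in the positively skewed case.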
What is the interquartile range?
With Q3 = 24 and Q1 = 10, the interquartile range is 24 - 10 = 14. Fifty percent of the observations will occur between 10 and 24.

Box Plots
A box plot is a graphical display, based on quartiles, that helps to picture a set of data.
Five pieces of data are needed to construct a box plot:
… the Minimum Value,
… the First Quartile,
… the Median,
… the Third Quartile, and
… the Maximum Value

Example
Based on a sample of 20 deliveries, Buddy's Pizza determined the following information.
The minimum delivery time was 13 minutes and the maximum 30 minutes.
The first quartile was 15 minutes, the median 18 minutes, and the third quartile 22 minutes.
Develop a box plot for the delivery times.

Solution
[Figure: box plot on an axis from 12 to 32 minutes, with Min. = 13, Q1 = 15, Median = 18, Q3 = 22 and Max. = 30 marked]

Example
The following are the average rates of return for Stocks A and B over a six-year period. In which of the following stocks would you prefer to invest? Why?

Stock A: 7   6   8   5   7   3
Stock B: 15  -10  18  10  -5  8

Find the mean rate of return for each of the two stocks:
Stock A: Mean = 36/6 = 6
Stock B: Mean = 36/6 = 6

Find the range of values of each stock:
Stock A: 8 - 3 = 5
Stock B: 18 - (-10) = 28

Both stocks have the same mean return, but the returns of Stock B are far more spread out. Therefore, Stock B is riskier.
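The Stock A and Stock B comparison can be reproduced with a few lines of Python. This is a small sketch; `mean_and_range` is an illustrative helper name, and the range is used as the risk measure only because that is what the example does.

```python
# Average rates of return over six years, from the example above.
stock_a = [7, 6, 8, 5, 7, 3]
stock_b = [15, -10, 18, 10, -5, 8]


def mean_and_range(returns):
    """Return the arithmetic mean and the range of a list of returns."""
    mean = sum(returns) / len(returns)
    spread = max(returns) - min(returns)
    return mean, spread


mean_a, range_a = mean_and_range(stock_a)
mean_b, range_b = mean_and_range(stock_b)

print("Stock A: mean", mean_a, "range", range_a)
print("Stock B: mean", mean_b, "range", range_b)
```

Equal means with very different ranges show why a measure of location alone cannot describe risk.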
Relative Dispersion
The coefficient of variation is the ratio of the standard deviation to the arithmetic mean, expressed as a percentage:

\[ CV = \frac{s}{\bar{x}} (100\%) \]

A standard deviation of 10 may be perceived as large when the mean value is 100, but only moderately large when the mean value is 500!

Example
Rates of return over the past 6 years for two mutual funds are shown below.
Fund A: 8.3, -6.0, 18.9, -5.7, 23.6, 20
Fund B: 12, -4.8, 6.4, 10.2, 25.3, 1.4
Which one has a higher level of risk?

Solution
Let us use the Excel printout that is run from the "Descriptive Statistics" sub-menu.

Statistic             Fund A    Fund B
Mean                  9.85      8.42
Standard Error        5.38      4.20
Median                13.60     8.30
Mode                  #N/A      #N/A
Standard Deviation    13.19     10.29
Sample Variance       173.88    105.81
Kurtosis              -2.21     0.90
Skewness              -0.44     0.61
Range                 29.60     30.1
Minimum               -6        -4.8
Maximum               23.6      25.3
Sum                   59.1      50.5
Count                 6         6

Is Fund A riskier because its standard deviation is larger?
But the means of the two funds are different.
Fund A has a higher rate of return, but it also has a larger standard deviation. Therefore we need to compare the relative variability using the coefficient of variation.

Solution

\[ CV = \frac{s}{\bar{x}} (100\%) \]

Fund A: CV = (13.19 / 9.85)(100%) = 134%
Fund B: CV = (10.29 / 8.42)(100%) = 122%

So there is more relative variability in Fund A than in Fund B. Therefore, Fund A is riskier.

Test your learning…
www.mcgrawhill.ca/college/lind
Online Learning Centre for quizzes, extra content, data sets, searchable glossary, access to Statistics Canada's E-Stat data …and much more!

This completes Chapter 4
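The mutual fund comparison from the coefficient of variation example can be recomputed from the raw returns instead of the Excel printout. The sketch below uses Python's standard `statistics` module; `coefficient_of_variation` is an illustrative helper, and it assumes the sample standard deviation (divisor n - 1), which matches the standard deviations shown in the printout.

```python
import statistics

# Rates of return over the past 6 years, from the example.
fund_a = [8.3, -6.0, 18.9, -5.7, 23.6, 20]
fund_b = [12, -4.8, 6.4, 10.2, 25.3, 1.4]


def coefficient_of_variation(returns):
    """Sample standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(returns) / statistics.mean(returns) * 100


sd_a = statistics.stdev(fund_a)
sd_b = statistics.stdev(fund_b)
cv_a = coefficient_of_variation(fund_a)
cv_b = coefficient_of_variation(fund_b)

print("Fund A: sd", round(sd_a, 2), "CV", round(cv_a), "%")
print("Fund B: sd", round(sd_b, 2), "CV", round(cv_b), "%")
print("Riskier fund by CV:", "A" if cv_a > cv_b else "B")
```

Dividing by the mean puts the two funds on a common relative scale, which is the reason for using the coefficient of variation when the means differ.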