STATISTICS FOR MANAGEMENT-I

LEARNING OBJECTIVES:
To use summary statistics to describe collections of data.
To use the mean, median, and mode to describe how data bunch up.
To use the range, variance, and standard deviation to describe how data spread out.

INSTRUCTOR'S NAME:

CHAPTER 3

Summary Statistics:
In some cases a frequency distribution does not give enough information about the data, especially when we want to compare two or more data sets. In such cases we use summary statistics. Single numbers that explain certain qualities of a data set are called summary statistics. There are four main qualities of data that give useful information about the data:
1 - Central tendency
2 - Dispersion
3 - Kurtosis
4 - Skewness

1- Central Tendency:
The tendency of values to cluster in the central part of the data is called central tendency. A measure of central tendency is also called a measure of location. The main measures of central tendency are:
i) The Arithmetic Mean.
ii) The Weighted Mean.
iii) The Geometric Mean.
iv) The Median.
v) The Mode.

i) The arithmetic mean is commonly known as the average. It is calculated as,
Arithmetic Mean = (sum of all values) / (total number of values)

Advantages of the arithmetic mean        Disadvantages of the arithmetic mean
1-It is easy to calculate.               1-It is affected by extreme values.
2-It is based on all observations.       2-It cannot be computed if any value is missing.
3-It is finite.                          3-It is not good for highly skewed distributions.

The Arithmetic Mean for ungrouped data.
Population arithmetic mean: $\mu = \frac{\sum x}{N}$
Sample arithmetic mean: $\bar{x} = \frac{\sum x}{n}$

EX: Marks of seven students of a class in a certain test are given below. Find the mean of their marks.
9   7   8   6   4   4   5

Solution:
n = 7
$\bar{x} = \frac{\sum x}{n} = \frac{9 + 7 + 8 + 6 + 4 + 4 + 5}{7} = \frac{43}{7} = 6.14$

EX: Child-Care Community Nursery is eligible for a county social services grant as long as the average age of its children stays below 9. If these data represent the ages of all the children currently attending Child-Care, do they qualify for the grant?
8   5   9   10   9   12   7   12   13   7   8

Solution:
n = 11
$\bar{x} = \frac{\sum x}{n} = \frac{8 + 5 + 9 + 10 + 9 + 12 + 7 + 12 + 13 + 7 + 8}{11} = \frac{100}{11} \approx 9.1$
9.1 > 9, so they do not qualify for the grant.

H.W: Do EX 3-12 (Pg 81)

The Arithmetic Mean for grouped data:
We can calculate the arithmetic mean for grouped data by using the following formulas,
a) $\bar{x} = \frac{\sum fx}{\sum f}$ (simple method)
b) $\bar{x} = x_0 + w \frac{\sum fu}{\sum f}$ (coding method)
Here $x$ is the class midpoint, $f$ the class frequency, $x_0$ the midpoint of the class assigned code 0, $w$ the class width, and $u$ the code of each class.

EX: The following frequency distribution shows the hourly income of 100 households in a locality.
a) Find the sample mean.
b) Find the sample mean using the coding method with 0 assigned to the middle class.

Classes    Frequency(f)
35-39      13
40-44      15
45-49      28
50-54      17
55-59      12
60-64      10
65-69       5

Solution:
(a)
Classes    Midpoint(x)    Frequency(f)    fx
35-39      37             13               481
40-44      42             15               630
45-49      47             28              1316
50-54      52             17               884
55-59      57             12               684
60-64      62             10               620
65-69      67              5               335
Total                     100             4950

$\sum f = 100$, $\sum fx = 4950$
$\bar{x} = \frac{\sum fx}{\sum f} = \frac{4950}{100} = 49.5$

(b)
Classes    Midpoint(x)    Frequency(f)    u     fu
35-39      37             13              -3    -39
40-44      42             15              -2    -30
45-49      47             28              -1    -28
50-54      52             17               0      0
55-59      57             12               1     12
60-64      62             10               2     20
65-69      67              5               3     15
Total                     100                   -50

$x_0 = 52$, $\sum f = 100$, $\sum fu = -50$
$w = \frac{69 - 35}{7} = \frac{34}{7} \approx 4.86 \approx 5$ (the lower limits of consecutive classes are 5 apart)
$\bar{x} = x_0 + w \frac{\sum fu}{\sum f} = 52 + 5 \left( \frac{-50}{100} \right) = 52 + \frac{-250}{100} = 52 - 2.50 = 49.5$

EX: (EX Sc 3-1 pg 79)
a) Find the sample mean.
b) Find the sample mean using the coding method with 0 assigned to the fourth class.
Classes      Frequency(f)
10.0-10.9     1
11.0-11.9     4
12.0-12.9     6
13.0-13.9     8
14.0-14.9    12
15.0-15.9    11
16.0-16.9     8
17.0-17.9     7
18.0-18.9     6
19.0-19.9     2

Solution:
(a)
Classes      Midpoint(x)    Frequency(f)    fx
10.0-10.9    10.5            1               10.5
11.0-11.9    11.5            4               46.0
12.0-12.9    12.5            6               75.0
13.0-13.9    13.5            8              108.0
14.0-14.9    14.5           12              174.0
15.0-15.9    15.5           11              170.5
16.0-16.9    16.5            8              132.0
17.0-17.9    17.5            7              122.5
18.0-18.9    18.5            6              111.0
19.0-19.9    19.5            2               39.0
Total                       65              988.5

$\sum f = 65$, $\sum fx = 988.5$
$\bar{x} = \frac{\sum fx}{\sum f} = \frac{988.5}{65} = 15.208$

(b)
Classes      Midpoint(x)    Frequency(f)    u     fu
10.0-10.9    10.5            1              -3    -3
11.0-11.9    11.5            4              -2    -8
12.0-12.9    12.5            6              -1    -6
13.0-13.9    13.5            8               0     0
14.0-14.9    14.5           12               1    12
15.0-15.9    15.5           11               2    22
16.0-16.9    16.5            8               3    24
17.0-17.9    17.5            7               4    28
18.0-18.9    18.5            6               5    30
19.0-19.9    19.5            2               6    12
Total                       65                   111

$x_0 = 13.5$, $\sum f = 65$, $\sum fu = 111$
$w = \frac{19.9 - 10.0}{10} = \frac{9.9}{10} = 0.99 \approx 1.0$
$\bar{x} = x_0 + w \frac{\sum fu}{\sum f} = 13.5 + 1 \left( \frac{111}{65} \right) = 13.5 + 1.708 = 15.208$

H.W: Do EX 3-9 (Pg 80)

ii) The Weighted Mean:
One limitation of the arithmetic mean is that it gives equal importance to all values in a set of data. Sometimes different values do not have equal importance; for some reason certain values matter more than others. That relative importance is called the weight of those values. The average in which each value is weighted by some index of its importance is called the weighted mean.

The Weighted Mean for ungrouped data.
The weighted mean is calculated as,
$\bar{x}_w = \frac{\sum wx}{\sum w}$
Here
$\bar{x}_w$ = Symbol of the weighted mean.
$\sum w$ = Sum of all weights.

When we use the weighted mean:
If the values in the sample do not appear with the same frequency, we use the weighted mean.

EX: A student's marks in Mathematics, Physics, English and Statistics are 82, 86, 90 and 70 respectively. If the respective credits received for these courses are 3, 5, 3 and 1, calculate the average marks.
Solution:
x        w     wx
82       3     246
86       5     430
90       3     270
70       1      70
Total    12    1016

$\sum w = 12$, $\sum wx = 1016$
$\bar{x}_w = \frac{\sum wx}{\sum w} = \frac{1016}{12} = 84.67$

EX: A contractor employs men, women and children. The numbers of men, women and children are 20, 15 and 5 respectively. He pays them $38, $35 and $20 per day respectively. What is the average wage per day paid by the contractor?

Solution:
Wages(x)    Workers(w)    wx
38          20             760
35          15             525
20           5             100
Total       40            1385

$\sum w = 40$, $\sum wx = 1385$
$\bar{x}_w = \frac{\sum wx}{\sum w} = \frac{1385}{40} = 34.63$

H.W: Do EX Sc 3-4 (pg 85)

iii) The geometric mean is appropriate for averaging ratios and rates of change.

The Geometric Mean for ungrouped data.
$G.M = \sqrt[n]{\text{product of all } x \text{ values}}$
OR
$G.M = \text{anti-log} \left( \frac{\sum \log x}{n} \right)$

EX: Calculate the geometric mean for the given data,
45   32   37   46   39   36   41   48   36

Solution:
Values(x)    Log x
45           1.6532
32           1.5051
37           1.5682
46           1.6628
39           1.5911
36           1.5563
41           1.6128
48           1.6812
36           1.5563
Total        14.387

$G.M = \text{anti-log} \left( \frac{\sum \log x}{n} \right) = \text{anti-log} \left( \frac{14.387}{9} \right) = \text{anti-log}(1.5986) = 39.68$

EX: The numbers of cars crossing a certain bridge in a big city in 10 intervals of 5 minutes each were recorded as follows. Calculate the geometric mean.
25   15   18   30   20   20   12   9   16   15

Solution:
Values(x)    Log x
25           1.3979
15           1.1761
18           1.2553
30           1.4771
20           1.3010
20           1.3010
12           1.0792
9            0.9542
16           1.2041
15           1.1761
Total        12.3220

$G.M = \text{anti-log} \left( \frac{\sum \log x}{n} \right) = \text{anti-log} \left( \frac{12.3220}{10} \right) = \text{anti-log}(1.2322) = 17.07$

iv) The median is the single value from the data set that measures the central value in the data. The median divides the data set into two halves.

When we use the median:
When the data are not symmetrical (i.e. skewed) we use the median as the measure of central tendency.

Advantages of median                         Disadvantages of median
1-It is easy to calculate.                   1-It is not capable of further mathematical treatment.
2-It is good for skewed distributions.       2-It is necessary to arrange the data into an array.
3-It is not affected by extreme values.      3-It does not use all the values.

The median for ungrouped data:
To find the median we first arrange the data into an array.
Then we find the median as follows:
If the data set contains an odd number of values, the middle value of the array is the median.
If the data set contains an even number of values, the average of the two middle values of the array is the median.
We can calculate the median as,
$\tilde{x} = \left( \frac{n+1}{2} \right)$th value

EX: Find the median for the given data.
4   9   12   8   6   29   16

Solution:
4   6   8   9   12   16   29
$\tilde{x} = \left( \frac{n+1}{2} \right)$th value $= \left( \frac{7+1}{2} \right)$th value $= \frac{8}{2}$th value = 4th value = 9

EX: Find the median for the given data.
4   5   9   3   8   10

Solution:
3   4   5   8   9   10
$\tilde{x} = \left( \frac{n+1}{2} \right)$th value $= \left( \frac{6+1}{2} \right)$th value $= \frac{7}{2}$th value = 3.5th value
The 3.5th value lies halfway between the 3rd and 4th values, so
$\tilde{x} = \frac{5 + 8}{2} = \frac{13}{2} = 6.5$

EX: Find the median for the given data,
86   52   49   31   30   11   35   43

Solution:
11   30   31   35   43   49   52   86
$\tilde{x} = \left( \frac{n+1}{2} \right)$th value $= \left( \frac{8+1}{2} \right)$th value $= \frac{9}{2}$th value = 4.5th value
So
$\tilde{x} = \frac{35 + 43}{2} = \frac{78}{2} = 39$

H.W: Find the median for the given data.
a) b)
15   20   20   10   10   95   50   34   30   15   70   60   25   86

Median for grouped data:
We can find the median for grouped data as,
$\tilde{x} = l + \frac{w}{f} \left( \frac{n}{2} - c \right)$
Here
$\tilde{x}$ = Sample median.
$l$ = Lower limit of the median class.
$n$ = Sum of all frequencies.
$w$ = Width of the median class.
$f$ = Frequency of the median class.
$c$ = Cumulative frequency of the class before the median class.

EX: Find the median from the following grouped data regarding the heights of students in a college.

Heights    Number of students = f
56-58       25
58-60       40
60-62      250
62-64      130
64-66       60
66-68       20
Total      525

Solution:
Heights    Number of students = f    Cumulative frequencies = cf
56-58       25                        25
58-60       40                        65
60-62      250                       315
62-64      130                       445
64-66       60                       505
66-68       20                       525
Total      525

n = 525
$\frac{n}{2} = \frac{525}{2} = 262.5$, so the median class is 60-62.
l = 60
w = 62 - 60 = 2
f = 250
c = 65
$\tilde{x} = l + \frac{w}{f} \left( \frac{n}{2} - c \right) = 60 + \frac{2}{250} (262.5 - 65) = 60 + \frac{2}{250} (197.5) = 60 + \frac{395}{250} = 60 + 1.58 = 61.58$

EX: Find the median for the following frequency distribution.
Classes        Frequency(f)
100-149.5      12
150-199.5      14
200-249.5      27
250-299.5      28
300-349.5      72
350-399.5      63
400-449.5      36
450-499.5      18

Solution:
Classes        Frequency(f)    Cumulative frequency(cf)
100-149.5       12              12
150-199.5       14              26
200-249.5       27              53
250-299.5       28              81
300-349.5       72             153
350-399.5       63             216
400-449.5       36             252
450-499.5       18             270
Total          270

n = 270
$\frac{n}{2} = \frac{270}{2} = 135$, so the median class is 300-349.5.
l = 300
w = 349.5 - 300 = 49.5 ≈ 50
f = 72
c = 81
$\tilde{x} = l + \frac{w}{f} \left( \frac{n}{2} - c \right) = 300 + \frac{50}{72} (135 - 81) = 300 + \frac{50}{72} (54) = 300 + \frac{2700}{72} = 300 + 37.5 = 337.5$

H.W: Calculate the median profit for 1400 companies for the year 1999-2000.

Classes      f
200-400      120
400-600      300
600-800      500
800-1000     280
1000-1200    100
1200-1400     80
1400-1600     20

v) The Mode:
It is the simplest measure of central tendency. The word comes from French, where it means 'fashion'. The mode is the value which occurs most frequently in a set of data. It is easier to find the mode after arranging the data into an array.
Sometimes there is more than one mode in a set of data. For example, the data 2, 3, 4, 5, 4, 7, 7 has two modes, i.e. 4 and 7. A distribution which has two modes is called a bimodal distribution.
Sometimes there is no mode in the data set. For example, the data 34, 56, 78, 96, 20 has no mode, as each value occurs the same number of times.

Advantages of mode                                   Disadvantages of mode
1-It is easy to calculate.                           1-It is not based on all values.
2-It can be calculated for open-end classes.         2-It is not capable of further mathematical treatment.
3-It is not affected by extreme values.              3-When data has more than one mode, it should not be calculated.

The mode for ungrouped data:
EX: Find the mode for the given data,
2   5   6   9   5   8

Solution:
2   5   5   6   8   9
Mode = 5

EX: Find the mode for the given data,
10   8   10   8   3   6   8   6   10   10   10   10   12   12

Solution:
3   6   6   8   8   8   10   10   10   10   10   10   12   12
Mode = 10

The mode for grouped data:
In grouped data we assume that the mode is located in the class with the highest frequency.
We can calculate the mode for grouped data as,
$M_o = L_{mo} + \left( \frac{d_1}{d_1 + d_2} \right) w$
Here
$L_{mo}$ = lower limit of the modal class.
$d_1$ = frequency of the modal class - frequency of the class directly below it.
$d_2$ = frequency of the modal class - frequency of the class directly above it.
$w$ = width of the modal class.

EX: Find the mode for the given frequency distribution.

Solution:
Classes      Frequency = f
200-400      120
400-600      300
600-800      500
800-1000     280
1000-1200    100
1200-1400     80
1400-1600     20

$L_{mo}$ = 600
w = 800 - 600 = 200
$d_1$ = 500 - 300 = 200
$d_2$ = 500 - 280 = 220
$M_o = L_{mo} + \left( \frac{d_1}{d_1 + d_2} \right) w = 600 + \left( \frac{200}{200 + 220} \right) 200 = 600 + \left( \frac{200}{420} \right) 200 = 600 + \frac{40000}{420} = 600 + 95.24 = 695.24$

EX: Find the mode for the following frequency distribution.

Solution:
Classes    Frequency = f
0-7         3
7-14       11
14-21      15
21-28      20
28-35      25
35-42      18
42-49      13
49-56       3
56-63       2

$L_{mo}$ = 28
w = 35 - 28 = 7
$d_1$ = 25 - 20 = 5
$d_2$ = 25 - 18 = 7
$M_o = L_{mo} + \left( \frac{d_1}{d_1 + d_2} \right) w = 28 + \left( \frac{5}{5 + 7} \right) 7 = 28 + \left( \frac{5}{12} \right) 7 = 28 + \frac{35}{12} = 28 + 2.92 = 30.92$

H.W: Do EX 3-44 (pg 104)

A symmetrical distribution with a single mode has the same value for the mean, the median and the mode. In skewed distributions the median is often the best measure of central tendency because it usually lies between the mean and the mode.

2-Dispersion:
Sometimes central tendency alone does not explain the data, so we need more information. This is obtained by measuring dispersion. Dispersion is the spread of the data, that is, the extent to which the values are spread out about their centre. If the values are close to the centre, we say the dispersion is small; otherwise the dispersion is large.
There are two types of measures of dispersion.
a) Absolute measures of dispersion.
b) Relative measures of dispersion.

The main absolute measures of dispersion are,
i) The Range.
ii) The Interquartile Range.
iii) The Variance and Standard Deviation.
The relative measure of dispersion is,
iv) The Coefficient of Variation.

i) The Range:
We can calculate the range as,
Range = highest value in data - lowest value in data.
The range does not measure the spread of most of the values in the data set. It only measures the spread between the highest value and the lowest value. The range does not give any information about the nature of the data.

EX: Find the range for the following data.
863   1698   940   1883   1041   1354   903   1802   957   1138   1204

Solution:
863   903   940   957   1041   1138   1204   1354   1698   1802   1883
Highest value = 1883
Lowest value = 863
Range = highest value - lowest value = 1883 - 863 = 1020

ii) The Interquartile Range:
The quartiles divide the data into four equal parts. We arrange the data into an array. Q1 is the lower or first quartile, Q2 is the middle or second quartile, and Q3 is the upper or third quartile. We can calculate the interquartile range as,
Interquartile range = Q3 - Q1
Here
Q1 = $\frac{n}{4}$th value
Q3 = $\frac{3n}{4}$th value

EX: (EX 3-52 pg 111) Find the interquartile range for the given data,
99   75   84   61   33   45   66   72   91   74
93   54   76   52   97   91   69   77   55   68

Solution:
33   45   52   54   55   61   66   68   69   72
74   75   76   77   84   91   91   93   97   99
n = 20
Q1 = 5th value = 55
Q3 = 15th value = 84
Interquartile range = Q3 - Q1 = 84 - 55 = 29

EX: (EX 3-56 pg 111) Calculate the range and the interquartile range for the given data,
0.10   0.23   0.45   0.77   0.50   0.12   0.32   0.66   0.53   1.10
0.67   0.83   0.58   0.69   0.48   0.51   0.89   0.59   1.20   0.95

Solution:
0.10   0.12   0.23   0.32   0.45   0.48   0.50   0.51   0.53   0.58
0.59   0.66   0.67   0.69   0.77   0.83   0.89   0.95   1.10   1.20
a) Highest value = 1.20
Lowest value = 0.10
Range = highest value - lowest value = 1.20 - 0.10 = 1.10
b) n = 20
Q1 = 5th value = 0.45
Q3 = 15th value = 0.77
Interquartile range = Q3 - Q1 = 0.77 - 0.45 = 0.32

H.W: Do EX 3-58 (pg 112)

iii) The Variance and Standard Deviation:
The mean of the squares of the deviations of all the values from their mean is called the variance. The standard deviation is the square root of the variance. It provides an average distance of each value from the mean. The standard deviation is zero if all the values in the data are the same.
Because it is based on all the values in the data, the standard deviation is a very important measure of dispersion.
The standard deviation is only used to measure spread or dispersion around the mean of a data set.
The standard deviation is never negative.
The standard deviation is sensitive to outliers. A single outlier can raise the standard deviation and, in turn, distort the picture of spread.
For data with approximately the same mean, the greater the spread, the greater the standard deviation.

The variance, standard deviation for ungrouped data:
We can calculate the population variance and sample variance as,
Population variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
We can calculate the population standard deviation and sample standard deviation as,
Population standard deviation: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Sample standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

iv) The Coefficient of Variation:
It is a relative measure of dispersion. It can be calculated as,
Population coefficient of variation $= \frac{\sigma}{\mu} \times 100$
Sample coefficient of variation $= \frac{s}{\bar{x}} \times 100$

EX: A man took a sample of 5 batteries from a day's production and used them until they were drained. The numbers of hours they were used until failure are given below. Compute the variance, standard deviation and coefficient of variation.
342   426   317   545   630

Solution:
x        (x - x̄)    (x - x̄)²
342      -110       12100
426       -26         676
317      -135       18225
545        93        8649
630       178       31684
Total    2260                  71334

n = 5, $\sum x = 2260$, $\sum (x - \bar{x})^2 = 71334$
$\bar{x} = \frac{\sum x}{n} = \frac{2260}{5} = 452$
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{71334}{4} = 17833.5$
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{17833.5} = 133.542$
$CV = \frac{s}{\bar{x}} \times 100 = \frac{133.542}{452} \times 100 = 29.54\%$

EX: Calculate the variance, standard deviation and coefficient of variation from the following marks obtained by 9 students.
45   32   37   46   39   36   41   48   36

Solution:
x        (x - x̄)    (x - x̄)²
45        5         25
32       -8         64
37       -3          9
46        6         36
39       -1          1
36       -4         16
41        1          1
48        8         64
36       -4         16
Total    360                   232

n = 9, $\sum x = 360$, $\sum (x - \bar{x})^2 = 232$
$\bar{x} = \frac{\sum x}{n} = \frac{360}{9} = 40$
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{232}{8} = 29$
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{29} = 5.385$
$CV = \frac{s}{\bar{x}} \times 100 = \frac{5.385}{40} \times 100 = 13.46\%$

The variance, standard deviation for grouped data:
EX: A frequency distribution of the lengths of telephone calls monitored at the switchboard of an office is given below. Calculate the variance and standard deviation of the calling time.

Classes    Frequency = f
0-2         5
2-4        10
4-6        40
6-8        30
8-10       15

Solution:
Classes    x    f      fx     (x - x̄)    (x - x̄)²    f(x - x̄)²
0-2        1     5       5    -4.8       23.04       115.2
2-4        3    10      30    -2.8        7.84        78.4
4-6        5    40     200    -0.8        0.64        25.6
6-8        7    30     210     1.2        1.44        43.2
8-10       9    15     135     3.2       10.24       153.6
Total          100     580                           416

$\bar{x} = \frac{\sum fx}{\sum f} = \frac{580}{100} = 5.8$
$s^2 = \frac{\sum f(x - \bar{x})^2}{\sum f} = \frac{416}{100} = 4.16$
$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}} = \sqrt{4.16} = 2.04$

EX: (3-66, pg 123) Calculate the variance and standard deviation of the given data.

Classes    Frequency = f
1-3        18
4-6        90
7-9        44
10-12      21
13-15       9
16-18       9
19-21       4
22-24       5

Solution:
Classes    x     f      fx     (x - x̄)    (x - x̄)²    f(x - x̄)²
1-3         2    18      36    -6          36          648
4-6         5    90     450    -3           9          810
7-9         8    44     352     0           0            0
10-12      11    21     231     3           9          189
13-15      14     9     126     6          36          324
16-18      17     9     153     9          81          729
19-21      20     4      80    12         144          576
22-24      23     5     115    15         225         1125
Total           200    1543                           4401

$\bar{x} = \frac{\sum fx}{\sum f} = \frac{1543}{200} = 7.7 \approx 8$ (the mean is rounded to 8 to simplify the deviations)
$s^2 = \frac{\sum f(x - \bar{x})^2}{\sum f} = \frac{4401}{200} = 22.005$
$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}} = \sqrt{22.005} = 4.69$

H.W:
a- Do EX 3-67 (pg 123)
b- A hen lays eight eggs. The weight (in grams) of each egg is given below. Find the variance, standard deviation and coefficient of variation.
60   56   61   68   51   53   69   54

3-Kurtosis:
It is the measure of the degree of peakedness of a distribution.
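The chapter defines kurtosis only in words. As a rough illustration, the sketch below computes one common textbook convention, the moment coefficient of kurtosis (the fourth central moment divided by the square of the second central moment). This formula is an assumption brought in for illustration and is not given in the chapter; the function name `moment_kurtosis` is likewise hypothetical. The data are the nine student marks used earlier.

```python
def moment_kurtosis(data):
    """Moment coefficient of kurtosis: m4 / m2**2.

    Assumed convention (not taken from the chapter): central moments
    are computed with divisor n, the number of values.
    """
    n = len(data)
    mean = sum(data) / n
    # Second and fourth central moments.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2


# Marks of the nine students from the variance example.
marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]
print(round(moment_kurtosis(marks), 3))
```

Under this convention a normal distribution has a coefficient of 3, so values above 3 suggest a more peaked distribution and values below 3 a flatter one.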
4-Skewness:
A distribution in which values equidistant from the mean have equal frequencies is defined to be symmetrical, and any departure from symmetry is called skewness.

OBJECTIVE SECTION

Q-1 Write short answers for the following.
1- Define Summary Statistics.
Answer: Single numbers that explain certain qualities of a data set are called summary statistics.
2- Define Kurtosis.
Answer: It is the measure of the degree of peakedness of a distribution.
3- Write down the main measures of central tendency.
Answer:
i) The Arithmetic mean.
ii) The Weighted mean.
iii) The Geometric mean.
iv) The Median.
v) The Mode.
4- Which measure of dispersion is defined as the mean of the squares of the deviations of all the values from their mean?
Answer: Variance.

Q-2 Choose the correct one.
1- There are ---------- main qualities of data that give useful information about data,
i) Seven   ii) Ten   iii) Four
2- We can calculate the sample arithmetic mean as,
i) $\frac{\sum x}{n}$   ii) $x - n$   iii) None of these.
3- The median divides the data into -------- equal halves.
i) Five   ii) Nine   iii) Two

Q-3 Write true or false for the following.
1- Standard deviation is the positive square root of variance.
2- Range is an absolute measure of dispersion.
3- Mean, median and mode have the same value for a skewed distribution.
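Several of the worked examples in this chapter can be cross-checked with a short script. The sketch below is only one way to do it, using Python's standard `statistics` module for the ungrouped data. The grouped-data helpers (`grouped_mean`, `grouped_median`, `grouped_mode`) are hypothetical names written for this illustration; they simply restate the chapter's formulas.

```python
import math
import statistics

# --- Ungrouped data (values copied from the chapter's examples) ---
marks = [9, 7, 8, 6, 4, 4, 5]
mean_marks = statistics.mean(marks)  # sum of values / number of values

scores, credit_hours = [82, 86, 90, 70], [3, 5, 3, 1]
weighted_mean = (
    sum(w * x for w, x in zip(credit_hours, scores)) / sum(credit_hours)
)

test_marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]
# Geometric mean through base-10 logarithms, mirroring the anti-log method.
geometric_mean = 10 ** (sum(math.log10(x) for x in test_marks) / len(test_marks))

median_value = statistics.median([86, 52, 49, 31, 30, 11, 35, 43])
modes = statistics.multimode([2, 3, 4, 5, 4, 7, 7])  # bimodal example

battery_hours = [342, 426, 317, 545, 630]
sample_variance = statistics.variance(battery_hours)  # divides by n - 1
sample_sd = statistics.stdev(battery_hours)
cv_percent = sample_sd / statistics.mean(battery_hours) * 100


# --- Grouped data (hypothetical helpers restating the chapter's formulas) ---
def grouped_mean(midpoints, freqs):
    """Simple method: sum(f * x) / sum(f)."""
    return sum(f * x for f, x in zip(freqs, midpoints)) / sum(freqs)


def grouped_median(lower, width, freq, n, cum_before):
    """l + (w / f) * (n / 2 - c) for the median class."""
    return lower + (width / freq) * (n / 2 - cum_before)


def grouped_mode(lower, width, d1, d2):
    """L_mo + (d1 / (d1 + d2)) * w for the modal class."""
    return lower + d1 / (d1 + d2) * width


income_mean = grouped_mean(
    [37, 42, 47, 52, 57, 62, 67], [13, 15, 28, 17, 12, 10, 5]
)
height_median = grouped_median(lower=60, width=2, freq=250, n=525, cum_before=65)
profit_mode = grouped_mode(lower=600, width=200, d1=200, d2=220)

for name, value in [
    ("mean of marks", mean_marks),
    ("weighted mean", weighted_mean),
    ("geometric mean", geometric_mean),
    ("median", median_value),
    ("sample variance", sample_variance),
    ("sample standard deviation", sample_sd),
    ("coefficient of variation (%)", cv_percent),
    ("grouped mean", income_mean),
    ("grouped median", height_median),
    ("grouped mode", profit_mode),
]:
    print(f"{name}: {value:.2f}")
print("modes:", modes)
```

Small differences in the last decimal place against the hand calculations can arise because the chapter rounds intermediate values such as logarithms.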