Module 5
Measures of Dispersion

Introduction

In the previous module, you learned how to compute the mean, median, and mode for both grouped and ungrouped quantitative data. These are summary figures used to describe a set of observations. In some instances, however, these measures are not sufficient for description, especially if the intention is to look at how variable the observations are, that is, how much they differ from each other. For this you may use another group of indices, the measures of dispersion or variation.

A measure of dispersion indicates the degree of spread or variability of a given set of data. Since the data are not all alike, assessing the extent to which they differ from one another is one of the concerns of this module. Four measures of dispersion will be discussed: the range, the variance, the standard deviation, and the coefficient of variation.

Objectives:

After going through this module, you should be able to:

1. Compute and interpret the various measures of dispersion, such as the range, variance, standard deviation, and coefficient of variation, for a set of ungrouped data.
2. Compute and interpret the various measures of dispersion, such as the range, variance, standard deviation, and coefficient of variation, for a set of grouped data.

The Range

The range is simply the difference between the highest and the lowest measurements when the data are ungrouped, or the difference between the true upper limit of the last class and the true lower limit of the first class when the data are grouped. For this reason, students like you usually consider it the simplest measure of variability.
If $H$ represents the highest value and $L$ the lowest value, then the range $R$ for ungrouped data is

$$R = H - L$$

If $H_u$ represents the upper limit of the last class and $L_l$ the lower limit of the first class, then the range $R$ for grouped data is

$$R = H_u - L_l$$

The range, as a measure of dispersion, is not difficult to calculate and understand, and there is a natural curiosity about the minimum and maximum values. Nonetheless, it is not generally a useful measure of variation. Its main shortcoming is that it gives no indication of the dispersion of the values that fall between the two extremes, which makes the range a highly unstable measure of variability.

SAQ1

a. Compute and interpret the range of heights of the 12 basketball players cited in SAQ2, Module 4.
b. Compute an approximation of the range of mathematics scores given in Activity 1, Module 4.

ASAQ1

a. The range of the heights of the 12 basketball players cited in SAQ2, Module 4 is 24. This is the difference between 204, the highest value, and 180, the lowest value.
b. Since these data are grouped, the range is 69. This is the difference between the upper limit of the last class (which is equal to 89) and the lower limit of the first class (which is equal to 20).

The Variance and Standard Deviation (Ungrouped Data)

Let us now tackle the variance. Unfortunately, this is one of the measures that students find conceptually difficult.

i) Population variance and standard deviation

$$\sigma^2 = \frac{\sum X_i^2 - \left(\sum X_i\right)^2 / N}{N}$$

$$\sigma = \sqrt{\sigma^2}$$

where:
$\sigma^2$ (read "sigma squared") = population variance
$\sigma$ = population standard deviation
$N$ = population size
$X_i$ = the ith observed value for the variable x

ii) Sample variance and standard deviation

$$s^2 = \frac{\sum X_i^2 - \left(\sum X_i\right)^2 / n}{n - 1}$$

$$s = \sqrt{s^2}$$

where:
$s^2$ = sample variance
$s$ = sample standard deviation
$n$ = sample size
$X_i$ = the ith observed value for the variable x

For example, with $\sum X_i^2 = 114$, $\sum X_i = 22$, and $n = 5$:

$$s^2 = \frac{114 - (22)^2 / 5}{4} = \frac{114 - 484/5}{4} = \frac{114 - 96.8}{4} = \frac{17.2}{4} = 4.3$$

$$s = \sqrt{4.3} = 2.07$$

SAQ2

a. Once again, refer to item no.
1 of SAQ3, Module 4. Fill in the second column of the table below.

Number of children ($X_i$)    $X_i^2$
3
4
7
6
2
Total:

b. Compute $\sum X_i$, $\left(\sum X_i\right)^2$, and $\sum X_i^2$.
c. Compute and interpret the variance and standard deviation for the number of children of 5 families.

ASAQ2

a. The notation $X_i^2$ indicates the square of each individual observation. Hence, with the help of a calculator, you should not experience any problem obtaining the results tabulated below.

$X_i$     $X_i^2$
3         9
4         16
7         49
6         36
2         4
Total:

b. The answers to this SAQ are $\sum X_i = 22$, $\left(\sum X_i\right)^2 = 484$, and $\sum X_i^2 = 114$. Did you notice that $\left(\sum X_i\right)^2$ is easily obtained by taking the total of the $X_i$ column, then squaring this total? Do this step by step. Fill in the blanks below.

Total of the $X_i$ column = $\sum X_i$ = __________
Square of the total = $\left(\sum X_i\right)^2$ = (     )$^2$ = __________

On the other hand, $\sum X_i^2$ is obtained by squaring the individual observations first, then taking the sum of the squared observations. In fact, this is simply the sum of the $X_i^2$ column in the table above. Are you now confident that you can tell the difference between $\left(\sum X_i\right)^2$ and $\sum X_i^2$? Good.

c. The variance of the number of children is 4.3 and the standard deviation is 2.07. How do your results compare with these? If you got them right, you have done a good job. If you did not, please go through the steps of the computation again.

So how did you interpret the variance of 4.3? It should have been: "The average of the squared deviations from the mean number of children is 4.3." How about the standard deviation of 2.07? Theoretically, it can be interpreted as the square root of the average of the squared deviations from the mean number of children. In layman's terms, the greater the value of the standard deviation, the more the observations scatter from the mean.

The standard deviation is the most important measure of dispersion. It is affected by the value of each observation. It is the most stable and is, therefore, the most reliable measure of variability.
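The ungrouped formulas in this section can be cross-checked with a short script. The following is only a minimal sketch in Python, which is not part of the module; the function names `value_range`, `sample_variance`, and `population_variance` are illustrative assumptions. It applies the computational formulas to the number-of-children data (3, 4, 7, 6, 2) and compares the result with Python's standard `statistics` module.

```python
import math
import statistics


def value_range(values):
    """Range for ungrouped data: highest value minus lowest value (R = H - L)."""
    return max(values) - min(values)


def sample_variance(values):
    """Sample variance via the computational formula with divisor n - 1."""
    n = len(values)
    sum_x = sum(values)                   # sum of X_i
    sum_x2 = sum(x * x for x in values)   # sum of X_i squared
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


def population_variance(values):
    """Population variance via the computational formula with divisor N."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / n


# Number of children in the 5 families used in SAQ2.
children = [3, 4, 7, 6, 2]

s2 = sample_variance(children)
s = math.sqrt(s2)

print("sum of X_i:", sum(children))
print("sum of X_i squared:", sum(x * x for x in children))
print("range:", value_range(children))
print("sample variance:", round(s2, 2))
print("sample standard deviation:", round(s, 2))

# Cross-check against the standard library's own sample variance.
print("agrees with statistics.variance:",
      math.isclose(s2, statistics.variance(children)))
```

Rounded to two decimal places, the script should reproduce the variance of 4.3 and the standard deviation of 2.07 obtained by hand. Using the sums of $X_i$ and $X_i^2$ mirrors the hand computation; `statistics.variance` works from deviations about the mean instead, and the two approaches agree.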
The Variance and Standard Deviation (Grouped Data)

Let us now go to the computation of the variance and standard deviation for grouped data, using the frequency distribution table of math test scores given below. By this time, I trust that you have developed enough facility with statistical notation. Study carefully the formulas below for the computation of the different measures of variation:

$$\text{Range} = H_u - L_l$$

$$s^2 = \frac{\sum f_i X_i^2 - \left(\sum f_i X_i\right)^2 / n}{n - 1}$$

$$s = \sqrt{s^2}$$

where:
$H_u$ = upper limit of the last class
$L_l$ = lower limit of the first class
$s^2$ = variance
$s$ = standard deviation
$X_i$ = midpoint or class mark of the ith class
$f_i$ = frequency of the ith class
$n = \sum f_i$ = total frequency or total number of observations

Math Test Scores    Freq ($f_i$)    Midpoint ($X_i$)    $f_i X_i$    $X_i^2$    $f_i X_i^2$
50-54               6               52                  312          2704       16224
55-59               18              57                  1026         3249       58482
60-64               23              62                  1426         3844       88412
65-69               11              67                  737          4489       49379
70-74               7               72                  504          5184       36288
75-79               5               77                  385          5929       29645
Total               $n = 70$                            $\sum f_i X_i = 4390$              $\sum f_i X_i^2 = 278430$

Using the above formulas, we have

$$R = 79 - 50 = 29$$

$$s^2 = \frac{278430 - (4390)^2 / 70}{70 - 1} = \frac{278430 - 275315.71}{69} = \frac{3114.29}{69} = 45.1346$$

$$s = \sqrt{s^2} = \sqrt{45.1346} = 6.7182$$

The Coefficient of Variation

The coefficient of variation is a measure of relative dispersion which may be used for comparing the variability of two sets of data. It is computed with the formula (Walpole, 1982)

$$CV = \frac{s}{\bar{x}} \, (100\%)$$

where:
$s$ = standard deviation
$\bar{x}$ = mean = $\sum f_i X_i / n$

From the frequency distribution table, we have

$$\bar{x} = \frac{\sum f_i X_i}{n} = \frac{4390}{70} = 62.7$$

$$CV = \frac{6.7182}{62.7} \, (100\%) = 10.71\%$$

The computed CV of 10.71% indicates that the variability, or the degree of difference, of the mathematics test scores of the respondent students is relatively low.

Another example of the computation of the CV and its usefulness is as follows:

The mean mathematics achievement test score of one section of first year high school students is 55 with a standard deviation of 15.
In another section in the same school, the mean is 30 and the standard deviation is 10. Do the scores of the first group fluctuate about their mean more than those of the second group? (In other words, is the first group more variable than the second group?)

Computation:

First group:

$$CV_1 = \frac{s}{\bar{x}} \, (100\%) = \frac{15}{55} \, (100\%) = 27.3\%$$

Second group:

$$CV_2 = \frac{s}{\bar{x}} \, (100\%) = \frac{10}{30} \, (100\%) = 33.3\%$$

Since the coefficient of variation is higher for the second group than for the first group, the individual scores of the second group are more widely dispersed, relative to their mean, than the individual scores of the first group. Note that the first group has the larger standard deviation (15 versus 10), yet it is the less variable of the two once each standard deviation is expressed as a percentage of its own mean.

Activity

1a. Complete the table below.

Scores    No. of Students ($f_i$)    Midpoint ($X_i$)    $f_i X_i$    $X_i^2$    $f_i X_i^2$
20-29     5
30-39     9
40-49     11
50-59     22
60-69     13
70-79     7
80-89     3
$n = \sum f_i =$

1b. Compute the range, variance, standard deviation, and coefficient of variation, and interpret the results.

2. Compute the range, variance, standard deviation, and coefficient of variation of the following teachers' efficiency grades obtained by 25 faculty members. Do not group the data:

98 97 96 95 94
93 92 91 90 88
88 88 86 85 84
83 82 81 80 79
78 77 76 75 74
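To check your hand computations for grouped data, the worked examples of this module (the math test scores with classes 50-54 through 75-79, and the comparison of the two sections) can be reproduced with a short script. This is only a minimal sketch in Python, which the module itself does not use; the names `grouped_stats` and `coefficient_of_variation` are illustrative assumptions. The sketch also assumes the classes are listed in ascending order as (lower limit, upper limit, frequency) and takes each midpoint as the average of the two class limits.

```python
import math


def coefficient_of_variation(sd, mean):
    """CV = (s / mean) * 100, expressed in percent."""
    return sd / mean * 100


def grouped_stats(classes):
    """Range, mean, variance, SD and CV for grouped data.

    classes: list of (lower_limit, upper_limit, frequency) tuples,
    assumed to be sorted from the lowest class to the highest.
    """
    n = sum(freq for _, _, freq in classes)   # n = sum of f_i
    sum_fx = 0.0                              # sum of f_i * X_i
    sum_fx2 = 0.0                             # sum of f_i * X_i squared
    for lower, upper, freq in classes:
        midpoint = (lower + upper) / 2        # class mark X_i
        sum_fx += freq * midpoint
        sum_fx2 += freq * midpoint ** 2

    variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
    sd = math.sqrt(variance)
    mean = sum_fx / n
    return {
        # Upper limit of the last class minus lower limit of the first class.
        "range": classes[-1][1] - classes[0][0],
        "n": n,
        "mean": mean,
        "variance": variance,
        "sd": sd,
        "cv": coefficient_of_variation(sd, mean),
    }


# Math test scores from the worked example in this module.
math_scores = [
    (50, 54, 6),
    (55, 59, 18),
    (60, 64, 23),
    (65, 69, 11),
    (70, 74, 7),
    (75, 79, 5),
]

stats = grouped_stats(math_scores)
for name, value in stats.items():
    print(name, round(value, 4))

# The two sections compared in the CV example (SD 15, mean 55; SD 10, mean 30).
cv_first = coefficient_of_variation(15, 55)
cv_second = coefficient_of_variation(10, 30)
print("CV first group:", round(cv_first, 1))
print("CV second group:", round(cv_second, 1))
```

The printed values should match the hand computations above: a range of 29, a variance of about 45.1346, a standard deviation of about 6.7182, and a CV of about 10.71%. To check your answers to Activity 1, replace `math_scores` with the classes and frequencies from the Activity table.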