Lecture 4: Measures of Dispersion or Variability

The mean, mode and median do a nice job of telling where the center of the data set is, but often we are interested in more. For example, a pharmaceutical engineer develops a new drug that regulates sugar in the blood. Suppose he finds out that the average sugar content after taking the medication is the optimal level. This does not mean that the drug is effective. There is a possibility that half of the patients have dangerously low sugar content while the other half have dangerously high content. Instead of being an effective regulator, the drug is a deadly poison. What the engineer needs is a measure of how far the data are spread apart. This is what the variance and standard deviation provide.

The most common measures of dispersion or variability are the range, the variance, the standard deviation and the coefficient of variation.

Range (R):
The range is the difference between the largest (XL) and the smallest (XS) values in a set of observations.

$$R = X_L - X_S$$

Note: The range is a poor measure of dispersion because it takes into account only two of the values.

Variance:
The variance is the most commonly used measure of spread in biological statistics. For a population, it is defined as the sum of squares of the deviations from the mean (SS) divided by the total number of observations, N. For a sample, SS is divided by one less than the number of observations, n - 1 (the degrees of freedom, df).

$$\sigma^2 = \frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N} \quad \text{or} \quad \sigma^2 = \frac{1}{N}\left[\sum_{i=1}^{N} X_i^2 - \frac{\left(\sum_{i=1}^{N} X_i\right)^2}{N}\right] \qquad \text{(Population)}$$

$$S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1} \quad \text{or} \quad S^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}\right] \qquad \text{(Sample)}$$

Notes:
1. The variance is a measure that uses the mean as a point of reference.
2. The variance is small when all values are close to the mean.
3. The variance is large when the values are spread far from the mean.

Why the separate formula for a sample?
The sample formula divides by n - 1 in order to:
1. Correct for the probability that the most extreme cases will be excluded from a smaller sample.
2. Make the sample more representative of the population for very small samples.
3. Reduce the denominator to a larger extent in small samples (if n = 5, the denominator is reduced by 20%). In large samples the n - 1 correction has a much smaller effect.

Example: We want to compute the sample variance of the following sample values: 10, 21, 33, 53, 54.

Solution: n = 5

* First method:

$$\bar{x} = \frac{\sum_{i=1}^{5} x_i}{5} = \frac{10 + 21 + 33 + 53 + 54}{5} = \frac{171}{5} = 34.2$$

$$S^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{5}(x_i - 34.2)^2}{5-1}$$

$$S^2 = \frac{(10-34.2)^2 + (21-34.2)^2 + (33-34.2)^2 + (53-34.2)^2 + (54-34.2)^2}{4} = \frac{1506.8}{4} = 376.7$$

* Second method:

| $x_i$ | $x_i - 34.2$ | $(x_i - 34.2)^2$ |
|---|---|---|
| 10 | -24.2 | 585.64 |
| 21 | -13.2 | 174.24 |
| 33 | -1.2 | 1.44 |
| 53 | 18.8 | 353.44 |
| 54 | 19.8 | 392.04 |
| $\sum x_i = 171$ | $\sum (x_i - \bar{x}) = 0$ | $\sum (x_i - \bar{x})^2 = 1506.8$ |

$$\bar{x} = \frac{171}{5} = 34.2, \qquad S^2 = \frac{1506.8}{4} = 376.7$$

* Third method:

| $x_i$ | $x_i^2$ |
|---|---|
| 10 | 100 |
| 21 | 441 |
| 33 | 1089 |
| 53 | 2809 |
| 54 | 2916 |
| $\sum x_i = 171$ | $\sum x_i^2 = 7355$ |

Since $\left(\sum x_i\right)^2 / n = n\bar{x}^2$, the second form of the sample formula gives:

$$S^2 = \frac{\sum_{i=1}^{5} x_i^2 - 5\,\bar{x}^2}{5-1} = \frac{7355 - 5(34.2)^2}{4} = \frac{1506.8}{4} = 376.7$$

Standard Deviation (s) or (sd):
The standard deviation is defined as the positive square root of the variance.

$$\sigma = \sqrt{\sigma^2} \qquad \text{(Population)}$$

$$s = \sqrt{S^2} \qquad \text{(Sample)}$$

The mean squared difference from the sample mean will, on average, underestimate the population variance. In some samples it will overestimate it, but most of the time it will underestimate it. If the formula is modified so that the sum of squared deviations is divided by n - 1 rather than n, the tendency to underestimate the population variance is eliminated.

Coefficient of Variation (C.V):
The variance and the standard deviation are useful as measures of variation of the values of a single variable for a single population or sample. If we want to compare the variation of two variables or two samples, we cannot use the variance and the standard deviation because:
1. The variables or samples might have different units.
2. The variables or samples might have different means.

The coefficient of variation is defined as the ratio of the standard deviation to the mean. It is independent of the units employed (unitless).

$$C.V = \frac{\sigma}{\mu} \times 100\% \qquad \text{(Population)}$$

$$C.V = \frac{s}{\bar{X}} \times 100\% \qquad \text{(Sample)}$$

Notes: In biological experiments, if the coefficient of variation (C.V) is:
1) 10% to 15%, the variation in the data is acceptable.
2) 5% or less, the data are homogeneous, with little variation.
3) 25% or more, the variation is very considerable.

1. C.V for Ungrouped Data

Example: The data below present the results obtained by technicians from two different laboratories, all making the same specific blood chemistry determination using a solution with a known concentration (5 mg/ml).

| Laboratory | Technician results | Mean | Standard deviation | C.V |
|---|---|---|---|---|
| 1 | 5, 7, 6, 6 | 6 | 0.82 | $\frac{0.82}{6} \times 100 \approx 13.6\%$ |
| 2 | 6, 4, 9, 5 | 6 | 2.16 | $\frac{2.16}{6} \times 100 = 36\%$ |

Lab. 1 gives the more consistent results, because its C.V is smaller.

Example: For the set of data (4, 6, 3, 4, 5 and 2), compute the range, the variance, the standard deviation and the coefficient of variation.

Solution:

$$R = X_L - X_S = 6 - 2 = 4$$

$$S^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}\right] = \frac{1}{6-1}\left[(4^2 + 6^2 + 3^2 + 4^2 + 5^2 + 2^2) - \frac{(4 + 6 + 3 + 4 + 5 + 2)^2}{6}\right] = \frac{1}{5}\left[106 - \frac{576}{6}\right] = 2$$

$$s = \sqrt{S^2} = \sqrt{2} = 1.414$$

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{4 + 6 + 3 + 4 + 5 + 2}{6} = 4$$

$$C.V = \frac{s}{\bar{X}} \times 100 = \frac{1.414}{4} \times 100 = 35.35\%$$

2. C.V for Grouped Data

Example: The following table shows the hemoglobin values (g/100ml) of 30 children receiving treatment for hemolytic anemia. Compute the variance, the standard deviation and the coefficient of variation.

| Hemoglobin (g/100ml) | Midpoint ($m_i$) | Frequency ($f_i$) |
|---|---|---|
| 6.5 – 7.5 | 7 | 1 |
| 7.5 – 8.5 | 8 | 5 |
| 8.5 – 9.5 | 9 | 11 |
| 9.5 – 10.5 | 10 | 9 |
| 10.5 – 11.5 | 11 | 3 |
| 11.5 – 12.5 | 12 | 1 |
| | | $\sum f_i = 30$ |

Solution:

Variance for grouped data:

$$S^2 = \frac{1}{\sum_{i=1}^{k} f_i - 1}\left[\sum_{i=1}^{k} m_i^2 f_i - \frac{\left(\sum_{i=1}^{k} m_i f_i\right)^2}{\sum_{i=1}^{k} f_i}\right]$$

$$S^2 = \frac{1}{30-1}\left[(7^2 \cdot 1 + 8^2 \cdot 5 + \dots + 12^2 \cdot 1) - \frac{(7 \cdot 1 + 8 \cdot 5 + \dots + 12 \cdot 1)^2}{30}\right] = \frac{1}{29}\left[2667 - \frac{281^2}{30}\right] = 1.206$$

$$s = \sqrt{S^2} = \sqrt{1.206} = 1.098$$

$$\bar{X} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i} = \frac{7 \cdot 1 + 8 \cdot 5 + 9 \cdot 11 + 10 \cdot 9 + 11 \cdot 3 + 12 \cdot 1}{30} = \frac{281}{30} = 9.367$$

$$C.V = \frac{1.098}{9.367} \times 100 = 11.72\%$$

Home Work 3:

Q1: The following table gives the results of a survey to study the ages and hemoglobin levels of patients of a certain clinic.

| | Age (years) | Hemoglobin level (g/dl) |
|---|---|---|
| Mean | 30 | 60 |
| Standard deviation | 6 | 10 |

Determine whether the hemoglobin levels of the patients are more variable than their ages.

Q2: The weights (in kg) of 8 pregnant women gave the following results:

$$\sum_{i=1}^{8} x_i = 495, \qquad \sum_{i=1}^{8} x_i^2 = 30659$$

Find:
(a) The mean.
(b) The variance.
(c) The standard deviation.
(d) The coefficient of variation.

Q3: The following are the glucose levels (mg/100ml) of a sample of 50 children.

126 117 101 100 116 111 120 138 118 108
114 115 113 112 113 132 130 128 122 121
115 88 113 90 89 106 104 126 127 111
115 116 109 108 122 123 149 140 121 137
110 119 115 83 109 117 118 110 108 134

1. Prepare a frequency distribution (grouped data).
2. Find the mean, median and mode.
3. Compute R, S², s and C.V.
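The worked examples in this lecture can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the function names (`data_range`, `sample_variance`, `sample_variance_shortcut`, `coefficient_of_variation`, `grouped_variance`) are hypothetical helpers chosen here, not part of the lecture. They implement the range, the two forms of the sample variance formula (dividing by n - 1), the coefficient of variation, and the grouped-data variance computed from class midpoints and frequencies.

```python
import math


def data_range(values):
    """Range R = largest value minus smallest value."""
    return max(values) - min(values)


def sample_variance(values):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """Same quantity from the sum and the sum of squares only."""
    n = len(values)
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return (total_sq - total ** 2 / n) / (n - 1)


def coefficient_of_variation(values):
    """Sample C.V in percent: (s / mean) * 100."""
    mean = sum(values) / len(values)
    return math.sqrt(sample_variance(values)) / mean * 100


def grouped_variance(midpoints, freqs):
    """Sample variance for grouped data using midpoints m_i and frequencies f_i."""
    n = sum(freqs)
    sum_mf = sum(m * f for m, f in zip(midpoints, freqs))
    sum_m2f = sum(m * m * f for m, f in zip(midpoints, freqs))
    return (sum_m2f - sum_mf ** 2 / n) / (n - 1)


# Sample-variance example: 10, 21, 33, 53, 54 (both forms should agree, about 376.7)
sample = [10, 21, 33, 53, 54]
print(f"S^2 (definition) = {sample_variance(sample):.1f}")
print(f"S^2 (shortcut)   = {sample_variance_shortcut(sample):.1f}")

# Ungrouped example: 4, 6, 3, 4, 5, 2
data = [4, 6, 3, 4, 5, 2]
print(f"R   = {data_range(data)}")
print(f"S^2 = {sample_variance(data):.3f}")
print(f"s   = {math.sqrt(sample_variance(data)):.3f}")
print(f"C.V = {coefficient_of_variation(data):.2f}%")

# Grouped hemoglobin example: midpoints 7..12 with their frequencies
midpoints = [7, 8, 9, 10, 11, 12]
freqs = [1, 5, 11, 9, 3, 1]
g_var = grouped_variance(midpoints, freqs)  # about 1.206 when the mean is not rounded first
g_mean = sum(m * f for m, f in zip(midpoints, freqs)) / sum(freqs)
print(f"grouped S^2 = {g_var:.3f}, s = {math.sqrt(g_var):.3f}, "
      f"mean = {g_mean:.3f}, C.V = {math.sqrt(g_var) / g_mean * 100:.2f}%")
```

Small differences in the last digit compared with a hand calculation usually come from rounding intermediate values, such as the mean or the standard deviation, before the final step.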