Measure of Dispersion or Variability Lecture 4 Dr. Maher H.S. Alasady The mean, mode and median do a nice job in telling where the center of the data set is, but often we are interested in more. For example, a pharmaceutical engineer develops a new drug that regulator iron in the blood. Suppose he finds out that the average sugar content after taking the medication is optimal level. This does not mean that the drug is effective. There is a possibility that half of the patients have dangerously low sugar content while the other half has dangerously high content. Instead of the drug being an effective regulator, it is a deadly poison. What the pharmacist needs is a measure of how far the data is spread apart. This is what the variance and standard deviation do. The most common measures of dispersion or variability are (Range, Variance, Standard deviation and Coefficient of variation). Range (R): The range is the difference between the largest (XL) and the smallest (XS) values in a set of observations. R = X L - XS Note: The range is poor measure of dispersion? Because it only takes into account two of the values. Variance: The variance is the most commonly used to measure of spread in biological statistics. For a population is defined as the sum of squares of the deviation from the mean (SS), dividing by the total number of the deviations, and by one less than the total number of the deviation (degree of freedom, df) for a sample. N 2 ( X ) i 1 N 2 or 2 N Xi N 1 2 Xi 2 i 1 ……..……….. (Population) N i 1 N 17 n S 2 __ ( Xi X ) i 1 n 1 2 or 2 n Xi n 1 Xi 2 i 1 S2 ……..……..….. (Sample) n 1 i 1 n Note: The reason for dividing by (n-1), (df), in sample, because the sum of deviations of the values from their mean is equal zero. n __ ( Xi X ) 0 i 1 Why the separate formula for sample? The formula for sample divides by n-1 to: 1. Correct for probability that most extreme cases will be excluded from a smaller sample. 2. Makes the sample more representative of the population for every small sample. 3. Reduces the denominator to a larger extent (If n=5 they we have a 20% reduction in the denominator). But in large samples the n-1 correction does not have as large effect. Standard Deviation (s) or (sd): Is defined as a positive square root of variance. 2 ……………. (Population) s s 2 ……………… (Sample) The mean squared difference from the sample mean will, on average, underestimate the population variance. In some samples it will overestimate it, but most of the time it will underestimate it, if the formula is modified so that the sum of squared deviations is divided by n-1 rather than N, then the tendency to underestimate the population variance is eliminated. 18 Coefficient of Variation (C.V): Is defined as the ratio of the standard deviation to the mean. It is independent of the units employed. C.V C .V s __ or C.V 100% or C.V s X __ ……………. (Population) 100% ……………… (Sample) X Note: For biological data coefficient of variation (C.V) of the order of 10% to 15% are common. For homogenous material this figure my be reduced to 5% while a coefficient of variation of 25% would indicate very considerable variability. Example: The data below present the technician from two different laboratories, all making the same specific blood chemistry determination using a solution with a know concentration (5 mg/ml). Laboratories 1 2 C.V Technician 5, 8, 6 6, 4, 9 Mean 6.333 6.333 1.528 100% 24% 6.333 C.V Standard deviation 1.528 2.517 2.517 100% 39.7% 6.333 Lab. 1 gives most accurate result. Example: A set of data (4, 6, 3, 4, 5 and 2) compute: The range, the variance, the standard deviation and the coefficient of variation? Solution: R = XL - XS ................................ R= 6-2=4 19 2 n Xi 1 n 2 2 S Xi i 1 n 1 i 1 n 1 2 (4 6 3 4 5 2)2 2 2 2 2 2 S (4 6 3 4 5 2 ) 2 6 1 6 2 s s 2 …................................. s 2 = 1.414 n __ X Xi i 1 N C .V s __ __ …................................ X 4 63 45 2 4 6 …................................ C.V X 1.414 100% 35.35% 4 Example: The following shows the hemoglobin values (g/100ml) of 30 children receiving treatment for hemolytic anemia, compute: The variance, the standard deviation and the coefficient of variation? Hemoglobin 6.5 – 7.5 7.5 – 8.5 8.5 – 9.5 9.5 – 10.5 10.5 – 11.5 11.5 – 12.5 Midpoint Frequency (mi) (fi) 7 8 9 10 11 12 20 1 5 11 9 3 1 fi =30 Solution: 2 k mifi k 1 2 2 mi fi i 1 k Variance for grouped data ………… S k fi 1 i 1 fi i 1 i 1 1 2 (7 *1 8 * 5 ... 12 *1) 2 2 2 S (7 *1 8 * 5 ... 12 *1) 1.199 30 1 30 2 s s 2 …....................................... s 1.199 =1.095 k __ X mifi i 1 k fi ……………………….. (7 *1 8 * 5 9 *11 10 * 9 11* 3 12 *1) 9.367 30 i 1 C.V 1.095 100% 11.69% 9.367 Home Work 3: Q1: The following table gives the results of a survey to study the ages and hemoglobin levels of patients of a certain clinic. Age (Years) Hemoglobin level (g/dl) Mean 30 60 Standard Deviation 6 10 Determine whether hemoglobin levels of the patients are more variable than ages? 21 Q2: The weights (in kg) of 8 pregnant women gave the following results: 8 x i 1 i 8 x 495 i 1 Find: (a) The mean. 2 i 30659 (b) The variance. (c) The standard deviation. (d) The coefficient of variation. Q3: The following are the glucose levels (g/100ml) of a sample of 50 children. 126 114 115 115 110 117 115 88 116 119 101 113 113 109 115 100 112 90 108 83 116 113 89 122 109 111 132 106 123 117 120 130 104 149 118 138 128 126 140 110 118 122 127 121 108 108 121 111 137 134 Prepare: 1. Grouped data by frequency distribution. 2. Find the mean, median and mode. 3. Computed the R, S2, S and C.V.? 22