WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Lecture-3: Descriptive Statistics [Measures of Dispersion]
Akm Saiful Islam
Institute of Water and Flood Management (IWFM)
Bangladesh University of Engineering and Technology (BUET)
April, 2008

Descriptive Statistics
- Measures of Central Tendency
- Measures of Location
- Measures of Dispersion
- Measures of Symmetry
- Measures of Peakedness

Measures of Variability or Dispersion
The dispersion of a distribution describes how the observations are spread out or scattered on each side of the center. Measuring the dispersion, scatter, or variation of a distribution is as important as locating its central tendency. Small dispersion indicates high uniformity among the observations. Zero dispersion indicates perfect uniformity, which arises only when all observations in the distribution are identical; in that case, describing any single observation would suffice.

Purpose of Measuring Dispersion
A measure of dispersion serves two purposes. First, it is one of the most important quantities used to characterize a frequency distribution. Second, it provides a basis for comparing two or more frequency distributions. The study of dispersion is important because different distributions may have exactly the same average yet differ substantially in their variability. For example, the samples 4, 5, 6 and 1, 5, 9 both have mean 5, but the second is far more spread out.

Measures of Dispersion
- Range
- Percentile range
- Quartile deviation
- Mean deviation
- Variance and standard deviation
- Relative measures of dispersion
  - Coefficient of variation
  - Coefficient of mean deviation
  - Coefficient of range
  - Coefficient of quartile deviation

Range
The simplest and crudest measure of dispersion is the range. It is defined as the difference between the largest and the smallest values in the distribution. If $x_1, x_2, \dots, x_n$ are the values of the observations in a sample, then the range (R) of the variable X is given by:

$$R = \max(x_1, x_2, \dots, x_n) - \min(x_1, x_2, \dots, x_n)$$

Percentile Range
The percentile range is the difference between the 90th and the 10th percentiles. It is obtained by excluding the highest and the lowest 10 percent of the items and taking the difference between the largest and the smallest values of the remaining 80 percent of the items.

$$\text{Percentile range} = P_{90} - P_{10}$$

Quartile Deviation
A measure similar to the special range above is the interquartile range (Q). It is the difference between the third quartile ($Q_3$) and the first quartile ($Q_1$). Thus

$$Q = Q_3 - Q_1$$

The interquartile range is frequently reduced to the semi-interquartile range, known as the quartile deviation (QD), by dividing it by 2. Thus

$$QD = \frac{Q_3 - Q_1}{2}$$

Mean Deviation
The mean deviation is the average of the absolute deviations of the individual observations from a central value of the series. The average deviation about the mean is

$$MD(\bar{x}) = \frac{\sum_{i=1}^{k} f_i \, |x_i - \bar{x}|}{n}$$

where
- k = number of classes
- $x_i$ = midpoint of the i-th class
- $f_i$ = frequency of the i-th class
- n = total number of observations ($n = \sum f_i$)
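
The absolute measures defined above can be sketched in Python as follows. This is only an illustrative sketch, not part of the lecture: the function names are made up for this example, the data are the ten family sizes from Example-1, and the percentiles and quartiles come from the standard library's `statistics.quantiles` with its default interpolation rule. Other textbooks and packages use slightly different percentile conventions, so quantile-based results may differ a little from hand calculations.

```python
import statistics


def value_range(xs):
    """Range: R = max(x) - min(x)."""
    return max(xs) - min(xs)


def percentile_range(xs):
    """Percentile range: P90 - P10 (assumes the library's default quantile rule)."""
    deciles = statistics.quantiles(xs, n=10)  # nine cut points: P10, P20, ..., P90
    return deciles[-1] - deciles[0]


def quartile_deviation(xs):
    """Quartile deviation: QD = (Q3 - Q1) / 2."""
    q1, _q2, q3 = statistics.quantiles(xs, n=4)  # three cut points: Q1, Q2, Q3
    return (q3 - q1) / 2


def mean_deviation(values, freqs):
    """Mean deviation about the mean: sum(f_i * |x_i - mean|) / n."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * abs(x - mean) for x, f in zip(values, freqs)) / n


# Family sizes from Example-1, as raw observations and as a frequency table.
family_sizes = [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
sizes, counts = [3, 4, 5, 6, 7], [2, 2, 2, 2, 2]

print("Range:", value_range(family_sizes))
print("Percentile range:", percentile_range(family_sizes))
print("Quartile deviation:", quartile_deviation(family_sizes))
print("Mean deviation:", mean_deviation(sizes, counts))
```

For class-interval data, `values` would hold the class midpoints, as in the mean deviation formula.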

Standard Deviation
The standard deviation is the positive square root of the mean of the squared deviations of the observations from their arithmetic mean, that is, $SD = \sqrt{\text{variance}}$.

Population (N observations with mean $\mu$):

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

Sample (n observations with mean $\bar{x}$):

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Standard Deviation for Grouped Data
The SD is:

$$s = \sqrt{\frac{\sum_{i} f_i (x_i - \bar{x})^2}{N}}$$

where

$$\bar{x} = \frac{\sum_{i} f_i x_i}{N}, \qquad N = \sum_{i} f_i$$

Simplified formula:

$$s = \sqrt{\frac{\sum_{i} f_i x_i^2}{N} - \left(\frac{\sum_{i} f_i x_i}{N}\right)^2}$$

These grouped formulas divide by N. Example-2 treats the data as a sample and divides by n - 1 instead.

Example-1: Find the Standard Deviation of Ungrouped Data

| Family No. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Size ($x_i$) | 3 | 3 | 4 | 4 | 5 | 5 | 6 | 6 | 7 | 7 |

Here,

$$\bar{x} = \frac{\sum x_i}{n} = \frac{50}{10} = 5$$

| Family No. | $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ | $x_i^2$ |
|---|---|---|---|---|
| 1 | 3 | -2 | 4 | 9 |
| 2 | 3 | -2 | 4 | 9 |
| 3 | 4 | -1 | 1 | 16 |
| 4 | 4 | -1 | 1 | 16 |
| 5 | 5 | 0 | 0 | 25 |
| 6 | 5 | 0 | 0 | 25 |
| 7 | 6 | 1 | 1 | 36 |
| 8 | 6 | 1 | 1 | 36 |
| 9 | 7 | 2 | 4 | 49 |
| 10 | 7 | 2 | 4 | 49 |
| Total | 50 | 0 | 20 | 270 |

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{20}{9} \approx 2.22, \qquad s = \sqrt{\frac{20}{9}} \approx 1.49$$

Example-2: Find the Standard Deviation of Grouped Data

| $x_i$ | $f_i$ | $f_i x_i$ | $f_i x_i^2$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ | $f_i (x_i - \bar{x})^2$ |
|---|---|---|---|---|---|---|
| 3 | 2 | 6 | 18 | -3 | 9 | 18 |
| 5 | 3 | 15 | 75 | -1 | 1 | 3 |
| 7 | 2 | 14 | 98 | 1 | 1 | 2 |
| 8 | 2 | 16 | 128 | 2 | 4 | 8 |
| 9 | 1 | 9 | 81 | 3 | 9 | 9 |
| Total | 10 | 60 | 400 | - | - | 40 |

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{60}{10} = 6$$

$$s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1} = \frac{40}{9} \approx 4.44, \qquad s = \sqrt{\frac{40}{9}} \approx 2.11$$

Relative Measures of Dispersion
To compare the extent of variation of different distributions, whether they have different or identical units of measurement, it is necessary to use measures that express the absolute deviation in relative form. These measures are usually expressed as coefficients and are pure numbers, independent of the unit of measurement.

Relative Measures of Dispersion
- Coefficient of variation
- Coefficient of mean deviation
- Coefficient of range
- Coefficient of quartile deviation
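
The two worked examples can be checked with a short Python sketch. It is an illustration only: the function names are hypothetical, and the choice between the divisors n - 1 (sample) and N (population) is exposed as a parameter because the slides use both.

```python
import math
import statistics


def sample_variance(xs):
    """Sample variance: sum((x - mean)^2) / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def grouped_variance(values, freqs, sample=True):
    """Variance from a frequency table; divides by n - 1 if sample, else by N."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    squared_dev = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return squared_dev / (n - 1 if sample else n)


def grouped_variance_shortcut(values, freqs):
    """Simplified population form: sum(f x^2)/N - (sum(f x)/N)^2."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    return sum_fx2 / n - (sum_fx / n) ** 2


# Example-1: ungrouped family sizes.
family_sizes = [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
var1 = sample_variance(family_sizes)
print("Example-1: s^2 =", round(var1, 2), " s =", round(math.sqrt(var1), 2))

# Example-2: grouped data (values x_i with frequencies f_i).
values, freqs = [3, 5, 7, 8, 9], [2, 3, 2, 2, 1]
var2 = grouped_variance(values, freqs)
print("Example-2: s^2 =", round(var2, 2), " s =", round(math.sqrt(var2), 2))

# Cross-check the ungrouped result against the standard library.
print("statistics.variance agrees:", math.isclose(var1, statistics.variance(family_sizes)))
```

The shortcut function reproduces the simplified formula and matches `grouped_variance(values, freqs, sample=False)`, since both divide by N.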

Coefficient of Variation
The coefficient of variation is computed as the ratio of the standard deviation of a distribution to the mean of the same distribution.

$$CV = \frac{s_x}{\bar{x}}$$

Example-3: Comments on Children in a Community

| | Mean | SD | CV |
|---|---|---|---|
| Height | 40 inch | 5 inch | 0.125 |
| Weight | 10 kg | 2 kg | 0.20 |

Since the coefficient of variation for weight is greater than that for height, we would tend to conclude that weight has more variability than height in the population.

Coefficient of Mean Deviation
Another relative measure is the coefficient of mean deviation. As the mean deviation can be computed from the mean, median, mode, or any arbitrary value, a general formula for the coefficient of mean deviation may be written as follows:

$$\text{Coefficient of mean deviation} = \frac{\text{Mean deviation}}{\text{Mean}} \times 100$$

Coefficient of Range
The coefficient of range is the relative measure corresponding to the range and is obtained by the following formula:

$$\text{Coefficient of range} = \frac{L - S}{L + S} \times 100$$

where L and S are, respectively, the largest and the smallest observations in the data set.

Coefficient of Quartile Deviation
The coefficient of quartile deviation is computed from the first and the third quartiles using the following formula:

$$\text{Coefficient of quartile deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100$$

Assignment-1
Find the following measures of dispersion for the data set given below:
- Range, percentile range, quartile range
- Quartile deviation, mean deviation, standard deviation
- Coefficient of variation, coefficient of mean deviation, coefficient of range, coefficient of quartile deviation

Data for Assignment-1

| Marks | No. of students | Cumulative frequency |
|---|---|---|
| 40-50 | 6 | 6 |
| 50-60 | 11 | 17 |
| 60-70 | 19 | 36 |
| 70-80 | 17 | 53 |
| 80-90 | 13 | 66 |
| 90-100 | 4 | 70 |
| Total | 70 | |
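
As a reference for the relative measures defined earlier, the four coefficients can be sketched in Python as below. This is an illustrative sketch with hypothetical function names. It uses the height and weight figures from Example-3 and the family sizes from Example-1 rather than the assignment data, and it keeps the coefficient of variation as a plain ratio, as in Example-3, while the other three coefficients are multiplied by 100 as in their formulas.

```python
def coefficient_of_variation(sd, mean):
    """CV = standard deviation / mean (a plain ratio, as in Example-3)."""
    return sd / mean


def coefficient_of_mean_deviation(mean_dev, mean):
    """(Mean deviation / mean) x 100."""
    return 100 * mean_dev / mean


def coefficient_of_range(xs):
    """((L - S) / (L + S)) x 100, with L the largest and S the smallest value."""
    largest, smallest = max(xs), min(xs)
    return 100 * (largest - smallest) / (largest + smallest)


def coefficient_of_quartile_deviation(q1, q3):
    """((Q3 - Q1) / (Q3 + Q1)) x 100; the quartiles are supplied by the caller."""
    return 100 * (q3 - q1) / (q3 + q1)


# Example-3: height (mean 40 inch, SD 5 inch) and weight (mean 10 kg, SD 2 kg).
cv_height = coefficient_of_variation(5, 40)
cv_weight = coefficient_of_variation(2, 10)
print("CV height:", cv_height, " CV weight:", cv_weight)
print("Weight is relatively more variable:", cv_weight > cv_height)

# Family sizes from Example-1 (mean 5, mean deviation about the mean 1.2).
family_sizes = [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
print("Coefficient of range:", coefficient_of_range(family_sizes))
print("Coefficient of mean deviation:", coefficient_of_mean_deviation(1.2, 5))
```

Because each coefficient is a pure number, the height and weight results can be compared directly even though the original units differ.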