251descr2 2/10/06 (Open this document in 'Outline' view!)

G. Measures of Dispersion and Asymmetry.

1. Range

Range = highest number - lowest number, or highest midpoint - lowest midpoint.
Interquartile Range: $IQR = Q_3 - Q_1$. See 251descr2ex2 for example.

2. The Variance and Standard Deviation of Ungrouped Data.

a. The Population Variance - Definitional and Computational Formulas.

The definition of the population variance is 'the average squared deviation of measurements from the mean.' The definitional formula just realizes this definition.

Definitional: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Computational: $\sigma^2 = \frac{\sum x^2}{N} - \mu^2$
Standard Deviation $= \sqrt{\text{variance}}$

b. The Sample Variance.

Definitional: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Computational: $s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1}$

The computational formula is one of the most important formulas you will learn. Note that $\sum x^2$ is not the same as $\left(\sum x\right)^2$. For example, if $x$ is 2, 3, 5, then $\sum x^2$ is $2^2 + 3^2 + 5^2 = 4 + 9 + 25 = 38$, not $(2 + 3 + 5)^2 = 10^2 = 100$.

Example: Use $x$ = 2, 3, 5.

Computational Method

    x     x^2
    2       4
    3       9
    5      25
   10      38   (totals)

From this we find $\sum x = 10$, $\sum x^2 = 38$, $n = 3$ and $\bar{x} = \frac{\sum x}{n} = \frac{10}{3} = 3.33333$.

Definitional Method

    x     x - xbar     (x - xbar)^2
    2     -1.33333       1.77778
    3     -0.33333       0.11111
    5      1.66667       2.77778
   10      0.00001       4.66667   (totals)

So $\sum (x - \bar{x})^2 = 4.66667$. Note that $\sum (x - \bar{x})$ should be zero, but is not because of rounding.

Now, if we use the computational method, $s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{38 - 3(3.33333)^2}{3 - 1} = \frac{38 - 33.33333}{2} = \frac{4.6667}{2} = 2.3333$. (Some texts prefer $s^2 = \frac{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}{n - 1} = \frac{38 - \frac{1}{3}(10)^2}{3 - 1} = \frac{4.66666667}{2} = 2.33333$, which gives us a little more accuracy for a little more work.) If we use the definitional method, $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{4.66667}{2} = 2.33333$, but note that we had to do three subtractions instead of one.

c. The Coefficient of Variation.

$C = \frac{\text{std. deviation}}{\text{mean}}$

d. Chebyshef's Inequality and the Empirical Rule

Chebyshef Inequality: $P(|x - \mu| \ge k\sigma) \le \frac{1}{k^2}$ or $P(\mu - k\sigma < x < \mu + k\sigma) \ge 1 - \frac{1}{k^2}$. A z-score $z = \frac{x - \mu}{\sigma}$ is the same as $k$. (See explanation below.)

Empirical rule: (For Symmetrical Unimodal distributions only) 68% within one standard deviation of the mean, 95% within two and almost all (99.7%) within three.

3.
The Variance and Standard Deviation of Grouped Data.

For grouped data generally substitute $\sum f$ for $\sum$.

4. Skewness and Kurtosis.

Define Population Skewness, the 3rd k-statistic, coefficients of Skewness; Population Kurtosis, the 4th k-statistic, the Coefficient of Excess; Leptokurtic, Platykurtic and Mesokurtic distributions.

The usual measurement of skewness is often called the third moment about the mean. (The population variance is the second.) The formula for population skewness is:

$\mu_3 = \frac{\sum (x - \mu)^3}{N}$

The corresponding sample statistic is the third k-statistic,

$k_3 = \frac{n}{(n-1)(n-2)} \sum (x - \bar{x})^3$

The corresponding computational formulas are

$\mu_3 = \frac{1}{N}\left[\sum x^3 - 3\mu \sum x^2 + 2N\mu^3\right]$ and $k_3 = \frac{n}{(n-1)(n-2)}\left[\sum x^3 - 3\bar{x} \sum x^2 + 2n\bar{x}^3\right]$

To make grouped data formulas, put an $f$ to the right of the $\sum$ sign. Positive values of these formulas imply skewness to the right, negative values to the left.

Note that multiplying all the values of $x$ by two would multiply the values of these coefficients by eight, but would not change the shape of the distribution. If we want to compare shapes, we need measurements that will not change if we multiply all values by a constant. Such a measure would be called the coefficient of relative skewness, with the formulas $\gamma_1 = \frac{\mu_3}{\sigma^3}$ and $g_1 = \frac{k_3}{s^3}$. Note that for the Normal distribution $\gamma_1 = 0$.

Other measures of skewness are Pearson's measures of skewness, $SK_1 = \frac{\text{mean} - \text{mode}}{\text{std. deviation}}$ and $SK_2 = \frac{3(\text{mean} - \text{median})}{\text{std. deviation}}$. These are roughly equivalent, since, for a moderately skewed distribution, mean - mode $\approx$ 3(mean - median). It seems that $-3 \le SK_1 \le 3$ and that values between 1 and -1 are considered to indicate moderate skewness.

Example:

   Profit Rate     f    x (midpoint)    fx     fx^2     fx^3
   9-10.99         3        10          30      300     3000
   11-12.99        3        12          36      432     5184
   13-14.99        5        14          70      980    13720
   15-16.99        3        16          48      768    12288
   17-18.99        1        18          18      324     5832
   Total          15                   202     2804    40024

$\sum f = n = 15$, $\sum fx = 202$, $\sum fx^2 = 2804$, $\sum fx^3 = 40024$, so that $\bar{x} = \frac{\sum fx}{n} = \frac{202}{15} = 13.467$ and $s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n - 1} = \frac{2804 - 15(13.467)^2}{15 - 1} = \frac{83.733}{14} = 5.981$.
This means $s = \sqrt{5.981} = 2.446$ and $C = \frac{s}{\bar{x}} = \frac{2.446}{13.467} = 0.182$.

To measure skewness, use one of the following three results.

$k_3 = \frac{n}{(n-1)(n-2)}\left[\sum fx^3 - 3\bar{x} \sum fx^2 + 2n\bar{x}^3\right] = \frac{15}{(14)(13)}\left[40024 - 3(13.467)(2804) + 2(15)(13.467)^3\right] = \frac{15(8.249)}{(14)(13)} = 0.680$,

or Relative Skewness $g_1 = \frac{k_3}{s^3} = \frac{0.680}{2.446^3} = 0.046$,

or Pearson's Measure of Skewness $SK_1 = \frac{\text{mean} - \text{mode}}{\text{std. deviation}} = \frac{13.467 - 14}{2.446} = -0.2179$.

Note that, in this case, Pearson's Measure 1 and Relative Skewness contradict each other as to the direction of skewness.

The measures of kurtosis are, for populations,

$\mu_4 = \frac{\sum (x - \mu)^4}{N} = \frac{1}{N}\left[\sum x^4 - 4\mu \sum x^3 + 6\mu^2 \sum x^2 - 3N\mu^4\right]$

and, for samples,

$k_4 = \frac{n^2(n+1)}{(n-1)(n-2)(n-3)}\left[\frac{\sum (x - \bar{x})^4}{n} - \frac{3(n-1)^3 s^4}{n^2(n+1)}\right]$, where $\sum (x - \bar{x})^4 = \sum x^4 - 4\bar{x} \sum x^3 + 6\bar{x}^2 \sum x^2 - 3n\bar{x}^4$.

$k_4$ can be considered an estimate of $\mu_4 - 3\sigma^4$. To get a measurement of shape use the Coefficient of Excess $\gamma_2 = \frac{\mu_4}{\sigma^4} - 3$ or $g_2 = \frac{k_4}{s^4}$. Since the Normal distribution has $\mu_4 = 3\sigma^4$, the coefficient of excess is zero for the Normal distribution.

Kurtosis has traditionally been considered a measure of the peakedness of a distribution relative to the Normal distribution, though there are some exceptions to this interpretation. If the coefficient of excess is positive, we may call a distribution leptokurtic or sharp-peaked (and long-tailed). If the coefficient of excess is negative, the distribution can be called platykurtic or flat-peaked (and short-tailed). If the coefficient of excess is close to zero, we call the distribution mesokurtic, middle-peaked. A symmetric, mesokurtic distribution is essentially Normal.

An alternate measure, called simply the coefficient of kurtosis, is $K = \frac{0.5\,(x_{.75} - x_{.25})}{x_{.90} - x_{.10}}$, where $x_{.25}$, $x_{.75}$, $x_{.10}$ and $x_{.90}$ are the 25th, 75th, 10th and 90th percentiles. This is dimension-free and takes values between zero and 0.5. Values below .263 ($K$ for the Normal distribution) indicate a leptokurtic distribution, since long tails widen the 10th-to-90th percentile spread relative to the interquartile range. Values above .263 indicate a platykurtic distribution.
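The grouped-data computational formulas for the mean, variance, coefficient of variation, third k-statistic and relative skewness can be checked numerically. The Python sketch below is only an illustration, not part of the original notes: the function and variable names are hypothetical, and the midpoints (10 to 18) and frequencies (3, 3, 5, 3, 1) are those of the profit-rate example.

```python
# Illustrative sketch: grouped-data mean, variance and skewness
# using the computational formulas. All names here are hypothetical.

midpoints = [10, 12, 14, 16, 18]   # class midpoints x
freqs = [3, 3, 5, 3, 1]            # class frequencies f


def grouped_stats(x, f):
    """Return summary statistics for grouped data (midpoints x, frequencies f)."""
    n = sum(f)
    sum_fx = sum(fi * xi for xi, fi in zip(x, f))
    sum_fx2 = sum(fi * xi ** 2 for xi, fi in zip(x, f))
    sum_fx3 = sum(fi * xi ** 3 for xi, fi in zip(x, f))

    mean = sum_fx / n
    # Computational formula for the sample variance.
    var = (sum_fx2 - n * mean ** 2) / (n - 1)
    sd = var ** 0.5
    # Third k-statistic, computational form.
    k3 = n / ((n - 1) * (n - 2)) * (sum_fx3 - 3 * mean * sum_fx2 + 2 * n * mean ** 3)
    # Coefficient of relative skewness.
    g1 = k3 / sd ** 3
    return {"n": n, "mean": mean, "var": var, "sd": sd,
            "cv": sd / mean, "k3": k3, "g1": g1}


stats = grouped_stats(midpoints, freqs)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Because the sketch carries the mean at full precision instead of rounding it to 13.467, its results should agree with the worked example to about three decimal places (mean 13.467, variance 5.981, k3 0.680, g1 0.046).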
Example (using definitional formulas):

   Profit Rate    f   x (midpoint)   fx    x - xbar   f(x - xbar)   f(x - xbar)^2   f(x - xbar)^3   f(x - xbar)^4
   9-10.99        3       10         30     -3.467      -10.400         36.053        -124.985         433.323
   11-12.99       3       12         36     -1.467       -4.400          6.453          -9.465          13.885
   13-14.99       5       14         70      0.533        2.667          1.422           0.759           0.405
   15-16.99       3       16         48      2.533        7.600         19.253          48.775         123.547
   17-18.99       1       18         18      4.533        4.533         20.551          93.164         422.317
   Total         15                 202                   0.000         83.732           8.249         993.477

So $\sum f = n = 15$, $\sum fx = 202$, $\sum f(x - \bar{x}) = 0$, $\sum f(x - \bar{x})^2 = 83.732$, $\sum f(x - \bar{x})^3 = 8.249$ and $\sum f(x - \bar{x})^4 = 993.477$, so that $\bar{x} = \frac{\sum fx}{n} = \frac{202}{15} = 13.467$ and $s^2 = \frac{\sum f(x - \bar{x})^2}{n - 1} = \frac{83.732}{14} = 5.981$, which means $s = \sqrt{5.981} = 2.446$ and $C = \frac{s}{\bar{x}} = \frac{2.446}{13.467} = 0.182$.

To measure skewness, use one of the following three results.

$k_3 = \frac{n}{(n-1)(n-2)} \sum f(x - \bar{x})^3 = \frac{15(8.249)}{(14)(13)} = 0.680$,

or Relative Skewness $g_1 = \frac{k_3}{s^3} = \frac{0.680}{2.446^3} = 0.046$,

or Pearson's Measure of Skewness $SK_1 = \frac{\text{mean} - \text{mode}}{\text{std. deviation}} = \frac{13.467 - 14}{2.446} = -0.218$.

Note that, in this case, Pearson's Measure and Relative Skewness contradict each other as to the direction of skewness.

$k_4 = \frac{n^2(n+1)}{(n-1)(n-2)(n-3)}\left[\frac{\sum f(x - \bar{x})^4}{n} - \frac{3(n-1)^3 s^4}{n^2(n+1)}\right] = \frac{15^2(16)}{(14)(13)(12)}\left[\frac{993.477}{15} - \frac{3(14)^3(5.981)^2}{15^2(16)}\right] = -25.66$.

So $g_2 = \frac{k_4}{s^4} = \frac{-25.66}{5.981^2} = -0.717$. The negative sign implies that the distribution is platykurtic.

5. Review

a. Grouped Data. See 251dscr_D
b. Ungrouped Data. See 251dscr_D

Appendix: Explanation of Sample Formulas (Not for student consumption until you know about expected value.) See 251dscr_B.

Appendix: Explanation of Computational Formulas (The part about the variance is fairly easy, the rest is more difficult.) See 251dscr_C.

Appendix: Explanation of Chebyshef's Inequality

Make a diagram. Show a curve that looks like a Normal curve with the middle marked $\mu$. Mark off two points on either side of $\mu$ on your x axis at equal distances from $\mu$. Label these points $\mu - k\sigma$ and $\mu + k\sigma$. ($k$ can be any number above one, like 1.32 or 5.) The areas below $\mu - k\sigma$ and above $\mu + k\sigma$ are the left and right tails of the distribution.
Then the statement $P(|x - \mu| \ge k\sigma) \le \frac{1}{k^2}$ means that the total proportion of points that is in these two tails cannot be greater than $\frac{1}{k^2}$. The statement $P(\mu - k\sigma < x < \mu + k\sigma) \ge 1 - \frac{1}{k^2}$ means that the proportion of points that is between $\mu - k\sigma$ and $\mu + k\sigma$ must be at least $1 - \frac{1}{k^2}$.

For example, suppose $k = 1.32$, $\mu = 15$ and $\sigma = 3$. Then $\mu - k\sigma = 15 - 1.32(3) = 11.04$, $\mu + k\sigma = 15 + 1.32(3) = 18.96$, $\frac{1}{k^2} = \frac{1}{1.32^2} = \frac{1}{1.7424} = .5739$, and the proportion of points between 11.04 and 18.96 is at least $1 - \frac{1}{k^2} = 1 - .5739 = .4261$ or 42.61%. The proportion of points in the tails is at most 57.39%.

Measures of Inequality

Measuring Inequality, a PowerPoint presentation which explains various measures of income inequality, is available on the ECAAR website. This presentation was prepared by Paul Burkholder, ECAAR's Project Manager, as part of ECAAR's project on "Inequality and Democratic Development." Paul is a recent graduate of Temple University, with a degree in economics. He does research for current and potential projects, and assists with media and member outreach. See http://www.ecaar.org/Inequality/powerpoint/measuring%20inequality_files/frame.htm

Correction?: In response to a student query (Thank you!), a small correction was made above in the computations for the variance using grouped data and definitional formulas. However, the results $s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n - 1} = \frac{2804 - 15(13.467)^2}{15 - 1} = \frac{83.733}{14} = 5.981$, so that $s = \sqrt{5.981} = 2.446$, bear more explanation. If you used the numbers given here, you would have gotten $s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n - 1} = \frac{2804 - 15(13.467)^2}{15 - 1} = \frac{83.599}{14} = 5.971$ and $s = \sqrt{5.971} = 2.444$. My result occurred because I tend to carry more decimal places than I admit, something that may occur in other places in these notes. Obviously if your answers differ from mine because of this sort of rounding error, I have no business calling them wrong.
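The rounding effect described in this correction can be reproduced in a few lines. The Python sketch below is an illustration under stated assumptions, not part of the original notes: the names are hypothetical, and the sums $n = 15$, $\sum fx = 202$ and $\sum fx^2 = 2804$ are taken from the grouped profit-rate example.

```python
# Illustrative sketch: how rounding the mean before squaring it
# changes the computational-formula variance. Names are hypothetical.

n, sum_fx, sum_fx2 = 15, 202, 2804   # totals from the grouped example


def sample_variance(sum_x2, n, mean):
    """Computational formula: (sum of squares - n * mean^2) / (n - 1)."""
    return (sum_x2 - n * mean ** 2) / (n - 1)


mean_full = sum_fx / n                # mean at full precision
mean_rounded = round(mean_full, 3)    # mean rounded to three decimals

var_full = sample_variance(sum_fx2, n, mean_full)
var_rounded = sample_variance(sum_fx2, n, mean_rounded)

# Alternative form that never uses a rounded mean.
var_alt = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

print(f"full-precision mean: s^2 = {var_full:.3f}, s = {var_full ** 0.5:.3f}")
print(f"rounded mean:        s^2 = {var_rounded:.3f}, s = {var_rounded ** 0.5:.3f}")
print(f"alternative form:    s^2 = {var_alt:.3f}")
```

The alternative form works from $\left(\sum fx\right)^2 / n$ rather than a rounded mean, which is one way to keep hand computations from drifting.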