Math 211 Introduction to Statistics
Chapter 4: Measures of Dispersion
Sonuç Zorlu, Lecture Notes

Dispersion: The degree to which numerical data tend to spread about an average value is called the dispersion, or variation, of the data. The most common measures of dispersion are the range, mean deviation, semi-interquartile range, and standard deviation.

The Range: The difference between the largest and smallest numbers in the set.

The Mean Deviation (Average Deviation): The mean deviation of a set of numbers $X_1, X_2, \dots, X_N$ is denoted by $MD$ and defined as

$$MD = \frac{\sum_{i=1}^{N} \lvert X_i - \bar{X} \rvert}{N}$$

where $\bar{X}$ is the arithmetic mean and $\lvert X_i - \bar{X} \rvert$ is the absolute value of the deviation of $X_i$ from $\bar{X}$.

If $X_1, X_2, \dots, X_k$ occur with frequencies $f_1, f_2, \dots, f_k$ respectively, the mean deviation can be written as

$$MD = \frac{\sum_{i=1}^{k} f_i \lvert X_i - \bar{X} \rvert}{N}, \qquad N = \sum_{i=1}^{k} f_i .$$

This form is useful for grouped data, where the $X_i$ represent class marks and the $f_i$ are the corresponding class frequencies.

The Semi-Interquartile Range (Quartile Deviation):

$$Q = \frac{Q_3 - Q_1}{2}$$

The 10-90 Percentile Range: $P_{90} - P_{10}$

Semi-10-90 Percentile Range: $\dfrac{P_{90} - P_{10}}{2}$

The Standard Deviation: The standard deviation of a set of $N$ numbers $X_1, X_2, \dots, X_N$ is denoted by $S$ and defined as

$$S = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N-1}} .$$

If $X_1, X_2, \dots, X_k$ occur with frequencies $f_1, f_2, \dots, f_k$ respectively, the standard deviation can be written as

$$S = \sqrt{\frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{N-1}}, \qquad N = \sum_{i=1}^{k} f_i .$$

This form is useful for grouped data, where the $X_i$ represent class marks and the $f_i$ are the corresponding class frequencies.

The Variance: The variance of a set of $N$ numbers $X_1, X_2, \dots, X_N$ is denoted by $S^2$ and defined as

$$S^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N-1} .$$

Properties of the Standard Deviation

(1) The standard deviation can be defined more generally as

$$S = \sqrt{\frac{\sum_{i=1}^{N} (X_i - a)^2}{N-1}}$$

where $a$ is a value other than the arithmetic mean $\bar{X}$. Of all such quantities, $S$ is minimum when $a = \bar{X}$.

(2) For normal distributions,
(a) 68.27% of the cases lie within 1 s.d. on either side of the mean,
(b) 95.45% of the cases lie within 2 s.d. on either side of the mean,
(c) 99.73% of the cases lie within 3 s.d. on either side of the mean.
For moderately skewed distributions, these percentages may hold approximately.

Short methods for computing the standard deviation

The formulas below use the divisor $N$ rather than $N-1$; for large $N$ the two versions of $S$ differ very little.

(1) $$S = \sqrt{\frac{\sum_{i=1}^{N} X_i^2}{N} - \left(\frac{\sum_{i=1}^{N} X_i}{N}\right)^2} = \sqrt{\overline{X^2} - \bar{X}^2}$$

(2) If $d_j = X_j - A$ are the deviations of $X_j$ from some arbitrary constant $A$, then

$$S = \sqrt{\frac{\sum_{j=1}^{N} d_j^2}{N} - \left(\frac{\sum_{j=1}^{N} d_j}{N}\right)^2}$$

(3) (Coding Method) When data are grouped into a frequency distribution whose class intervals have equal size $c$, we have $d_j = c\,u_j$, that is $X_j = A + c\,u_j$, where $u_j = 0, \pm 1, \pm 2, \dots$. Then

$$S = c\,\sqrt{\frac{\sum_{j=1}^{k} f_j u_j^2}{N} - \left(\frac{\sum_{j=1}^{k} f_j u_j}{N}\right)^2}$$

Example 1. Determine the percentage of the students whose grades fall within the ranges (a) $\bar{X} \pm S$, (b) $\bar{X} \pm 2S$, given the following distribution.

| Grades | No. of students $f_i$ | $X_i$ | $u_i$ | $f_i u_i$ | $u_i^2$ | $f_i u_i^2$ |
|---|---|---|---|---|---|---|
| 10-19 | 2 | 14.5 | -3 | -6 | 9 | 18 |
| 20-29 | 5 | 24.5 | -2 | -10 | 4 | 20 |
| 30-39 | 8 | 34.5 | -1 | -8 | 1 | 8 |
| 40-49 | 11 | 44.5 | 0 | 0 | 0 | 0 |
| 50-59 | 8 | 54.5 | 1 | 8 | 1 | 8 |
| 60-69 | 5 | 64.5 | 2 | 10 | 4 | 20 |
| 70-79 | 2 | 74.5 | 3 | 6 | 9 | 18 |
| Total | N = 41 | | | 0 | | 92 |

Let $A = 44.5$ and $c = 10$. Then

$$\bar{X} = A + \left(\frac{\sum f_i u_i}{N}\right) c = 44.5 + 0 = 44.5$$

$$S = c\,\sqrt{\frac{\sum f_i u_i^2}{N} - \left(\frac{\sum f_i u_i}{N}\right)^2} = 10\sqrt{\frac{92}{41} - 0} = 14.98 \approx 15$$

(a) $\bar{X} \pm S = 44.5 \pm 15$, which gives the range 29.5 to 59.5. The number of students in this range is $8 + 11 + 8 = 27$, so the percentage of grades is $\frac{27}{41} \approx 66\%$.

(b) $\bar{X} \pm 2S = 44.5 \pm 30$, which gives the range 14.5 to 74.5. Taking a proportional share of the two end classes, the number of students in this range is

$$\frac{19 - 14.5}{10}\cdot 2 + 5 + 8 + 11 + 8 + 5 + \frac{74.5 - 70}{10}\cdot 2 = 38.8 ,$$

so the percentage of grades is $\frac{38.8}{41} \approx 95\%$.

Example 2. Consider the following frequency distribution. Compute $\bar{X}$, $MD$ and $S$, using the coding method for $\bar{X}$ and $S$.

| Class boundaries | Freq. $f_i$ | $X_i$ | $u_i$ | $f_i u_i$ | $u_i^2$ | $f_i u_i^2$ | $X_i - \bar{X}$ | $f_i \lvert X_i - \bar{X} \rvert$ |
|---|---|---|---|---|---|---|---|---|
| 154.5-158.5 | 2 | 156.5 | -3 | -6 | 9 | 18 | -13.8 | 27.6 |
| 158.5-162.5 | 3 | 160.5 | -2 | -6 | 4 | 12 | -9.8 | 29.4 |
| 162.5-166.5 | 8 | 164.5 | -1 | -8 | 1 | 8 | -5.8 | 46.4 |
| 166.5-170.5 | 16 | 168.5 | 0 | 0 | 0 | 0 | -1.8 | 28.8 |
| 170.5-174.5 | 12 | 172.5 | 1 | 12 | 1 | 12 | 2.2 | 26.4 |
| 174.5-178.5 | 9 | 176.5 | 2 | 18 | 4 | 36 | 6.2 | 55.8 |
| 178.5-182.5 | 5 | 180.5 | 3 | 15 | 9 | 45 | 10.2 | 51.0 |
| Total | 55 | | | 25 | | 131 | | 265.4 |

With $A = 168.5$ and $c = 4$,

$$\bar{X} = A + \left(\frac{\sum f_i u_i}{N}\right) c = 168.5 + \frac{25}{55}\cdot 4 = 170.32$$

$$MD = \frac{\sum_{i=1}^{k} f_i \lvert X_i - \bar{X} \rvert}{N} = \frac{265.4}{55} \approx 4.83$$

$$S = c\,\sqrt{\frac{\sum f_i u_i^2}{N} - \left(\frac{\sum f_i u_i}{N}\right)^2} = 4\sqrt{\frac{131}{55} - \left(\frac{25}{55}\right)^2} \approx 5.90$$

Empirical Relation between Measures of Dispersion

For moderately skewed distributions, we have the empirical formulae

$$\text{Mean Deviation} = \tfrac{4}{5}\,(\text{standard deviation})$$

$$\text{Semi-Interquartile Range} = \tfrac{2}{3}\,(\text{standard deviation})$$

Absolute and Relative Dispersion; Coefficient of Variation

Absolute dispersion is the actual variation. Relative dispersion is defined as

$$\text{Relative Dispersion} = \frac{\text{absolute dispersion}}{\text{average}} .$$

If the absolute dispersion is $S$ and the average is $\bar{X}$, the relative dispersion is called the coefficient of variation,

$$V = \frac{S}{\bar{X}} \quad \text{(expressed as a percentage)}.$$

Example 3. On a final examination in Statistics, the mean grade of a group of 150 students was 78 and the standard deviation was 8.0. In Calculus, however, the mean grade of the group was 73 and the standard deviation was 7.6. Which subject has the greater (a) absolute dispersion, (b) relative dispersion?

(a) The absolute dispersion of Statistics is $S_s = 8.0$ and of Calculus $S_c = 7.6$. Therefore Statistics has the greater absolute dispersion.

(b) The coefficients of variation are

$$V_s = \frac{S_s}{\bar{X}_s} = \frac{8.0}{78} \approx 10.26\%, \qquad V_c = \frac{S_c}{\bar{X}_c} = \frac{7.6}{73} \approx 10.41\% .$$

Therefore Calculus has the greater relative dispersion.

Standardized Variable: Standard Scores

The variable that measures the deviation from the mean in units of the standard deviation is called a standardized variable and is given by

$$Z = \frac{X - \bar{X}}{S} .$$

Example 4. A student received a grade of 84 on a final examination in Mathematics for which the mean grade was 76 and the standard deviation was 10. On the final examination in Physics, for which the mean grade was 82 and the standard deviation was 16, she received a grade of 90. In which subject was her relative standing higher?

Here $X_{\text{maths}} = 84$, $X_{\text{physics}} = 90$, $\bar{X}_{\text{maths}} = 76$, $\bar{X}_{\text{physics}} = 82$, $S_{\text{maths}} = 10$, $S_{\text{physics}} = 16$. Since $Z = \dfrac{X - \bar{X}}{S}$,

$$Z_{\text{maths}} = \frac{84 - 76}{10} = 0.8 \qquad \text{and} \qquad Z_{\text{physics}} = \frac{90 - 82}{16} = 0.5 .$$

Therefore the relative standing of the student is higher in Mathematics.

Example 5. Find the mean deviation of the numbers 2, 2, 4, 6, 7, 8, 9, 12.
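As a cross-check for hand calculations of this kind, the definition of the mean deviation can be sketched in a few lines of Python. This is only an illustrative sketch: the names `mean_deviation` and `data` are assumptions introduced here and do not appear in the notes.

```python
def mean_deviation(values):
    """Mean absolute deviation about the arithmetic mean: sum(|X_i - mean|) / N."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


# The eight numbers of Example 5
data = [2, 2, 4, 6, 7, 8, 9, 12]
md = mean_deviation(data)
print(md)  # 2.75
```

The function follows the ungrouped definition directly: it computes the arithmetic mean first, then averages the absolute deviations from it.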
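First we need to find $\bar{X}$:

$$\bar{X} = \frac{2 + 2 + 4 + 6 + 7 + 8 + 9 + 12}{8} = \frac{50}{8} = 6.25$$

Then the mean deviation is

$$MD = \frac{\sum_{i=1}^{N} \lvert X_i - \bar{X} \rvert}{N} = \frac{\lvert 2 - 6.25 \rvert + \lvert 2 - 6.25 \rvert + \lvert 4 - 6.25 \rvert + \lvert 6 - 6.25 \rvert + \lvert 7 - 6.25 \rvert + \lvert 8 - 6.25 \rvert + \lvert 9 - 6.25 \rvert + \lvert 12 - 6.25 \rvert}{8}$$

$$= \frac{4.25 + 4.25 + 2.25 + 0.25 + 0.75 + 1.75 + 2.75 + 5.75}{8} = \frac{22}{8} = 2.75$$

Example 6. Find (a) the semi-interquartile range and (b) the 10-90 percentile range for the data given in Example 2.

In the formulas below, $L_1$ is the lower boundary of the class containing the quartile or percentile, $(\sum f)_1$ is the sum of the frequencies of all lower classes, the $f$ in the denominator is the frequency of that class, and $c$ is the class width.

(a) The semi-interquartile range is $Q = \dfrac{Q_3 - Q_1}{2}$.

$$Q_1 = L_1 + \left(\frac{\frac{N}{4} - (\sum f)_1}{f_{Q_1}}\right) c = 166.5 + \left(\frac{13.75 - 13}{16}\right)\cdot 4 = 166.69$$

$$Q_3 = L_1 + \left(\frac{\frac{3N}{4} - (\sum f)_1}{f_{Q_3}}\right) c = 174.5 + \left(\frac{41.25 - 41}{9}\right)\cdot 4 = 174.61$$

$$Q = \frac{Q_3 - Q_1}{2} = \frac{174.61 - 166.69}{2} = 3.96$$

Hence 50% of the cases lie between 166.69 and 174.61. The corresponding measure of central tendency is $\dfrac{Q_1 + Q_3}{2} = \dfrac{166.69 + 174.61}{2} = 170.65$. In other words, 50% of the cases lie in the range $170.65 \pm 3.96$.

(b) The 10-90 percentile range is $P_{90} - P_{10}$.

$$P_{10} = L_1 + \left(\frac{\frac{10N}{100} - (\sum f)_1}{f_{P_{10}}}\right) c = 162.5 + \left(\frac{5.5 - 5}{8}\right)\cdot 4 = 162.75$$

$$P_{90} = L_1 + \left(\frac{\frac{90N}{100} - (\sum f)_1}{f_{P_{90}}}\right) c = 174.5 + \left(\frac{49.5 - 41}{9}\right)\cdot 4 = 178.28$$

$$P_{90} - P_{10} = 178.28 - 162.75 = 15.53$$

Since $\tfrac{1}{2}(P_{10} + P_{90}) = \tfrac{1}{2}(341.03) \approx 170.51$ and $\tfrac{1}{2}(P_{90} - P_{10}) = \tfrac{1}{2}(15.53) \approx 7.76$, we conclude that 80% of the cases lie in the range $170.51 \pm 7.76$.

Example 7. Find the standard deviation and the variance of the following set of numbers: 6, 8, 12, 7, 4, 5, 5, 10, 9, 8.

$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{6 + 8 + 12 + 7 + 4 + 5 + 5 + 10 + 9 + 8}{10} = \frac{74}{10} = 7.4$$

| $X$ | $X - \bar{X}$ | $(X - \bar{X})^2$ |
|---|---|---|
| 6 | -1.4 | 1.96 |
| 8 | 0.6 | 0.36 |
| 12 | 4.6 | 21.16 |
| 7 | -0.4 | 0.16 |
| 4 | -3.4 | 11.56 |
| 5 | -2.4 | 5.76 |
| 5 | -2.4 | 5.76 |
| 10 | 2.6 | 6.76 |
| 9 | 1.6 | 2.56 |
| 8 | 0.6 | 0.36 |
| $\sum X = 74$ | | $\sum (X_i - \bar{X})^2 = 56.4$ |

Dividing by $N = 10$, the standard deviation is

$$S = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}} = \sqrt{\frac{56.4}{10}} = \sqrt{5.64} \approx 2.37$$

and the variance is $S^2 = 5.64$. (With the divisor $N - 1 = 9$ used in the definition of $S$, these become $S^2 = 56.4/9 \approx 6.27$ and $S \approx 2.50$.)

Example 8. Consider the following frequency distribution.

| Classes | $X_i$ | $u_i$ | Frequency $f_i$ | $u_i^2$ | $f_i u_i$ | $f_i u_i^2$ |
|---|---|---|---|---|---|---|
| 10-14 | 12 | -2 | 7 | 4 | -14 | 28 |
| 15-19 | 17 | -1 | 11 | 1 | -11 | 11 |
| 20-24 | 22 | 0 | 14 | 0 | 0 | 0 |
| 25-29 | 27 | 1 | 13 | 1 | 13 | 13 |
| 30-34 | 32 | 2 | 5 | 4 | 10 | 20 |
| Total | | | 50 | | -2 | 72 |

Use the coding method to compute $\bar{X}$ and $S$.

With $A = 22$ and $c = 5$, the mean value is

$$\bar{X} = A + \left(\frac{\sum_{j=1}^{k} f_j u_j}{\sum_{j=1}^{k} f_j}\right) c = 22 + \left(\frac{-2}{50}\right)\cdot 5 = 21.8$$

The standard deviation is

$$S = c\,\sqrt{\frac{\sum_{j=1}^{k} f_j u_j^2}{N} - \left(\frac{\sum_{j=1}^{k} f_j u_j}{N}\right)^2} = 5\sqrt{\frac{72}{50} - \left(\frac{-2}{50}\right)^2} = 5\sqrt{1.4384} \approx 6 .$$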
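The coding-method arithmetic of Example 8 can be checked with a short Python sketch. It is a minimal illustration rather than part of the notes: the function name `coded_mean_and_sd` and its parameters are assumptions introduced here, and the standard deviation uses the divisor $N$, as the coding formula does.

```python
from math import sqrt


def coded_mean_and_sd(class_marks, freqs, A, c):
    """Mean and standard deviation (divisor N) of grouped data by the coding method."""
    N = sum(freqs)
    # Coded values u_j = (X_j - A) / c
    u = [(x - A) / c for x in class_marks]
    sum_fu = sum(f * uj for f, uj in zip(freqs, u))
    sum_fu2 = sum(f * uj * uj for f, uj in zip(freqs, u))
    mean = A + c * sum_fu / N
    sd = c * sqrt(sum_fu2 / N - (sum_fu / N) ** 2)
    return mean, sd


# Class marks and frequencies of Example 8, with A = 22 and c = 5
marks = [12, 17, 22, 27, 32]
freqs = [7, 11, 14, 13, 5]
mean, sd = coded_mean_and_sd(marks, freqs, A=22, c=5)
print(round(mean, 2), round(sd, 2))  # 21.8 6.0
```

The same function can be applied to the data of Example 2 by passing its class marks and frequencies with `A=168.5` and `c=4`.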