Last Update: 16th March 2011
SESSION 17 & 18
Measures of Dispersion
Measures of Variability

Lecturer: Florian Boehlandt
University: University of Stellenbosch Business School
Domain: http://www.hedge-fundanalysis.net/pages/vega.php

Grouped Data – Investment B

Intervals         x     f    f(<)     xf
-25 to < -15    -20     2      2     -40
-15 to < -5     -10     5      7     -50
 -5 to < 5        0     5     12       0
  5 to < 15      10     4     16      40
 15 to < 25      20     6     22     120
 25 to < 35      30     3     25      90
Total                  25            160
Total / 2            12.5

                      Actual
Mean        6.400     7.072
  Ome       5
  f(<)      12
  fme       4
Median      6.250     4.700
  Omo       15
  fm        6
  fm-1      4
  fm+1      3
Mode        19.000    multimodal

Learning Objectives

1. Measures of relative standing: Median, Quartiles, Deciles and Percentiles
2. Measures of dispersion: Range
3. Measures of variability: Variance and Standard Deviation

Percentiles

The Pth percentile is the value for which P percent of the observations are less than that value and (100 – P) percent are greater than that value.
Some special percentiles commonly used include the median and the quartiles.
Percentiles are measures of relative standing.

Terminology

Name        Fraction   Count   Percentiles                  Notation
Median      1/2        1       50th                         Q2
Quartiles   1/4        4       25th, 50th, 75th, 100th      Q1, Q2, Q3, Q4
Quintiles   1/5        5       20th, 40th, …, 100th
Deciles     1/10       10      10th, 20th, …, 100th

Location of a Percentile

The location Lp of a percentile is a function of the required percentile P and the sample size n:

Lp = (n + 1) * (P / 100)

As with the median, all observations must first be placed in ascending or descending order.

Calculation of Percentile

1. Place all observations in order
2. Calculate the location of the percentile
3. Since the location is often not a whole number (e.g. 5.5), multiply the distance between the two neighbouring observations by the fractional part of the location
4. Add the result of step 3 to the preceding (lower) observation to obtain the percentile

Percentile: An example

The following denotes the number of hours spent on the internet:

0 0 5 7 8 9 12 14 22 23

The values are already placed in order. The sample size is n = 10.
We wish to determine L25, L50 and L75 (these locations correspond to the quartiles Q1, Q2 and Q3).

Solution – Step 1

Obs     1   2   3   4   5   6   7    8    9    10
Data    0   0   5   7   8   9   12   14   22   23

n = 10

Quartile   Lp      Formula
25         2.75    =(10 + 1) * (25 / 100)
50         5.50    =(10 + 1) * (50 / 100)
75         8.25    =(10 + 1) * (75 / 100)

Use the formula to calculate the location for each percentile / quartile.

Solution – Step 2

Quartile   Lp      Fraction   Formula
25         2.75    0.75       =2.75 - 2
50         5.50    0.50       =5.5 - 5
75         8.25    0.25       =8.25 - 8

Determine the fractional part of the location.

Solution – Step 3

Quartile   Lp      Fraction   Lower   Upper
25         2.75    0.75       0       5
50         5.50    0.50       8       9
75         8.25    0.25       14      22

Determine the next lower and next higher observation associated with the location. For 2.75, the two observations are observation 2 (value 0) and observation 3 (value 5).

Solution – Step 4

Quartile   Lp      Fraction   Lower   Upper   Solution   Formula
25         2.75    0.75       0       5       3.75       =0 + (5 - 0) * 0.75
50         5.50    0.50       8       9       8.50       =8 + (9 - 8) * 0.5
75         8.25    0.25       14      22      16.00      =14 + (22 - 14) * 0.25

To determine the quartile associated with a given location, calculate the following:

Solution = Lower + (Upper – Lower) * Fraction

Exercises

You may use shortcuts if you want!

1. Determine the first, second and third quartiles:
5 8 2 9 5 3 7 4 2 7 4 10 4 3 5
2. Determine the third and eighth deciles (30th and 80th percentile):
10.5 14.7 15.3 17.7 15.9 12.2 10.0 14.1 13.9 18.5 13.9 15.1 15.7

Range

The range is the difference between the maximum and the minimum observation. It is a measure of dispersion.
The interquartile range is the difference between the third and the first quartile:

Interquartile Range = Q3 – Q1

Variance

The variance is the average squared deviation of the observations from the sample / population mean. All differences are squared so that positive and negative deviations from the mean do not cancel out.
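The cancelling effect can be checked with a few lines of Python. This is only an illustrative sketch: it reuses the ten internet-usage observations from the percentile example, and the variable names are chosen here for illustration.

```python
# Hours spent on the internet (data from the percentile example).
hours = [0, 0, 5, 7, 8, 9, 12, 14, 22, 23]

mean = sum(hours) / len(hours)            # 100 / 10 = 10.0
deviations = [x - mean for x in hours]    # raw differences from the mean

# Raw deviations cancel out exactly; squared deviations do not.
sum_of_deviations = sum(deviations)
sum_of_squares = sum(d ** 2 for d in deviations)

print(sum_of_deviations)  # 0.0
print(sum_of_squares)     # 572.0
```

Because the raw deviations always sum to zero, they carry no information about spread; the squared deviations do.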
The variance is a measure of variability.

Population and Sample Variance

We need to distinguish between the population variance and the sample variance. Because the sample mean is itself calculated from the data, the sample variance has one fewer degree of freedom, so n – 1 is used in calculating it. For the population of size N this is not the case: the sum is divided by N.

Formulas

Sample                              Population
n     Sample size                   N     Total population size
x_i   Observation                   x_i   Observation
x̄     Sample mean                   μ     Population mean
s²    Sample statistic              σ²    Population parameter

s² = Σ (x_i – x̄)² / (n – 1)         σ² = Σ (x_i – μ)² / N

Calculation of Variance

1. Calculate the average: sum of observations / number of observations
2. Subtract the average from every observation
3. Square each difference
4. Sum the squared differences
5. Divide the result of step 4 by either N (population) or n – 1 (sample)

Variance: An example

The following denotes the number of hours spent on the internet for a sample of n = 10 adults:

0 7 12 5 3 14 8 0 9 22

Calculate the variance.

Solution – Step 1

Obs     Data   Difference   Formula
1       0      -8           =(0 - 8)
2       7      -1           =(7 - 8)
3       12     4            =(12 - 8)
4       5      -3           =(5 - 8)
5       3      -5           =(3 - 8)
6       14     6            =(14 - 8)
7       8      0            =(8 - 8)
8       0      -8           =(0 - 8)
9       9      1            =(9 - 8)
10      22     14           =(22 - 8)
Total   80

n         10
Average   8

Use the mean to calculate the difference between every observation and the mean.

Solution – Step 2

Obs     Data   Difference   Sqr Diff   Formula
1       0      -8           64         =(-8)^2
2       7      -1           1          =(-1)^2
3       12     4            16         =(4)^2
4       5      -3           9          =(-3)^2
5       3      -5           25         =(-5)^2
6       14     6            36         =(6)^2
7       8      0            0          =(0)^2
8       0      -8           64         =(-8)^2
9       9      1            1          =(1)^2
10      22     14           196        =(14)^2
Total   80                  412

n          10
n-1        9
Average    8
Variance   45.778

Square all differences. Next, sum the squared differences and divide the sum by n – 1 (sample only).
In the case of a sample, the sum of squares is divided by n – 1; in the case of a population, it is divided by N.

Interpretation Variance

The variance may be difficult to interpret because it is expressed in squared units. Remember that all differences are squared to prevent positive and negative differences from cancelling out. Taking the square root of the variance returns the statistic to the original units of the data. This statistic is called the standard deviation.
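The five calculation steps, followed by the square root, can be sketched in Python as below. This is a minimal illustration rather than a prescribed method: it uses the ten observations as tabulated in the worked solution, the variable names are illustrative, and the standard-library `statistics.variance` is used only as a cross-check of the sample variance.

```python
import math
import statistics

# Hours spent on the internet, as tabulated in the worked solution.
hours = [0, 7, 12, 5, 3, 14, 8, 0, 9, 22]

n = len(hours)
mean = sum(hours) / n                         # step 1: average
differences = [x - mean for x in hours]       # step 2: subtract the average
squared = [d ** 2 for d in differences]       # step 3: square each difference
sum_sq = sum(squared)                         # step 4: sum the squares
sample_variance = sum_sq / (n - 1)            # step 5 (sample): divide by n - 1
population_variance = sum_sq / n              # step 5 (population): divide by N

sample_sd = math.sqrt(sample_variance)        # standard deviation

# Cross-check against the standard library's sample variance.
assert math.isclose(statistics.variance(hours), sample_variance)

print(round(sample_variance, 3))  # 45.778
print(round(sample_sd, 3))        # 6.766
```

Dividing the same sum of squares by N instead of n – 1 gives the population figure, which is always the smaller of the two.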
Even so, the variances of two datasets can be compared directly to determine which dataset is more volatile.

Example – Standard Deviation

The population standard deviation:

σ = √σ² = √( Σ (x_i – μ)² / N )

Similarly, the sample standard deviation:

s = √s² = √( Σ (x_i – x̄)² / (n – 1) )

Thus, for the internet usage example:

Solution – Step 3

Total (Data)       80
Total (Sqr Diff)   412
n                  10
n-1                9
Average            8
Variance           45.778
Sqrt               6.766

Interpretation: Roughly speaking, the observations of internet usage within the sample of ten people deviate by about 6.766 h from the sample mean.

Exercises

1. Calculate the variance and standard deviation for the following data:
2 8 9 4 1 7 5 4
2. Calculate the variance and standard deviation for the following data:
7 -5 -3 8 4 -4 1 -5 9 3
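For checking answers to the percentile exercises earlier in this session, the location method (Lp = (n + 1) * (P / 100), then Solution = Lower + (Upper – Lower) * Fraction) can be sketched in Python. The function name `percentile_by_location` is hypothetical, and returning the smallest or largest observation when the location falls outside 1..n is an assumption made here; the slides do not cover that case.

```python
import math


def percentile_by_location(data, p):
    """Pth percentile using the location Lp = (n + 1) * (P / 100)."""
    ordered = sorted(data)                 # step 1: place observations in order
    n = len(ordered)
    location = (n + 1) * p / 100           # step 2: location of the percentile
    position = math.floor(location)        # whole part: the lower observation
    fraction = location - position         # fractional part of the location

    # Assumption (not in the slides): clamp locations outside 1..n.
    if position < 1:
        return ordered[0]
    if position >= n:
        return ordered[-1]

    lower = ordered[position - 1]          # observations are numbered from 1
    upper = ordered[position]
    return lower + (upper - lower) * fraction   # steps 3 and 4


# Worked example from the session: hours spent on the internet.
hours = [0, 0, 5, 7, 8, 9, 12, 14, 22, 23]
quartiles = [percentile_by_location(hours, p) for p in (25, 50, 75)]
print(quartiles)  # [3.75, 8.5, 16.0]
```

Other conventions for locating a percentile exist, so software packages may return slightly different values for the same data.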