Measures of Variability
• Range
• Interquartile range
• Variance
• Standard deviation
• Coefficient of variation

Consider the sample of starting salaries of business graduates. We would like to know whether the starting salaries show a low or a high degree of variability (dispersion).

Range
• The range is the difference between the largest and smallest values in the sample.
• The range is the simplest measure of variability.
• Note that the range is highly sensitive to the largest and smallest values.

Example: Apartment Rents
Seventy studio apartments were randomly sampled in a small college town. Their monthly rents ($) are listed below in ascending order, reading down each column.

425 440 450 465 480 510 575
430 440 450 470 485 515 575
430 440 450 470 490 525 580
435 445 450 472 490 525 590
435 445 450 475 490 525 600
435 445 460 475 500 535 600
435 445 460 475 500 549 600
435 445 460 480 500 550 600
440 450 465 480 500 570 615
440 450 465 480 510 570 615

Range = largest value − smallest value = 615 − 425 = 190

Interquartile Range
The interquartile range (IQR) of a data set is the difference between the third quartile and the first quartile. It is the range of the middle 50% of the data, so, unlike the range, it is not affected by extreme data values.

For the apartment rent data:
3rd quartile (Q3) = 525
1st quartile (Q1) = 445
Interquartile range = Q3 − Q1 = 525 − 445 = 80

Variance
• The variance is a measure of variability that uses all the data.
• The variance is based on the difference between each observation (x_i) and the mean (x̄ for a sample, μ for a population).
• For a population, the variance is the average of the squared differences between the observations and the mean; for a sample, the sum of squared differences is divided by n − 1 instead of n.

For the population: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
For the sample: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

Standard Deviation
• The standard deviation of a data set is the square root of the variance.
• The standard deviation is measured in the same units as the data, which makes it easy to interpret.

Computing a standard deviation
For the population: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
For the sample: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

Coefficient of Variation
Divide the standard deviation by the mean and multiply by 100.

Computing the coefficient of variation:
For the population: $\left(\frac{\sigma}{\mu} \times 100\right)\%$
For the sample: $\left(\frac{s}{\bar{x}} \times 100\right)\%$

The heights (in inches) of 25 individuals were recorded, and the following statistics were calculated: mean = 70, range = 20, mode = 73, variance = 784, median = 74. The coefficient of variation equals
1. 11.2%
2. 1120%
3. 0.4%
4. 40%

If the index i (which is used to determine the location of the pth percentile) is not an integer, its value should be
1. squared
2. divided by (n − 1)
3. rounded down
4. rounded up

Which of the following symbols represents the variance of the population?
1. σ²
2. …
3. …

Which of the following symbols represents the size of the sample?
1. σ²
2. …
3. N
4. n

The symbol s is used to represent
1. the variance of the population
2. the standard deviation of the sample
3. the standard deviation of the population
4. the variance of the sample

The numerical value of the variance
1. is always larger than the numerical value of the standard deviation
2. is always smaller than the numerical value of the standard deviation
3. is negative if the mean is negative
4. can be larger or smaller than the numerical value of the standard deviation

If the coefficient of variation is 40% and the mean is 70, then the variance is
1. 28
2. 2800
3. 1.75
4. 784

Problem 22, page 94
Commissions for two types of trades:
Broker-assisted: 100 shares at $50 per share
Online: 500 shares at $50 per share

Measure                        Broker-assisted   Online
Mean                           36.32             20.46
Range                          45.05             57.50
First quartile (Q1)            24.995            13.475
Third quartile (Q3)            48.975            24.95
Interquartile range            23.98             11.475
Variance                       190.67            140.633
Standard deviation             13.8              11.859
Coefficient of variation (%)   38.02             57.949

(For the broker-assisted data, the location indices were i = 6 for the 25th percentile and i = 18 for the 75th percentile.)

Measured by the standard deviation and the interquartile range, the variability of commissions is greater for broker-assisted trades. Relative to the mean, however, online commissions vary more: their coefficient of variation (57.95%) exceeds that of broker-assisted trades (38.02%), and their range is also larger.

Using Excel to Compute the Sample Variance, Standard Deviation, and Coefficient of Variation
Formula Worksheet

     A           B                  C   D           E
1    Apartment   Monthly Rent ($)
2    1           525                    Mean        =AVERAGE(B2:B71)
3    2           440                    Median      =MEDIAN(B2:B71)
4    3           450                    Mode        =MODE(B2:B71)
5    4           615                    Variance    =VAR(B2:B71)
6    5           480                    Std. Dev.   =STDEV(B2:B71)
7    6           510                    C.V.        =E6/E2*100
Note: Rows 8-71 are not shown.

Value Worksheet

     A           B                  C   D           E
1    Apartment   Monthly Rent ($)
2    1           525                    Mean        490.80
3    2           440                    Median      475.00
4    3           450                    Mode        450.00
5    4           615                    Variance    2996.16
6    5           480                    Std. Dev.   54.74
7    6           510                    C.V.        11.15
Note: Rows 8-71 are not shown.

Using Excel’s Descriptive Statistics Tool
Step 4: When the Descriptive Statistics dialog box appears:
• Enter B1:B71 in the Input Range box.
• Select Grouped By Columns.
• Select Labels in First Row.
• Select Output Range.
• Enter D1 in the Output Range box.
• Select Summary Statistics.
• Click OK.

[Figure: Descriptive Statistics dialog box]

Using Excel’s Descriptive Statistics Tool
Value Worksheet (Partial)

     A           B                  C   D                     E
1    Apartment   Monthly Rent ($)       Monthly Rent ($)
2    1           525
3    2           440                    Mean                  490.8
4    3           450                    Standard Error        6.542348114
5    4           615                    Median                475
6    5           480                    Mode                  450
7    6           510                    Standard Deviation    54.73721146
8    7           575                    Sample Variance       2996.162319
Note: Rows 9-71 are not shown.
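The Excel worksheets above can be mirrored in a short Python sketch that computes each measure of variability in this section for the apartment rents. It is an illustration under stated assumptions, not the method used to build the slides: the names RENTS, percentile and variability_summary are hypothetical; the variance uses the sample divisor n − 1 (as Excel's VAR and STDEV do); and quartiles use the location-index rule i = (p/100)n, rounding i up when it is not an integer and otherwise averaging the ith and (i + 1)th values. That rule reproduces Q1 = 445 and Q3 = 525 from the notes. Other software, including Excel's QUARTILE functions, interpolates differently and can report slightly different quartiles.

```python
import math
import statistics

# Monthly rents ($) of the 70 sampled studio apartments, taken from the
# Apartment Rents example and listed in ascending order.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(data, p):
    """pth percentile via the location index i = (p/100) * n (assumed rule):
    if i is not an integer, round it up and take the ith value;
    if i is an integer, average the ith and (i + 1)th values."""
    values = sorted(data)
    n = len(values)
    if (p * n) % 100 != 0:
        i = math.ceil(p * n / 100)   # round up to the next position
        return values[i - 1]         # positions are 1-based
    i = p * n // 100
    return (values[i - 1] + values[i]) / 2


def variability_summary(data):
    n = len(data)
    mean = sum(data) / n
    sq_dev = sum((x - mean) ** 2 for x in data)
    variance = sq_dev / (n - 1)      # sample variance, like Excel's VAR
    std_dev = math.sqrt(variance)    # sample std. dev., like Excel's STDEV
    q1, q3 = percentile(data, 25), percentile(data, 75)
    return {
        "mean": mean,
        "median": percentile(data, 50),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "variance": variance,
        "std_dev": std_dev,
        "std_error": std_dev / math.sqrt(n),
        "cv_percent": std_dev / mean * 100,
    }


summary = variability_summary(RENTS)
for name, value in summary.items():
    print(f"{name:>10}: {value:.2f}")
```

The printed mean (490.80), variance (2996.16), standard deviation (54.74), standard error (6.54) and coefficient of variation (11.15) should agree with the Excel output, along with the range of 190 and the IQR of 80 computed by hand earlier.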
Using Excel’s Descriptive Statistics Tool
Value Worksheet (Partial)

     A           B                  C   D                     E
9    8           430                    Kurtosis              -0.334093298
10   9           440                    Skewness              0.924330473
11   10          450                    Range                 190
12   11          470                    Minimum               425
13   12          485                    Maximum               615
14   13          515                    Sum                   34356
15   14          575                    Count                 70
16   15          430
Note: Rows 1-8 and 17-71 are not shown.

Measures of Relative Location and Detecting Outliers
• z-scores
• Chebyshev’s theorem
• Detecting outliers

By using the mean and standard deviation together, we can learn more about the relative location of observations in a data set.

z-score
The z-score compares the deviation of a single observation from the mean with the standard deviation. The z-score is computed for each x_i:

$z_i = \frac{x_i - \bar{x}}{s}$

where
z_i is the z-score for x_i,
x̄ is the sample mean,
s is the sample standard deviation.

The z-score can be interpreted as the number of standard deviations x_i lies from the sample mean.

z-scores for the starting salary data (x̄ = 2940, s ≈ 165.65)

Graduate   Starting Salary   x_i − x̄   z-score
1          2850              -90        -0.543
2          2950              10         0.060
3          3050              110        0.664
4          2880              -60        -0.362
5          2755              -185       -1.117
6          2710              -230       -1.388
7          2890              -50        -0.302
8          3130              190        1.147
9          2940              0          0.000
10         3325              385        2.324
11         2920              -20        -0.121
12         2880              -60        -0.362

Chebyshev’s Theorem
At least $(1 - 1/z^2)$ of the data values must be within z standard deviations of the mean, where z is any value greater than 1.

This theorem enables us to make statements about the proportion of data values that must lie within a specified number of standard deviations of the mean.

Implications of Chebyshev’s Theorem
• At least .75, or 75 percent, of the data values must be within 2 (z = 2) standard deviations of the mean.
• At least .89, or 89 percent, of the data values must be within 3 (z = 3) standard deviations of the mean.
• At least .94, or 94 percent, of the data values must be within 4 (z = 4) standard deviations of the mean.

Note: z must be greater than one but need not be an integer.
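The z-score table and the Chebyshev percentages above can be reproduced with the Python sketch below. The function names are illustrative and do not come from the notes; the standard deviation is the sample version (divisor n − 1), as in the z-score definition, and chebyshev_bound rejects z ≤ 1 because the theorem only applies for z greater than 1.

```python
import math

# Starting salaries of the 12 graduates in the z-score table.
SALARIES = [2850, 2950, 3050, 2880, 2755, 2710, 2890, 3130, 2940, 3325, 2920, 2880]


def sample_mean_sd(data):
    """Sample mean and sample standard deviation (divisor n - 1)."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return mean, sd


def z_scores(data):
    """z_i = (x_i - mean) / s for every observation."""
    mean, sd = sample_mean_sd(data)
    return [(x - mean) / sd for x in data]


def chebyshev_bound(z):
    """Minimum fraction of values within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2


mean, sd = sample_mean_sd(SALARIES)
print(f"mean = {mean:.2f}, s = {sd:.2f}")
for grad, (salary, z) in enumerate(zip(SALARIES, z_scores(SALARIES)), start=1):
    print(f"{grad:2d}  {salary}  {salary - mean:+6.0f}  {z:+.3f}")

for z in (2, 3, 4):
    print(f"z = {z}: at least {chebyshev_bound(z):.2%} of values within z std. devs")
```

Rounded to three decimals, the computed z-scores agree with the table. The exact bounds are 75%, about 88.9% and 93.75%, which the slide rounds to .75, .89 and .94.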
Chebyshev’s Theorem
For example, let z = 1.5, with x̄ = 490.80 and s = 54.74 (the apartment rent data).

At least $1 - 1/(1.5)^2 = 1 - 0.44 = 0.56$, or 56%, of the rent values must be between
$\bar{x} - z s = 490.80 - 1.5(54.74) = 409$
and
$\bar{x} + z s = 490.80 + 1.5(54.74) = 573$.
(Actually, 86% of the rent values are between 409 and 573.)

Detecting Outliers
You can use z-scores to detect extreme values, or “outliers,” in a data set. When a z-score is very large in absolute value, it is a good idea to recheck that data value for accuracy.
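A quick way to check the Chebyshev example against the actual rents, and to apply the z-score screen just described, is sketched below. The cutoff of 3 standard deviations in flag_outliers is an assumption (a commonly used convention), not a rule stated in these notes, and all function names are illustrative.

```python
import math

# Monthly rents ($) from the Apartment Rents example, ascending order.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def mean_sd(data):
    """Sample mean and sample standard deviation (divisor n - 1)."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return mean, sd


def share_within(data, z):
    """Observed share of values within z sample standard deviations of the mean."""
    mean, sd = mean_sd(data)
    low, high = mean - z * sd, mean + z * sd
    return sum(low <= x <= high for x in data) / len(data)


def flag_outliers(data, threshold=3.0):
    """Values whose |z-score| exceeds a hypothetical cutoff; worth rechecking."""
    mean, sd = mean_sd(data)
    return [x for x in data if abs((x - mean) / sd) > threshold]


z = 1.5
print(f"Chebyshev guarantees at least {1 - 1 / z ** 2:.0%}")
print(f"Observed share within {z} s: {share_within(RENTS, z):.0%}")
print("Values to recheck:", flag_outliers(RENTS))
```

The observed share is 60 of 70 rents (about 86%), well above the guaranteed 56%. No rent lies more than 3 standard deviations from the mean; the largest z-score belongs to the two $615 rents and is about 2.27.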