Measures of Spread

The measures of spread or dispersion of a data set are quantities that indicate how closely a set of data clusters around its centre.

A) Deviation

A deviation is the difference between an individual value in a set of data and the mean for the data.

For a population: deviation = \( x - \mu \)
For a sample: deviation = \( x - \bar{x} \)

Note:
- The larger the size of the deviations, the greater the spread in the data.
- Values less than the mean have negative deviations.

B) Standard Deviation

The standard deviation is the square root of the mean of the squares of the deviations.

σ: symbol for the standard deviation of a population
s: symbol for the standard deviation of a sample

Population Standard Deviation
\[ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \]

Sample Standard Deviation
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]

where N is the number of data in the population and n is the number in the sample.

C) Variance

The mean of the squares of the deviations is another useful measure. This quantity is called the variance and is equal to the square of the standard deviation.

Population Variance
\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} \]

Sample Variance
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]

Example 1: Calculate the mean and standard deviation of Alice's commuting time in minutes from the following data:

55 68 83 59 68 75 62 78 97 83

Solution:
The mean is 72.8 minutes.

Standard Deviation:

Commuting Time (x)    (x - μ)    (x - μ)²
55
68
83
59
68
75
62
78
97
83

\( \sum (x - \mu)^2 = \) ____

\( \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \) ____

Alice's average commuting time is ____ minutes with a standard deviation of ____ minutes.

D) Grouped Data Formulas

\[ \sigma = \sqrt{\frac{\sum f_i (m_i - \mu)^2}{N}} \qquad s = \sqrt{\frac{\sum f_i (m_i - \bar{x})^2}{n - 1}} \]

where \( f_i \) is the frequency for a given interval and \( m_i \) is the midpoint of the interval.

Example 2: Determine the mean and standard deviation for the following.

Average daily interest ($) accumulated in savings accounts

Interest ($)     Frequency
0.50-10.50       32
10.50-20.50      11
20.50-30.50      5
30.50-40.50      2

Solution:
Mean = ____
Standard deviation = ____

E) Quartiles and Interquartile Ranges

Quartiles divide a set of ordered data into four groups with equal numbers of values, just as the median divides data into two equally sized groups.
The three "dividing points" are the first quartile (Q1), the median (sometimes called the second quartile or Q2), and the third quartile (Q3). Q1 and Q3 are the medians of the lower and upper halves of the data.

The interquartile range is Q3 - Q1, which is the range of the middle half of the data. The larger the interquartile range, the greater the spread of the middle half of the data, so the interquartile range provides a measure of spread.

The semi-interquartile range is one half of the interquartile range. Both of these ranges indicate how closely the data are clustered around the median.

Example 3: The following data represent 20 people's estimates of the size of a crowd at a public gathering.

650 400 1000 550 500 625 600 575 700 750
500 900 600 650 700 700 450 575 750 800

Determine the median and the interquartile range.

Solution:
The number of data is even, so the median will be the mean of the two middle data. In fact, because the number of data is a multiple of four, the quartiles will also be means of adjacent data. Order the data from smallest to largest.

Order  1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20
Data   400  450  500  500  550  575  575  600  600  625  650  650  700  700  700  750  750  800  900  1000

Median = (average of 10th and 11th data) = ____
First quartile, Q1 = (average of 5th and 6th data) = ____
Third quartile, Q3 = (average of 15th and 16th data) = ____
Interquartile range = Q3 - Q1 = ____
Semi-interquartile range = ____

The median of the estimates of the crowd is 637.5, with half of the estimates within 81.25 of this value.

F) Percentiles

Percentiles are similar to quartiles, except that percentiles divide the data into 100 intervals that have equal numbers of values. Thus, k percent of the data are less than or equal to the kth percentile, \( P_k \), and (100 - k) percent are greater than or equal to \( P_k \).

Example 4: On a recent aptitude test, Carrie was rated in the 93rd percentile. If 1068 people wrote the test, how many people had a lower score on the test than Carrie did?

Solution:
(0.93)(1068) = 993.24
There were 993 people who had a lower score than Carrie.
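The calculations in Examples 3 and 4 can be checked with a short Python sketch. It is only one possible way to do it: the function names `median` and `quartiles` are made up for this illustration, and for an odd number of data the sketch assumes the median itself is left out of both halves (that case does not arise in Example 3).

```python
import math

def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 0:
        # Even count: mean of the two middle values.
        return (sorted_values[mid - 1] + sorted_values[mid]) / 2
    return sorted_values[mid]

def quartiles(values):
    """Return (Q1, median, Q3), with Q1 and Q3 taken as the medians
    of the lower and upper halves of the ordered data."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # Assumption: for odd n the middle value is excluded from both halves.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)

# Example 3: crowd-size estimates
estimates = [650, 400, 1000, 550, 500, 625, 600, 575, 700, 750,
             500, 900, 600, 650, 700, 700, 450, 575, 750, 800]
q1, q2, q3 = quartiles(estimates)
iqr = q3 - q1          # interquartile range
semi_iqr = iqr / 2     # semi-interquartile range
print("Q1 =", q1, " median =", q2, " Q3 =", q3)
print("IQR =", iqr, " semi-IQR =", semi_iqr)

# Example 4: number of people scoring below the 93rd percentile,
# rounded down to a whole person.
below_carrie = math.floor(0.93 * 1068)
print("People below Carrie:", below_carrie)
```

Running the sketch gives a median of 637.5 and a semi-interquartile range of 81.25, which agree with the values stated in the solution to Example 3, and 993 people for Example 4.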
Example 5: In a popular mathematics competition, only the contestants in the top 5 percentiles win Diplomas of Distinction. If there were 478 contestants, how many Diplomas of Distinction would be awarded?

Solution:
(0.05)(478) = 23.9
There were 24 Diplomas of Distinction awarded.

G) Z-scores

A z-score is the number of standard deviations that a datum is from the mean. Calculate it by dividing the deviation for a datum by the standard deviation.

For a population: \( z = \frac{x - \mu}{\sigma} \)
For a sample: \( z = \frac{x - \bar{x}}{s} \)

Note: Variable values below the mean have negative z-scores, values above the mean have positive z-scores, and values equal to the mean have a zero z-score.

Example 6: Find the mean and standard deviation of the z-scores of the following set of data: 10, 11, 14, 16, 19, 20.

Solution:
Tabulate to compute the mean and standard deviation.

Mean: \( \bar{x} = 15 \)
Standard deviation: s = 3.74 (this value comes from dividing the sum of the squared deviations by n = 6)

The z-scores and their distribution can now be determined.

x     z = (x - x̄)/s    z²
10    -1.34            1.79
11    -1.07            1.14
14    -0.27            0.07
16     0.27            0.07
19     1.07            1.14
20     1.34            1.79

\( \sum z = 0 \), \( \sum z^2 = 6.0 \)

The mean of the z-scores is \( \bar{z} = \frac{\sum z}{n} = 0 \).

Since the mean is 0, the standard deviation of the z-scores is \( \sqrt{\frac{\sum z^2}{n}} = \sqrt{\frac{6.0}{6}} = 1 \).

Follow Up: page 148-149 #1-7, 9, 10
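As a check on Example 6, the short Python sketch below recomputes the table. It is an illustration rather than part of the lesson: the variable names are made up here, and it assumes the standard deviation is found by dividing by n (the population-style formula), since that is what reproduces the value 3.74 used in the solution.

```python
import math

# Data from Example 6
data = [10, 11, 14, 16, 19, 20]
n = len(data)

mean = sum(data) / n
# Assumption: divide by n, which matches the 3.74 in the solution.
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)

# z-score of each datum: its deviation divided by the standard deviation.
z_scores = [(x - mean) / sd for x in data]

# Mean and standard deviation of the z-scores themselves.
z_mean = sum(z_scores) / n
z_sd = math.sqrt(sum((z - z_mean) ** 2 for z in z_scores) / n)

print(f"mean = {mean}, sd = {sd:.2f}")
for x, z in zip(data, z_scores):
    print(f"{x:>3}  {z:6.2f}  {z * z:5.2f}")
print(f"mean of z = {z_mean:.2f}, sd of z = {z_sd:.2f}")
```

The printed rows should match the table in Example 6. Whatever data set is used, the z-scores computed this way have mean 0 and standard deviation 1, which is why z-scores are convenient for comparing values from different distributions.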