6.3 – Standard Deviation and Z-Scores

While the interquartile range is an effective measure of spread, it is awkward to calculate and has limited usefulness. A more useful measure of spread, from a mathematical point of view, is the standard deviation. It is a more complex measure of spread, with some interesting properties.

Recall that the deviation of a piece of data is its distance from the mean. Variance is a measure of dispersion found by averaging the squares of the deviations of each piece of data. Standard deviation is a measure of dispersion that is the square root of the variance.

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

where $\sigma$ is the standard deviation, $\bar{x}$ is the mean, and $n$ is the number of data values in the sample.

What does this mean?
- The standard deviation is the square root of the average of the squared distances of each piece of data from the mean. The smaller the standard deviation, the more compact the data set.
- So, if most of the data is clustered around the mean, then the standard deviation will be small.
- If the data is widely scattered, the standard deviation will be large.

Example 1: The following are test scores for two students. Who performed better?

Kyle: 73, 56, 92, 67, 88, 34, 77, 65
Janice: 69, 64, 73, 88, 67, 75, 61, 68

First, we calculate the mean for each student.

Kyle:
$$\bar{x} = \frac{34 + 56 + 65 + 67 + 73 + 77 + 88 + 92}{8} = \frac{552}{8} = 69$$

Janice:
$$\bar{x} = \frac{61 + 64 + 67 + 68 + 69 + 73 + 75 + 88}{8} = \frac{565}{8} \approx 70.6$$

Next, we will calculate the standard deviation of each student's marks to see who was more consistent.
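The definition of standard deviation given in this section can also be written as a short Python sketch, which is handy for cross-checking the hand calculation for each student. The function names `mean` and `population_std_dev` are illustrative choices, not part of the lesson; the sketch divides by $n$, exactly as the formula for $\sigma$ does.

```python
import math

def mean(values):
    """Arithmetic mean: the sum of the data divided by the number of values."""
    return sum(values) / len(values)

def population_std_dev(values):
    """Square root of the average squared deviation from the mean (divides by n)."""
    x_bar = mean(values)
    variance = sum((x - x_bar) ** 2 for x in values) / len(values)
    return math.sqrt(variance)

# Test scores from Example 1.
kyle = [73, 56, 92, 67, 88, 34, 77, 65]
janice = [69, 64, 73, 88, 67, 75, 61, 68]

for name, scores in (("Kyle", kyle), ("Janice", janice)):
    # Print each student's mean and standard deviation, rounded for display.
    print(name, round(mean(scores), 1), round(population_std_dev(scores), 2))
```

Because the code keeps the unrounded mean throughout, small differences in the last decimal place from a hand calculation with a rounded mean are possible.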
Kyle:
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
$$\sigma = \sqrt{\frac{(34 - 69)^2 + (56 - 69)^2 + (65 - 69)^2 + (67 - 69)^2 + (73 - 69)^2 + (77 - 69)^2 + (88 - 69)^2 + (92 - 69)^2}{8}}$$
$$\sigma = \sqrt{\frac{2384}{8}}$$
$$\sigma \approx 17.26$$

Mathematics of Data Management (MDM4UC)

Janice:
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
$$\sigma = \sqrt{\frac{(61 - 70.6)^2 + (64 - 70.6)^2 + (67 - 70.6)^2 + (68 - 70.6)^2 + (69 - 70.6)^2 + (73 - 70.6)^2 + (75 - 70.6)^2 + (88 - 70.6)^2}{8}}$$
$$\sigma = \sqrt{\frac{485.88}{8}}$$
$$\sigma \approx 7.79$$

Kyle and Janice had averages that were about the same (69% vs. 70.6%). Kyle's marks had a larger range, and Janice's lower standard deviation confirms that she was more consistent in her grades.

Sometimes, as with your major project, many data entries occur more than once. In this case, we use a version of the standard deviation formula that incorporates frequency:

$$\sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{n}}$$

where $f$ is the frequency of each value $x$ and $n = \sum f$ is the total number of data values.

Example 2: Candies are packed into bags and sold to people to pass out at Hallowe’en. The number of candies in each bag is close, but not always the same. The following table summarizes the number of candies in a sample of bags.

Number of Candies | 45 | 46 | 47 | 48 | 49
Frequency         |  4 | 12 | 15 | 14 |  9

Calculate the mean and the standard deviation.

The mean is a weighted mean, with the frequencies as the weights $w$:

$$\bar{x} = \frac{\sum xw}{\sum w} = \frac{4(45) + 12(46) + 15(47) + 14(48) + 9(49)}{4 + 12 + 15 + 14 + 9} = \frac{2550}{54} \approx 47.2$$

$$\sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{n}}$$
$$\sigma = \sqrt{\frac{4(45 - 47.2)^2 + 12(46 - 47.2)^2 + 15(47 - 47.2)^2 + 14(48 - 47.2)^2 + 9(49 - 47.2)^2}{54}}$$
$$\sigma \approx \sqrt{\frac{75.3}{54}}$$
$$\sigma \approx 1.18$$

(The numerator 75.3 comes from using the unrounded mean $2550/54$; with the rounded mean 47.2 it is about 75.4, and $\sigma$ still rounds to 1.18.)

Z-Scores

A z-score indicates how many standard deviations a data value lies from the mean.

$$z = \frac{x - \bar{x}}{\sigma}$$

Example 3: The final percentages in a class of grade 12 students are as follows:

54 88 64 63 47 64 43 83 69 31 71 77 52 52 15 85 62 72 78 68 73 53 65

The information was entered into a spreadsheet and the following statistics were determined:

$$n = 23 \qquad \bar{x} = 62.1 \qquad \sigma = 17$$

There are two brothers in the class: Bryan and Brayden. If they achieved marks of 88% and 54% respectively, how many standard deviations from the mean is each of their marks?
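Before answering, the summary statistics quoted for Example 3 can be reproduced from the raw marks. The sketch below uses Python's standard `statistics` module in place of the spreadsheet. The lesson does not say which spreadsheet function was used, so treating $\sigma$ as the population standard deviation (`pstdev`, which divides by $n$) is an assumption, chosen because it matches this section's formula. The sample version (`stdev`, which divides by $n - 1$) is computed only for comparison.

```python
import statistics

# Final percentages from Example 3.
marks = [54, 88, 64, 63, 47, 64, 43, 83, 69, 31, 71, 77,
         52, 52, 15, 85, 62, 72, 78, 68, 73, 53, 65]

n = len(marks)
x_bar = statistics.mean(marks)
sigma = statistics.pstdev(marks)  # population standard deviation (divides by n)
s = statistics.stdev(marks)       # sample standard deviation (divides by n - 1)

print("n =", n)
print("mean =", round(x_bar, 1))
print("population sigma =", round(sigma, 1))
print("sample s =", round(s, 1))
```

The population value rounds to the quoted $\sigma = 17$, while the sample value comes out slightly larger, which is worth keeping in mind when comparing results from different tools.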
Bryan:
$$z = \frac{x - \bar{x}}{\sigma} = \frac{88 - 62.1}{17} \approx 1.52$$
Bryan is 1.52 standard deviations above the mean.

Brayden:
$$z = \frac{x - \bar{x}}{\sigma} = \frac{54 - 62.1}{17} \approx -0.48$$
Brayden is 0.48 standard deviations below the mean.

Practice: (Page 286) #1, 2, 3, 4
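As a way to check answers to the practice questions, the frequency-based standard deviation from Example 2 and the z-score formula from Example 3 can be sketched in Python. The function names `freq_mean`, `freq_std_dev`, and `z_score` are hypothetical helpers introduced here for illustration; the data are the candy counts and the class statistics given in the examples.

```python
import math

def freq_mean(values, freqs):
    """Mean of a frequency table: sum of f*x divided by n, where n is the sum of f."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / n

def freq_std_dev(values, freqs):
    """Standard deviation of a frequency table (divides by n, the total frequency)."""
    n = sum(freqs)
    x_bar = freq_mean(values, freqs)
    squared = sum(f * (x - x_bar) ** 2 for x, f in zip(values, freqs))
    return math.sqrt(squared / n)

def z_score(x, x_bar, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - x_bar) / sigma

# Example 2: number of candies per bag and how many bags had that count.
candies = [45, 46, 47, 48, 49]
bags = [4, 12, 15, 14, 9]
print(round(freq_mean(candies, bags), 1), round(freq_std_dev(candies, bags), 2))

# Example 3: z-scores for marks of 88 and 54, using the quoted mean and sigma.
print(round(z_score(88, 62.1, 17), 2), round(z_score(54, 62.1, 17), 2))
```

A positive z-score means the value is above the mean and a negative one means it is below, which matches the interpretation given for Bryan and Brayden.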