1.4 Defining Data Spread

• An average alone doesn't always describe a set of data effectively or completely. An average doesn't indicate whether the data clusters, whether the set contains outliers, what the range is, or how the data is spread. In general, it says nothing about the set's distribution. The data distribution plots we have studied help to show that.

• Investigate the following to discover a single number that can indicate the spread and variation in a data set.

• For the following data, try to determine the average (mean) distance of the values from the mean of the set.

Test Scores: 35, 44, 56, 58, 62, 67, 70, 72, 76, 88, 90, 94

• Step 1 – Calculate the mean of the set.
mean = 812 / 12 ≈ 67.7, rounded to 68

Step 2 – Calculate the distance of each value from the mean (mean – value). This is called the deviation from the mean. For example, for 35 the deviation is 68 – 35 = 33.

Data Value:                 35  44  56  58  62  67  70  72  76   88   90   94
Deviation (mean – value):   33  24  12  10   6   1  -2  -4  -8  -20  -22  -26

• Step 3 – Square each deviation (to remove the negative signs).

Data Value   Deviation   Squared Deviation
35            33         1089
44            24          576
56            12          144
58            10          100
62             6           36
67             1            1
70            -2            4
72            -4           16
76            -8           64
88           -20          400
90           -22          484
94           -26          676
Sum                      3590

• Step 4 – Find the mean of the squared deviations.
3590 / 12 ≈ 299.2, approx. 299

Step 5 – Find the square root of the result of Step 4 (the mean of the squared deviations).
√299 ≈ 17.3

• You just found what is called the standard deviation: a number that describes the spread/variation within a set of data. Roughly, it represents the typical distance of the data values from the mean of the set. (Strictly, it is the square root of the mean of the squared deviations, not the plain average distance.)

• The greater the standard deviation:
 - the more spread/variation
 - the farther a typical data value is from the mean
The lower the standard deviation:
 - the closer a typical data value is to the mean
 - the more clustering around the mean
 - the less variation/spread
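The five steps above can be sketched in Python as follows. This is a minimal sketch, assuming the divide-by-n (population) version used in Step 4; the function name `standard_deviation` is hypothetical. Python's `statistics.pstdev` also divides by n and is used only as a cross-check. `statistics.stdev` divides by n − 1 instead (the sample standard deviation) and would give a slightly larger value.

```python
import math
import statistics

def standard_deviation(values):
    """Population standard deviation, computed step by step (hypothetical helper)."""
    n = len(values)
    # Step 1: mean of the set (kept exact, not rounded to 68)
    mean = sum(values) / n
    # Step 2: deviation of each value from the mean (mean - value)
    deviations = [mean - x for x in values]
    # Step 3: square each deviation to remove the negative signs
    squared = [d * d for d in deviations]
    # Step 4: mean of the squared deviations (divide by n)
    mean_squared = sum(squared) / n
    # Step 5: square root of the Step 4 result
    return math.sqrt(mean_squared)

scores = [35, 44, 56, 58, 62, 67, 70, 72, 76, 88, 90, 94]
sd = standard_deviation(scores)
print(round(sd, 1))  # 17.3
# Cross-check against the standard library's population standard deviation
print(math.isclose(sd, statistics.pstdev(scores)))  # True
```

Because the code keeps the exact mean (about 67.67) rather than rounding it to 68, its intermediate deviations differ slightly from the hand-worked table, but the final result still rounds to 17.3.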