Using Measures of Center and Measures of Variability

Objective
In this lesson, you will

Comparing Data Sets Using Measures of Center
Any statistical data can be organized and summarized using measures of center.
• mean = the (average / middle) of the data set
   → ____________ the sum of all the values in the data set by the total number of values.
• median = the (average / middle) of the data set
   → ___________ the values from least to greatest. The median divides the set into two halves: one set of values lower than the median and one set higher than the median.

The table shows the number of words Luis and Darcy each wrote for 10 random essays.

Luis    178  213  198  245  236  198  221  253  189  177
Darcy   231  210  245  259  286  245  231  244  236  228

mean = (sum of values in data set) / (number of values in data set)
→ The mean number of words in Luis's essays is 2,108 / 10 = _________.
→ The mean number of words in Darcy's essays is 2,415 / 10 = _________.

There is an even number of values in each data set. So, the median is the mean of the two middle values.
→ When listed in order, the fifth and sixth values of Luis's set are ______ and ______, respectively. So, the median number of words in Luis's essays is 411 / 2 = _________.
→ When listed in order, the fifth and sixth values of Darcy's set are ______ and ______, respectively. So, the median number of words in Darcy's essays is 480 / 2 = _________.

Because both the mean and the median number of words are higher in Darcy's essays than in Luis's essays, I can conclude that ______________ essays generally contain more words than _____________ essays.

If one data set has a higher mean or median than another data set, that suggests that the values in that data set tend to be (higher / lower) than the values in the other data set.
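The mean and median calculations above can be checked with a short script. This is only a sketch using Python's standard statistics module; the names luis, darcy, and summarize are illustrative and are not part of the lesson.

```python
from statistics import mean, median

# Word counts copied from the table in the lesson.
luis = [178, 213, 198, 245, 236, 198, 221, 253, 189, 177]
darcy = [231, 210, 245, 259, 286, 245, 231, 244, 236, 228]


def summarize(values):
    """Return the sum, mean, and median of a list of numbers.

    summarize is a hypothetical helper written for this example.
    """
    return {
        "sum": sum(values),
        "mean": mean(values),      # sum of values / number of values
        "median": median(values),  # middle value, or mean of the two middle values
    }


luis_summary = summarize(luis)
darcy_summary = summarize(darcy)

for name, summary in (("Luis", luis_summary), ("Darcy", darcy_summary)):
    print(name, summary)
```

Because each list has an even number of values, median() averages the fifth and sixth values of the sorted list, which is the same step the worksheet asks for by hand.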
Which measure of center to use:
• In many situations, the best measure of center is the mean.
• If there are ______________ (values that are significantly higher or lower than most of the data set), then the median is the measure that best summarizes the data.
• If the data is in a box plot, then _______________ is the only measure of center known.

Example 1:
The table shows the numbers of movies rented from two movie rental stores on 10 random days.

Store 1   18  14  20  22  16  15  21  20  13  15
Store 2   35  24  28  32  27  29  34  25  33  26

There are no outliers in either data set.
So, the best measure of center is the _____________.
Store ____ generally rents more movies than store _____.

Example 2:
The dot plots show the number of dogs groomed at two dog-grooming salons on nine random days.
In the data set for salon 1, the value ____ is an outlier.
In the data set for salon 2, the values ____ and ____ are outliers.
So, use the ___________ to compare the data sets.
Salon 1 generally grooms (fewer / more) dogs per day than salon 2.

Luther and Nelson are seniors on the school basketball team. The box plots represent the number of points they scored in 20 randomly selected games.
From the box plots, we get these values:

                  Nelson   Luther
minimum value        0        3
first quartile       6        8
median              14       10
third quartile      18       15
maximum value       24       18

So, we can infer that ______________ generally scores more points in a game than ______________ does.

Comparing Data Sets Using Measures of Variation
Measures of variation
→ describe the _________________ of the values
→ tell how much the values in the data set _________

interquartile range:
→ a (mean / median) based measure
→ gives the spread of the middle _______% of the data
→ the difference of the _________ (third) and lower (first) quartiles:
   ❖ IQR = Q3 – Q1

mean absolute deviation:
→ a (mean / median) based measure
→ represents the average distance of each value in the data set from the _________

Astrid and Warren often play dominoes with their parents.
The table shows their scores for 15 randomly picked games.

Astrid   15  20  25  15   0  15  30  40  35  20  15  25  30  10   5
Warren    0  25  15  10  20  30  10  15   5  25  20  15  20   5  10

To find the interquartile ranges, arrange each set of scores in ascending order.
Astrid: 0, 5, 10, 15, 15, 15, 15, 20, 20, 25, 25, 30, 30, 35, 40
Warren: 0, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20, 25, 25, 30
The first quartile is the ____th value, and the third quartile is the ____th value.

interquartile range = third quartile – first quartile
Astrid: interquartile range = _____ – _____ = _____
Warren: interquartile range = _____ – _____ = _____

To find the mean absolute deviation of the scores:
mean absolute deviation = (sum of absolute deviations) / (number of values in data set)
→ Find the mean of the data set.
→ Find the absolute deviation of each value by __________________ the value from the mean, and taking the absolute value.
→ Divide the sum of those absolute deviations by the number of values in the data set.

Astrid: mean = 300 / 15 = _____
        mean absolute deviation = _____ / 15 ≈ 8.6̅
Warren: mean = 225 / 15 = _____
        mean absolute deviation = _____ / 15 ≈ 6.6̅

The interquartile range of Astrid's data set is (greater / less) than that of Warren's data set.
The mean absolute deviation of Astrid's data set is (higher / lower) than that of Warren's data set.
These results indicate that Astrid's data is (more / less) spread out than Warren's data.

Which measure of variation to use:
• In many cases, mean absolute deviation is the best to use.
   → A data set with a (higher / lower) mean absolute deviation has a greater spread.
   → If the mean absolute deviation is (higher / lower), the values are more clustered around the mean, indicating that the spread is not as great.
• If there are outliers, interquartile range is the best measure of variation to use.
   → The higher the interquartile range, the __________ spread out the data.
• If the data is in a box plot, then the (interquartile range / mean absolute deviation) is the only measure of variation that can be used.
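The interquartile range and mean absolute deviation steps for the dominoes scores can be sketched in code. This is a minimal example rather than a definitive method. It assumes the quartiles are the medians of the lower and upper halves, with the overall median left out when the number of values is odd. The function names are hypothetical.

```python
from statistics import mean, median


def interquartile_range(values):
    """Return Q3 - Q1, where Q1 and Q3 are the medians of the two halves.

    Assumption: with an odd count, the overall median belongs to neither half.
    Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])    # median of the lower half
    q3 = median(ordered[-half:])   # median of the upper half
    return q3 - q1


def mean_absolute_deviation(values):
    """Return the average distance of each value from the mean."""
    center = mean(values)
    return sum(abs(v - center) for v in values) / len(values)


# Scores listed in ascending order, as in the lesson.
astrid = [0, 5, 10, 15, 15, 15, 15, 20, 20, 25, 25, 30, 30, 35, 40]
warren = [0, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20, 25, 25, 30]

for name, scores in (("Astrid", astrid), ("Warren", warren)):
    iqr = interquartile_range(scores)
    mad = mean_absolute_deviation(scores)
    print(f"{name}: IQR = {iqr}, MAD = {mad:.2f}")
```

Other tools may compute quartiles with a different rule, for example by interpolating between values, so their interquartile range can differ slightly from the hand method used in this lesson.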
Summary
What is a possible explanation for why median and interquartile range are the preferred measures of center and variation when a data set contains outliers?