Chapter One Review Exercises

Data Set A: 10, 7, 18, 13, 12, 17, 11, 14, 16, 22
Data Set B: 8, 13, 16, 10, 11, 7, 8, 11, 9, 17, 22
Data Set C: 7, 9, 19, 8, 20, 17, 19, 9, 22, 18, 10, 10
Data Set D: 17, 17, 7, 18, 8, 18, 9, 10, 22, 19, 22, 21, 20

1. a) For each of the data sets A-D find (i) the mean and (ii) the median.

(i) The mean is found by summing the values and dividing by the number of data points. So:

A: mean=140/10=14
B: mean=132/11=12
C: mean=168/12=14
D: mean=208/13=16

(ii) The median requires that the data be ordered:

A: 7,10,11,12,13,14,16,17,18,22
B: 7,8,8,9,10,11,11,13,16,17,22
C: 7,8,9,9,10,10,17,18,19,19,20,22
D: 7,8,9,10,17,17,18,18,19,20,21,22,22

The median is the middle number in the list, or halfway between the two middle numbers if the list has an even number of data values. So:

A: median=13.5
B: median=11
C: median=13.5
D: median=18

b) Which of these averages better describes the data set?

In all four cases there is very little to choose. The data lie between 7 and 22 and the mean and median values are quite close.

2. a) For each of the data sets A-D find the lower and upper quartiles.

The quartiles are the medians of the values above/below the median position in the ordered sets. The median position is marked with | when it falls between two values, and with brackets when the median is itself a data value (that value belongs to neither half).

A: 7,10,11,12,13,|,14,16,17,18,22
B: 7,8,8,9,10,[11],11,13,16,17,22
C: 7,8,9,9,10,10,|,17,18,19,19,20,22
D: 7,8,9,10,17,17,[18],18,19,20,21,22,22

We therefore have

A: LQ=11, UQ=17
B: LQ=8, UQ=16
C: LQ=9, UQ=19
D: LQ=9.5, UQ=20.5

b) Draw boxplots for the four data sets on the same number line. Use the boxplots to determine which sets will have the larger standard deviations and to classify the sets as unimodal symmetric, bimodal symmetric, left-skewed or right-skewed.

C will have a larger standard deviation than A: they have the same mean and the same high and low values, but A’s rectangle (middle 50% of data) is shorter than C’s. B has the same-sized rectangle as A but some values are much further from the mean than in A.
So B will have a larger standard deviation than A. Likewise, C will have a larger standard deviation than D. Remember that the mean, the value used for standard deviation, is not shown on the boxplot. The center line is the median!

Conclusion: A smaller than B smaller than D smaller than C.

Boxplots A and C look symmetric; B looks skewed-to-the-right and D appears to be skewed-to-the-left. This is based on looking at both the rectangle and the dot-to-dot pieces in relation to the center line.

3. a) Draw 4-bar histograms for each data set A-D using the bins 6-10, 11-15, 16-20 and 21-25.

[Figure: four histograms, one each for Data Sets A, B, C and D. Horizontal axis: Values (bins 6-10, 11-15, 16-20, 21-25). Vertical axis: Frequency.]

b) Draw 4-bar histograms for each data set A-D using 4 equal-sized bins which span only the numbers 7-22. [These are better than the bins in part a), which include numbers such as 6 and 25 that lie outside the range of the data.]

[Figure: four histograms, one each for Data Sets A, B, C and D. Horizontal axis: Values (bins 7-10, 11-14, 15-18, 19-22). Vertical axis: Frequency.]

c) Comment on whether the better choice in part b) actually gives a histogram that says something different than the simpler 5-and-10 bin ranges of part a).

The histograms for Data Sets A and B are identical; there is little qualitative change in the histograms for Data Sets C and D. The extra effort expended to get ‘perfect’ bins with no overlap at the ends isn’t really worth it.

d) Use the histograms to classify the sets as unimodal symmetric, bimodal symmetric, left-skewed or right-skewed. Is this consistent with the answer to 2b?
Data Set A looks symmetric, Data Set B looks skewed-to-the-right and Data Sets C and D look bimodal (because of the empty spot in the histogram). These conclusions for Data Sets A and B are the same as those from the boxplots. But the bimodality seen in the histogram is something that is not apparent from the boxplot. The boxplot merely says that approximately 25% of the data values lie in each fourth; it does not indicate whether these are well spread within a region or all at one end. This is seen in the histogram. A closer inspection of the histogram shows that in C there are equal numbers of high and low data values, so it is possibly bimodal symmetric. In D there are twice as many high values as low values, but the description ‘skewed-to-the-left’ is a bit of a stretch!

4. a) For each of the data sets A-D compute the standard deviation.

Recall that the means were found in #1 a) (i). These are needed in the calculations.

Data Set A
Data Value                   7  10  11  12  13  14  16  17  18  22
Difference from Mean (14)    7   4   3   2   1   0   2   3   4   8
Squared Difference          49  16   9   4   1   0   4   9  16  64
SUM = 172
Dividing by one less than the number of data points gives 172/9=19.11. Taking the square root gives a standard deviation of 4.371625.

Data Set B
Data Value                   7   8   8   9  10  11  11  13  16  17  22
Difference from Mean (12)    5   4   4   3   2   1   1   1   4   5  10
Squared Difference          25  16  16   9   4   1   1   1  16  25 100
SUM = 214
Dividing by one less than the number of data points gives 214/10=21.40. Taking the square root gives a standard deviation of 4.626013.

Data Set C
Data Value                   7   8   9   9  10  10  17  18  19  19  20  22
Difference from Mean (14)    7   6   5   5   4   4   3   4   5   5   6   8
Squared Difference          49  36  25  25  16  16   9  16  25  25  36  64
SUM = 342
Dividing by one less than the number of data points gives 342/11=31.09.
Taking the square root gives a standard deviation of 5.575922.

Data Set D
Data Value                   7   8   9  10  17  17  18  18  19  20  21  22  22
Difference from Mean (16)    9   8   7   6   1   1   2   2   3   4   5   6   6
Squared Difference          81  64  49  36   1   1   4   4   9  16  25  36  36
SUM = 362
Dividing by one less than the number of data points gives 362/12=30.17. Taking the square root gives a standard deviation of 5.492419.

SUMMARY
A: Standard Deviation = 4.37
B: Standard Deviation = 4.63
C: Standard Deviation = 5.58
D: Standard Deviation = 5.49

b) Is this consistent with your answer to 2b?

Yes. The standard deviations of A and B are much smaller than those of C and D. This is expected from the sizes of the rectangles on the boxplot. B beats out A and C beats out D for the reasons given in 2b).

5. Consider boxplots A, B and C as given along the same number line below:

Which of the following are true?

a) All the data values for A are greater than the median data value for B.
True. The Low value for A appears to be the same as the Median value of B.

b) One quarter of the data values for C are less than the median value for B.
True. The LQ for C appears to be the same as the Median value of B.

c) A has a greater median than C.
False. The Median of C appears to be the same as the UQ for A.

d) Boxplot B is symmetric.
True. Folding the boxplot along the Median line gives a match on each side.

e) Boxplot C is right-skewed.
False. The boxplot indicates more on the left of the median and so is left-skewed.

f) In A, the mean is greater than the median.
False. Since the boxplot is symmetric, the mean and median are likely to be very close.

g) In C, the mean is less than the median.
True. Since the boxplot is left-skewed, the mean is likely to be less than the median. The excess values to the left of the median are not balanced by values above the median and will therefore pull the mean to a value below the median.
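
The hand calculations in questions 1, 2a, 3 and 4a can be checked with a short script. What follows is only a sketch in Python, not part of the original exercises. The function names (quartiles, sample_sd, bin_counts, summarize) are made up for illustration. The quartile rule assumed is the one used in 2a (medians of the two halves, leaving the median out when the count is odd), and the standard deviation uses the n - 1 divisor from 4a.

```python
from math import sqrt
from statistics import mean, median

# The four data sets from the top of the review exercises.
DATA = {
    "A": [10, 7, 18, 13, 12, 17, 11, 14, 16, 22],
    "B": [8, 13, 16, 10, 11, 7, 8, 11, 9, 17, 22],
    "C": [7, 9, 19, 8, 20, 17, 19, 9, 22, 18, 10, 10],
    "D": [17, 17, 7, 18, 8, 18, 9, 10, 22, 19, 22, 21, 20],
}

# Bin ranges from question 3, parts a) and b); both ends are inclusive.
BINS_3A = [(6, 10), (11, 15), (16, 20), (21, 25)]
BINS_3B = [(7, 10), (11, 14), (15, 18), (19, 22)]


def quartiles(values):
    """Return (LQ, UQ) as the medians of the lower and upper halves.

    For an odd count the median itself belongs to neither half
    (assumed rule, chosen to match the answers in question 2a).
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[-half:])


def sample_sd(values):
    """Standard deviation with divisor n - 1, as in question 4a."""
    m = mean(values)
    sum_of_squares = sum((x - m) ** 2 for x in values)
    return sqrt(sum_of_squares / (len(values) - 1))


def bin_counts(values, bins):
    """Count how many values fall in each inclusive (low, high) bin."""
    return [sum(low <= x <= high for x in values) for low, high in bins]


def summarize(values):
    """Collect the statistics asked for in questions 1, 2a and 4a."""
    lq, uq = quartiles(values)
    return {
        "mean": mean(values),
        "median": median(values),
        "lq": lq,
        "uq": uq,
        "sd": round(sample_sd(values), 2),
    }


if __name__ == "__main__":
    for name, values in DATA.items():
        print(name, summarize(values))
        print("  3a bins:", bin_counts(values, BINS_3A))
        print("  3b bins:", bin_counts(values, BINS_3B))
```

Note that statistical software often uses a different quartile rule (interpolating between data values), so library quartile functions may not reproduce the LQ and UQ values above exactly.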