Name: ________________________   Class: _________________   Date: __________________

AU5: Notes #3 – Box-and-Whisker

Vocabulary:

Box-and-Whisker Plot: a data display that divides a set of data into four parts. The median, or second quartile, separates the set into two halves: the numbers that are below the median and the numbers that are above the median. The first quartile is the median of the lower half. The third quartile is the median of the upper half. The lower extreme is the least data value and the upper extreme is the greatest data value.

[Figure: box-and-whisker plot with its five parts labeled 1 through 5]

Five Number Summary
1 = minimum value, lower extreme
2 = 1st quartile, lower quartile (25th percentile)
3 = 2nd quartile, median (50th percentile)
4 = 3rd quartile, upper quartile (75th percentile)
5 = maximum value, upper extreme

Interquartile Range (IQR): the distance between the first quartile and the third quartile. It accounts for the middle 50% of the data.

Outliers: values much lower or much higher than most of the data. In a box-and-whisker plot, outliers are data values that fall more than 1.5 times the interquartile range below the first quartile or above the third quartile. Do not extend whiskers to any outliers.

Application:

1. Consider the following scenario. A television game show, “Fact or Fiction”, was canceled after nine shows. Many people watched the nine shows and were rather upset when it was taken off the air. A random sample of eighty viewers of the show was selected. Viewers in the sample responded to several questions. The dot plot below shows the distribution of ages of these eighty viewers:

a. Approximately where would you locate the mean (balance point) in the above distribution?
b. How does the direction of the tail affect the location of the mean age compared to the median age?
c. The mean age of the above sample is approximately 50. Do you think this age describes the typical viewer of this show? Explain your answer.
d.
Using the dot plot above, construct a box plot over the dot plot by completing the following steps:
   i. Locate the middle 40 observations and draw a box around these values.
   ii. Calculate the median, then draw a line in the box at the location of the median.
   iii. Draw a line that extends from the upper edge of the box to the largest observation in the data set.
   iv. Draw a line that extends from the lower edge of the box to the smallest observation in the data set.
e. Recall that the 5 values used to construct the box plot make up the 5-number summary. What is the 5-number summary for this data set of ages?
   Minimum Age ________
   Lower quartile or Q1 ________
   Median Age ________
   Upper quartile or Q3 ________
   Maximum Age ________
f. What percent of the data falls between the minimum value and Q1?
g. What percent of the data does the box part of the box plot capture?
h. What is the range of the middle fifty percent?
i. The IQR provides a summary of the variability for a skewed data distribution. The IQR is a number that specifies the length of the interval that contains the middle half of the ages of viewers. Do you think producers of the show would prefer a show that has a small or large interquartile range? Explain your answer.

Try-It!

2. Thirty female users and twenty-five male users were selected at random from a database of people who play a video game regularly. Each of them agreed to be part of a research study and report their scores. A leadership score is based on a player’s answers to leadership questions. A score of 1 to 40 is considered a beginning level leadership score, a score of 41 to 60 is considered a middle level leadership score, and a score greater than 60 is considered an advanced level leadership score.

a. Use the following data to make a box plot of the female scores, and a box plot of the male scores on the line graph below.
b. Based on your box plots, describe how the variability of the leadership scores compares between male and female users, and state which users tend to have a higher leadership score.

3. Consider a box plot of the ages of 200 randomly selected people from Kenya:

A data distribution may contain extreme data. A box plot can be used to display extreme data values that are identified as outliers. Each “*” in the box plot represents the age of one of 4 people from this sample. Based on the sample, these 4 ages were considered outliers. An outlier is defined to be any data value that is more than 1.5 × (IQR) away from the nearest quartile.

a. Estimate the values of the 4 ages represented by an *.
b. What is the median age of the sample of ages from Kenya? What are the approximate values of Q1 and Q3? What is the approximate interquartile range (IQR) of this sample?
c. Multiply the interquartile range (IQR) by 1.5. What value do you get?
d. Are there any age values that are less than Q1 − 1.5 × (IQR)? If so, these ages would also be considered outliers.
e. Explain why there is no * on the low side of the box plot for ages of the people in the sample from Kenya.
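
Checking your work with a short program (optional): The sketch below follows the vocabulary in these notes. It finds the five-number summary by taking the median of the lower half and the median of the upper half, then flags any value more than 1.5 × (IQR) beyond Q1 or Q3. The function names and the list of ages are made up for illustration; they are not data from any problem above. Other tools (calculators, spreadsheets) may use a slightly different quartile rule and give slightly different values for Q1 and Q3.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Q1 and Q3 are the medians of the lower and upper halves. When the
    number of values is odd, the middle value is left out of both halves.
    Assumes at least two values.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]        # values below the median position
    upper = data[n - half:]    # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]


def find_outliers(values):
    """Return values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_cutoff = q1 - 1.5 * iqr
    high_cutoff = q3 + 1.5 * iqr
    return [v for v in sorted(values) if v < low_cutoff or v > high_cutoff]


# Hypothetical ages, invented for this example only
ages = [12, 15, 18, 20, 21, 24, 25, 27, 30, 33, 70]

print(five_number_summary(ages))  # (12, 18, 24, 30, 70)
print(find_outliers(ages))        # [70]
```

In this made-up list, the IQR is 30 − 18 = 12, so 1.5 × (IQR) = 18. The cutoffs are 18 − 18 = 0 and 30 + 18 = 48. Only the age 70 lies beyond a cutoff, so the upper whisker would stop at 33 and 70 would be marked with an *.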