Box and Whisker Plots

If we have a set of numbers, say –4, 20, 0, –30, 45, –2, 17, 150, –300, 5, 8, 15, 120, 3, –56, then the MEDIAN of this set of numbers is defined to be the middle number once they are put in increasing order from left to right: –300, –56, –30, –4, –2, 0, 3, 5, 8, 15, 17, 20, 45, 120, 150. Since there are 15 numbers, the middle one is the eighth one, or 5. The median, as opposed to the ordinary average, or mean, is not sensitive to outliers, i.e., to extremely large or extremely small numbers. In other words, if the 150 were to change to a 1500, the median would still be 5.

If we have an even number of numbers, say –5, 3, 6, 7, then the median is calculated differently, since there is no middle number. In fact, there are TWO middle numbers, 3 and 6. The median is defined to be the average, or mean, of the two middle numbers: (3 + 6)/2 = 4.5.

We now want to further divide the ordered set of 15 numbers above into four parts. Consider the first 7 of these numbers: –300, –56, –30, –4, –2, 0, 3. We can divide this set in half by taking its median, which is –4. We call –4 the first quartile and denote it by Q1. Now consider the last 7 numbers: 8, 15, 17, 20, 45, 120, 150. The median of these last 7 numbers is 20; we call it the third quartile and denote it by Q3. We also call the median 5 the second quartile, Q2.

If we add one more number, –100, to the above set of 15 numbers, we get the ordered set –300, –100, –56, –30, –4, –2, 0, 3, 5, 8, 15, 17, 20, 45, 120, 150 of 16 numbers. Now we have to calculate the quartiles slightly differently. First of all, there is no longer a single middle number among the 16 numbers, so we have to average the two middle numbers to get (3 + 5)/2 = 4, the median of the 16 numbers. When we take the first half of these 16 numbers, we have to make sure to include 3, the smaller of the two middle numbers. So we take the first 8 numbers, –300, –100, –56, –30, –4, –2, 0, 3.
The first quartile Q1 of these 16 numbers is the median of these first 8 numbers, (–30 + (–4))/2 = –17. Similarly, to calculate the third quartile Q3 of these 16 numbers, we take the median of the second 8 numbers 5, 8, 15, 17, 20, 45, 120, 150, which is (17 + 20)/2 = 18.5. Notice that we have included 5, the second middle number of the 16 numbers, in the second half.

One common error is to think that, since the middle number was not part of either the first half or the last half when we had only 15 numbers, the two middle numbers should likewise be left out of both halves when there are 16. That is not the case. When there are two middle numbers, we have to include the smaller of the two in the first half of the 16 numbers and the larger of the two in the second half.

The interquartile range, or IQR, is Q3 – Q1. In the case of the 16 numbers above, it is 18.5 – (–17) = 35.5.

Now we can draw our box, which runs from Q1 to Q3, with a vertical line at the median Q2.

[Figure: box from Q1 = –17 to Q3 = 18.5 with a vertical line at the median 4.]

In order to draw the whiskers, we have to define what it means to be an outlier. A number is an outlier if it is either greater than
Q3 + (3/2)IQR = 18.5 + (1.5)(35.5) = 71.75
or less than
Q1 – (3/2)IQR = –17 – (1.5)(35.5) = –70.25.
Therefore 120 and 150 are outliers on the right, and –300 and –100 are outliers on the left. On the right, we draw a whisker to the largest non-outlier, which is 45. On the left, we draw a whisker to the smallest non-outlier, which is –56.

[Figure: the box with whiskers extending left to –56 and right to 45.]

Now we have to distinguish between mild outliers and extreme outliers. An outlier is called an extreme outlier if it is either greater than
Q3 + (3)IQR = 18.5 + (3)(35.5) = 125
or less than
Q1 – (3)IQR = –17 – (3)(35.5) = –123.5.
If an outlier is not extreme, it is called mild. Thus 120 is a mild outlier on the right, and 150 is an extreme outlier on the right. Similarly, –100 is a mild outlier on the left, and –300 is an extreme outlier on the left. Finally, we symbolize a mild outlier with a solid circle and an extreme outlier with an open circle o.
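The quartile and whisker rules described above can be sketched in a few lines of Python. This is only an illustration of the halving method used in this text; the function names median, quartiles, and whisker_ends are made up for the example, and other textbooks and software packages may define quartiles slightly differently.

```python
def median(ordered):
    """Median of a list that is already sorted in increasing order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # one middle number
    return (ordered[mid - 1] + ordered[mid]) / 2  # average the two middle numbers


def quartiles(values):
    """Return (Q1, Q2, Q3) using the halves described in the text."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Odd count: the middle number belongs to neither half.
    # Even count: the two middle numbers are split between the halves.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)


def whisker_ends(values, k=1.5):
    """Smallest and largest values that are not outliers."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return min(inside), max(inside)


data15 = [-4, 20, 0, -30, 45, -2, 17, 150, -300, 5, 8, 15, 120, 3, -56]
data16 = data15 + [-100]

print(quartiles(data15))      # (-4, 5, 20)
print(quartiles(data16))      # (-17.0, 4.0, 18.5)
print(whisker_ends(data16))   # (-56, 45)
```

The printed values match the quartiles and whisker ends worked out by hand for the 15 and 16 numbers.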
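The completed box and whisker plot looks like this:

[Figure: box from Q1 = –17 to Q3 = 18.5 with a vertical line at the median 4, whiskers to –56 and 45, solid circles at –100 and 120, and open circles at –300 and 150.]

In order to keep straight which of the two kinds of circles belongs to which kind of outlier, imagine that we define a really extreme outlier to be one which is either greater than Q3 + (5)IQR or less than Q1 – (5)IQR. Then we would symbolize a really extreme outlier with a dotted circle. In other words, the more extreme the outlier, the more its symbol is disappearing: a really extreme outlier is barely there, an extreme outlier is only an outline, and a mild outlier, being closer to the rest of the data, is more substantial.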
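The mild and extreme outlier rules can also be checked mechanically. The sketch below assumes the quartiles Q1 = –17 and Q3 = 18.5 found for the 16 numbers; the function name classify and the SYMBOL table are hypothetical, chosen only for this example, and the solid circle character is an assumed stand-in for the symbol used in the plot.

```python
def classify(value, q1, q3):
    """Label a value using the 1.5 IQR and 3 IQR cutoffs from the text."""
    iqr = q3 - q1
    # How far the value lies outside the box (zero or negative if inside it).
    beyond = max(value - q3, q1 - value)
    if beyond > 3 * iqr:
        return "extreme outlier"
    if beyond > 1.5 * iqr:
        return "mild outlier"
    return "not an outlier"


# Plot symbols: solid circle for mild, open circle for extreme (assumed glyphs).
SYMBOL = {"mild outlier": "●", "extreme outlier": "o", "not an outlier": ""}

Q1, Q3 = -17, 18.5   # quartiles of the 16 numbers, taken from the text
for v in [-300, -100, -56, 45, 120, 150]:
    label = classify(v, Q1, Q3)
    print(v, label, SYMBOL[label])
```

Running this labels –300 and 150 as extreme outliers, –100 and 120 as mild outliers, and the whisker ends –56 and 45 as non-outliers, in agreement with the plot.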