Box and Whisker Plots

Suppose we have a set of numbers, say –4, 20, 0, –30, 45, –2, 17, 150, –300, 5, 8, 15, 120, 3, –56.
The MEDIAN of this set of numbers is defined to be the middle number, once they are ordered
increasing from left to right: –300, –56, –30, –4, –2, 0, 3, 5, 8, 15, 17, 20, 45, 120, 150. Since
there are 15 numbers, the middle one is the eighth one, or 5. The median, as opposed to the
ordinary average, or mean, is not sensitive to outliers, i.e., to extremely large or extremely small
numbers. In other words, if the 150 were to change to 1500, the median would still be 5.
If we have an even number of values, such as –5, 3, 6, 7, then the median is calculated differently,
since there is no single middle number. In fact, there are TWO middle numbers, 3 and 6. The median is
defined to be the average, or mean, of the two middle numbers: (3 + 6)/2 = 4.5.
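The two cases above (odd and even counts) can be checked with a short Python sketch. The function name `median` and the variable names are chosen here for illustration and are not part of the original handout.

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middle number.
        return ordered[mid]
    # Even count: average the two middle numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2


fifteen = [-4, 20, 0, -30, 45, -2, 17, 150, -300, 5, 8, 15, 120, 3, -56]
print(median(fifteen))          # 5
print(median([-5, 3, 6, 7]))    # 4.5

# Changing the outlier 150 to 1500 leaves the median unchanged.
changed = [1500 if x == 150 else x for x in fifteen]
print(median(changed))          # 5
```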
We now want to further divide the ordered set of 15 numbers above into four parts.
Consider the first 7 of these numbers, the ones to the left of the median: –300, –56, –30, –4, –2, 0, 3.
We can divide this set in half by taking its median, which is –4. Call –4 the first quartile, and
denote it by Q1. Now consider the last 7 numbers, the ones to the right of the median:
8, 15, 17, 20, 45, 120, 150. The median of these last 7 numbers is 20; we call it the third quartile
and denote it by Q3. We also call the median 5 the second quartile Q2.
If we add one more number, –100, to the above set of 15 numbers, we get the ordered set
–300, –100, –56, –30, –4, –2, 0, 3, 5, 8, 15, 17, 20, 45, 120, 150 of 16 numbers. We have to
calculate the quartiles slightly differently. First of all, there is no longer a single middle number
among the 16 numbers, so we have to average the two middle numbers to get (3 + 5)/2 = 4, the
median of the 16 numbers. Now when we take the first half of these 16 numbers, we have to make
sure to include 3, one of the two middle numbers. So we take the first 8 numbers, –300, –100, –56,
–30, –4, –2, 0, 3. The first quartile Q1 of these 16 numbers is the median (–30 + –4)/2 = –17 of
these first 8 numbers. Similarly, to calculate the third quartile Q3 of these 16 numbers, we take the
median (17 + 20)/2 = 18.5 of the second 8 numbers 5, 8, 15, 17, 20, 45, 120, 150. Notice that we
have included 5, the second middle number of the 16 numbers, in the second half. One common
error is to think that, because the middle number was not a part of either the first half or the last
half when we had only 15 numbers, the two middle numbers should be left out of both halves here
as well. When there are two middle numbers, we have to include the smaller of the two in the first
half of the 16 numbers and the larger of the two in the second half of the 16 numbers.
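As a rough check of both quartile calculations, here is a minimal Python sketch of the halving rule described above: leave the middle number out of both halves when the count is odd, and split the data evenly when the count is even. The function name `quartiles` is hypothetical. Statistical software often uses other quartile conventions, so its values may differ slightly from these.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule from the text."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Odd count: skip the middle number. Even count: the upper half starts
    # at the larger of the two middle numbers.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)


fifteen = [-4, 20, 0, -30, 45, -2, 17, 150, -300, 5, 8, 15, 120, 3, -56]
sixteen = fifteen + [-100]

print(quartiles(fifteen))   # (-4, 5, 20)
print(quartiles(sixteen))   # (-17.0, 4.0, 18.5)
```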
The interquartile range, or IQR, is Q3 – Q1. In the case of the 16 numbers above, it is 18.5 – (–17)
= 35.5. Now we can draw our box, which extends from Q1 = –17 to Q3 = 18.5, with a vertical line
at the median Q2 = 4.
Now in order to draw the whiskers, we have to define what it means to be an outlier. A number is
an outlier if it is either greater than
 Q3 + (3/2)IQR = 18.5 + (1.5)(35.5) = 71.75
or less than
 Q1 – (3/2)IQR = –17 – (1.5)(35.5) = –70.25
Therefore 120 and 150 are outliers on the right, and –300 and –100 are outliers on the left. On the
right, we draw a whisker to the largest non-outlier, which is 45. On the left, we draw a whisker to
the smallest non-outlier, which is –56. This is shown below.
[Figure: the box with whiskers drawn out to –56 on the left and 45 on the right]
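The outlier cutoffs and whisker endpoints for the 16 numbers can be reproduced with a few lines of Python. This is only a sketch: the quartile values are copied from the text rather than recomputed, and the variable names (`upper_fence`, `lower_fence`, and so on) are illustrative.

```python
sixteen = [-300, -100, -56, -30, -4, -2, 0, 3, 5, 8, 15, 17, 20, 45, 120, 150]

# Quartiles of the 16 numbers, taken from the text.
q1, q3 = -17, 18.5
iqr = q3 - q1                      # 35.5

upper_fence = q3 + 1.5 * iqr       # 71.75
lower_fence = q1 - 1.5 * iqr       # -70.25

# Anything beyond a fence is an outlier; the whiskers stop at the most
# extreme values that are still inside the fences.
outliers = [x for x in sixteen if x < lower_fence or x > upper_fence]
inside = [x for x in sixteen if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(outliers)                    # [-300, -100, 120, 150]
print(whisker_low, whisker_high)   # -56 45
```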
Now we have to distinguish between mild outliers and extreme outliers. An outlier is called an
extreme outlier if it is either greater than
 Q3 + (3)IQR = 18.5 + (3)(35.5) = 125
or less than
 Q1 – (3)IQR = –17 – (3)(35.5) = –123.5
If an outlier is not extreme, it is called mild. Thus 120 is a mild outlier on the right, and 150 is an
extreme outlier on the right. Similarly, –100 is a mild outlier on the left, and –300 is an extreme
outlier on the left. Finally, we symbolize a mild outlier with a solid circle ● and an extreme
outlier with an open circle o. The completed box and whisker plot looks like this:
[Figure: the completed box and whisker plot, with ● at –100 and 120 and o at –300 and 150]
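The mild versus extreme distinction can be sketched as a small Python function. The name `classify` and its return strings are hypothetical, and the quartiles are again the values from the text for the 16 numbers.

```python
def classify(x, q1, q3):
    """Label x as 'extreme', 'mild', or 'not an outlier' using the IQR rules."""
    iqr = q3 - q1
    # Check the wider 3 * IQR cutoffs first, since every extreme outlier
    # also lies beyond the 1.5 * IQR cutoffs.
    if x > q3 + 3 * iqr or x < q1 - 3 * iqr:
        return "extreme"
    if x > q3 + 1.5 * iqr or x < q1 - 1.5 * iqr:
        return "mild"
    return "not an outlier"


q1, q3 = -17, 18.5   # quartiles of the 16 numbers, from the text
for x in (-300, -100, 45, 120, 150):
    print(x, classify(x, q1, q3))
```

With these quartiles, –300 and 150 come out as extreme, –100 and 120 as mild, and 45 as not an outlier, matching the classification in the text.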
In order to keep straight which of the two kinds of circles belongs to which kind of outlier,
imagine that we define a really extreme outlier to be one which is either greater than
 Q3 + (5)IQR
or less than
 Q1 – (5)IQR
Then we would symbolize a really extreme outlier with a dotted circle. In other words, it's
disappearing, so it's barely there, as opposed to a mild outlier, which is more substantial.